dupeGuru is my go-to. It can find duplicate files by binary comparison, do a "fuzzy" search for similar-looking photos, and so on. It can delete or move duplicates, or replace them with symlinks or hardlinks. It's cross-platform as well.
It can't search for images that appear to be a screenshot, however. I'm not aware of any such functionality in a standalone program, actually (google photos is what immediately comes to mind) but would be curious to see if one actually exists.
If they have EXIF data intact, you can use exiftool to organize them and flag or move duplicates; otherwise you might need something that uses a perceptual hash (pHash) to identify duplicates.
If you don't want to use terminal tools, you can use something like dupeGuru, or there are a variety of other apps on the App Store that can also identify duplicates.
https://dupeguru.voltaicideas.net/
"dupeGuru is good with pictures. It has a special Picture mode that can scan pictures fuzzily, allowing you to find pictures that are similar, but not exactly the same."
It needs to be run from another host, and I'd recommend having a fast connection to the NAS.
I've been putting off removing my duplicate movie files for years, mostly because I have a relatively large library of ~13,000 movies and figured this would be a multi-week project. My main objective was to find duplicates and delete the smaller copy. I stumbled upon a tool called dupeGuru (Windows, Linux, Mac) that allowed me to accomplish this with about an hour's worth of work! In total I removed ~2,500 duplicates that were using 4.67TB of space. I'm beyond happy.
Quick search says this finds duplicate photos: https://dupeguru.voltaicideas.net/
"It has a special Picture mode that can scan pictures fuzzily, allowing you to find pictures that are similar, but not exactly the same."
To weed out duplicate photos, I would recommend dupeGuru. If you trust the program enough, deleting all of the duplicates it finds requires only several clicks following the scan. Or you can compare each result visually yourself to make sure there aren't any miscategorizations. The program can additionally search photos based on visual similarity (not just binary comparison) so it's a huge help if you have resized photos scattered in the mix.
As for organizing/folder structure, you might be able to pull it off with just windows explorer:
I didn't know how familiar you were with regards to using windows 10/computers in general so I glossed over many aspects in my description above, but just let me know if you need clarification on anything!
There's a small program called Dupeguru, I've used it to find duplicate pics & music.
https://dupeguru.voltaicideas.net/
Attribute changer (also great) can change all the additional bits, but you have to do it 1 by 1.
Years ago, I used an app called DupeGuru to go through folders of digital images. It did a very good job of identifying not only pure duplicates, but the same image in a different format or resolution. That was the "Picture Edition"; they had other versions for different file types.
The app is apparently no longer supported by its original developer but has been taken over by Andrew Senetar, and the app home page is here. It doesn't look like the app has been updated in a couple of years, but on a current Mac it should run fine (it might be 32-bit, so what it does under 10.15 is unknown to me). The bonus is that the app now appears to be free, so you can try it yourself to see if it will do what you want it to do. I just downloaded the current version and will try it out myself shortly. No relation to the developers other than as a satisfied customer (I paid $$ for this back in the day). Good luck.
You can try Dupeguru
You can set certain folders to be “reference” so it will never mark any duplicate it finds there for deletion. Another feature I like is that once it has found the duplicates, you can instruct it to move them somewhere else (instead of deleting them) while preserving the original folder structure.
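For anyone wondering what "move them somewhere else while preserving the original folder structure" amounts to, here is a minimal Python sketch. This is not dupeGuru's code; `move_preserving_structure`, `source_root` and `quarantine_root` are names I made up for illustration, and it assumes every duplicate lives somewhere under `source_root`.

```python
import shutil
from pathlib import Path


def move_preserving_structure(duplicates, source_root, quarantine_root):
    """Move each duplicate under quarantine_root, keeping its path relative to source_root."""
    source_root = Path(source_root)
    quarantine_root = Path(quarantine_root)
    moved = []
    for dup in duplicates:
        dup = Path(dup)
        # relative_to raises ValueError if dup is not inside source_root
        target = quarantine_root / dup.relative_to(source_root)
        # Recreate the intermediate folders on the destination side
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(dup), str(target))
        moved.append(target)
    return moved
```

Moving instead of deleting means you can inspect the quarantine folder and put things back in the same place if something was flagged by mistake.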
>So is there a program to help scan the 10tb drive and all the old drives to make sure no duplicates will be transferred no matter how many folders deep the pictures/documents are? And if there is already duplicates on the 10tb, it will show me..?
You can use dupeGuru and the StarWind Deduplication Analyzer to analyze your data, find duplicates, estimate your storage savings upon removal, and remove duplicates.
https://dupeguru.voltaicideas.net/
https://www.starwindsoftware.com/starwind-deduplication-analyzer
Check out dupeguru.
From their website:
"dupeGuru is good with pictures. It has a special Picture mode that can scan pictures fuzzily, allowing you to find pictures that are similar, but not exactly the same."
Identify the duplicates, estimate your storage savings, and remove duplicates using the StarWind Deduplication Analyzer and dupeGuru. MD5 hashes are used to compare contents.
https://www.starwindsoftware.com/starwind-deduplication-analyzer
https://dupeguru.voltaicideas.net/
Still, you should run a backup job prior to removing any data.
You can just send them all to the trash if you like with review.
Although you can just click on Edit–>Mark All and then Actions–>Send Marked to Recycle bin to quickly delete all duplicate files in your results, it is always recommended to review all duplicates before deleting them.
https://dupeguru.voltaicideas.net/help/en/results.html#reviewing-results
For the same task I use dupeGuru and the StarWind Deduplication Analyzer. They let you estimate storage savings, find the data that can be deduplicated, and remove it. https://www.starwindsoftware.com/starwind-deduplication-analyzer https://dupeguru.voltaicideas.net/
Before removing anything, back up and test the software with your data.
dupeGuru and Starwind deduplication analyzer tool should help you estimate your storage savings and find out the data that can be deduplicated.
https://dupeguru.voltaicideas.net/
https://www.starwindsoftware.com/starwind-deduplication-analyzer
Dupeguru worked for me. It's on OBS.
Tip - if you set a folder to be a Reference on the Directories tab, then it will keep files in that folder by default and files outside of it will be able to be selected for removal.
try this and then do this:
unless you are willing to make a serious time commitment, I think picking a few out of the lot is more likely to get some results than trying to cut by half or something.
a free open source software tool: https://dupeguru.voltaicideas.net/help/en/index.html
but you still have to look at a looooot of things.
tell your wife that taking 1 photo is special but taking 1000 photos is like taking none. get her a 35mm camera. in 40 years you'll still have the shitty photos you took with it.
I'm into organising files as much as I can, but I mostly enjoy renaming them into my preferred style, especially music. I use Ant Renamer, which I find great for bulk renaming. It works on Windows and on Linux under Wine.
https://antp.be/software/renamer
I use dupeguru for searching out duplicate files. https://dupeguru.voltaicideas.net
I also used to use a cataloguing app, whose name I can't remember now, that scanned any media (HD, CD, DVD, USB, SD). After that you could search the contents of any drive that had been scanned, whether it was attached or not, which is useful when you know you have a file but can't remember where it's stored.
If your server is running on Synology NAS DSM, then it has a duplicate file finder built in. Otherwise try https://dupeguru.voltaicideas.net/, which is what is suggested in what /u/l1g17 has linked to.
Thanks. I really lucked out on this one. Backups were in good shape after all, but it didn't come to that. Was sort of able to reactivate the volume.
After clicking reactivate and letting it do its thing overnight, I had one active drive letter and access to the data. Drive Management still said it failed redundancy and I was missing a volume, though.
So I used TeraCopy to make yet another just-in-case copy/backup of what File Explorer could see. (Copy was to a networked drive.) I used TeraCopy because it also has a 'verify' option.
Once that was done and, just like the article you linked to says, I checked that I could open many large and small files on the backup and on the original drive, I felt a lot freer to change the drives.
After running chkdsk, etc. on them to be sure they were still fine (again, I had a motherboard/other failure, not a drive failure), I selected the "missing" drive on Disk Manager and broke the mirror, then reformatted that drive and did not put it back into a RAID volume.
I'm now in the process of de-duping (combination of dupeGuru and Auslogics Duplicate File Finder) the files on the remaining drive (moving dupes to the newly formatted drive). Then I'll be adding copies from other sources (including the backup) to that drive and re-de-duping (kind of sounds fun!) until I have a clean library.
Once that's finally done, I'll back it up to an NAS somewhere else in the house, a backup cloud service and sync it to a Onedrive account.
Whew!
This is the tool that I usually use when I want to dedupe something: https://dupeguru.voltaicideas.net/ It's free and open-source, and there are ready-made binaries for the most common platforms :)
Thanks for helping out!
If they're named by proper MAME standards, you should be able to use something like dupeGuru or something similar to "match" files in both folders and copy the matches from your new set to another folder.
Unfortunately, there's no way I'm aware of within Launchbox to sort external sets.
You could use Easy Duplicate Finder. It does exactly what you're looking for: https://www.easyduplicatefinder.com/ Like much of the software out there, it lets you choose the type of file you want to look for. It's pretty decent software and quite fast, although this will depend greatly on your specs. The only downside is that you will be limited unless you pay for a licence.
However, there is a free and open source alternative called dupeGuru and it's right there 👉 https://dupeguru.voltaicideas.net/
I use dupeGuru.
> dupeGuru is a tool to find duplicate files on your computer. It can scan either filenames or contents.
> dupeGuru is good with pictures. It has a special Picture mode that can scan pictures fuzzily, allowing you to find pictures that are similar, but not exactly the same.
When you add music files into an iTunes library, iTunes copies the music file in the iTunes/iTunes Media/Music folder. If a file with the same name already exists there, it will add a number to it. I assume the Music app in Mac OS Catalina works similarly, but I've never used it so I don't know.
If it's adding duplicate songs by mistake, even when they're identical, use the Show Duplicate Items function to delete duplicates, and select "move to trash" when it asks.
Don't worry about the numbers in the music file names. If you rename those files, iTunes will no longer recognize them, so you have to delete those from your library too and re-add them.
If you feel you have too many duplicate files in your computer (not in iTunes/Music app), you can find them with the app dupeGuru.
dupeGuru for finding "true" copies of any file type. By true copy, I mean that the file is actually bit-for-bit equal. This won't detect "duplicates" that have slight changes to them, like basically same PDF but one has a few additional pages or same video but different resolutions, etc.
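The "bit-for-bit equal" case above can be sketched in a few lines of Python. This is only an illustration of the general idea, not how dupeGuru itself is implemented: `find_exact_duplicates` and `file_digest` are hypothetical names, and it treats two files as identical when their size and SHA-256 digest match (a collision is practically impossible, but a paranoid tool could still do a final byte-by-byte comparison).

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files never have to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root):
    """Return groups (lists of paths) of files under root with identical contents."""
    # Cheap first pass: files with different sizes cannot be identical
    by_size = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    # Expensive second pass: hash only the files that share a size
    by_digest = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for p in paths:
            by_digest[(size, file_digest(p))].append(p)

    return [sorted(group) for group in by_digest.values() if len(group) > 1]
```

Grouping by size first is the reason this kind of scan can be fast even on big libraries: most files are ruled out without ever being read.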
I use Duplicate Video Search for finding "duplicates" of videos that have small changes between them but are essentially the same. As with all fuzzy search, you'll have to play with the search settings a bit to get useful results.
I know that programs exist for finding "similar" pictures. Not sure about finding "similar" PDF. I've never seen any tool that can do fuzzy search for "any" file type. Fuzzy search requires the program to understand the contents of the file, beyond just comparing bits.
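To make "understand the contents of the file" a bit more concrete for pictures, here is a toy sketch of one simple fuzzy technique, an average hash. To be clear about the assumptions: this is not what dupeGuru or any particular tool does, the function names are made up, and it assumes the image has already been decoded and shrunk to a small grid of grayscale values (0-255), which is the part a real tool needs an image library for.

```python
def average_hash(pixels):
    """Turn a small grayscale grid into a list of bits: 1 where a pixel is brighter than the mean."""
    flat = [v for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    return [1 if v > mean else 0 for v in flat]


def hamming_distance(a, b):
    """Count the positions where two equal-length hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def similarity(a, b):
    """1.0 means identical hashes, 0.0 means every bit differs."""
    return 1 - hamming_distance(a, b) / len(a)
```

Two pictures count as "similar" when the similarity is above some threshold you pick, which is why you end up playing with the settings: too low and you get false matches, too high and resized or recompressed copies slip through.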
I had my entire photo library merged from a couple of laptops and my phone, resulting in tons of dupes. I settled on using dupeGuru in a Docker container. It did a great job.
I've found dupeguru to be very fast and easy to use, even over the network. It's cross platform too. https://dupeguru.voltaicideas.net/
Thanks for sharing what you built, glad it's working for you.
I used an application called DupeGuru to do this. Worked quite well. The app does not appear to be under active development these days but it is available and freeware now, so it's worth a shot or at least a download and a short test to see if it will do for your dad what he wants.
Sidebar: I hope that (despite likely numerous duplicates) he has a backup of these files.
Like englishman pointed out, QuMagie automates grouping images into defined categories.
It's not 100% accurate, but for automated grouping it's decent enough.
I also use smart albums in Photo Station. I manually place pictures into folders myself, then a smart album creates a virtual album from multiple folder locations, which I can then name, e.g. anime.
And if you're unsure whether you have duplicate pics, I recommend using dupeGuru to weed out duplicate images so you can delete the extra copies that are needlessly using up your storage space: https://dupeguru.voltaicideas.net/
Invest half a year, a year or two years in learning programming and make a application that will fit your current needs. In the meantime grow a beard and switch to one of them UNIXES clone... This is a solution. It may be bad, but hey... it is still a solution! Also this may be helpful. Cheers.
Haven't used it but DupeGuru https://dupeguru.voltaicideas.net/
"dupeGuru is a cross-platform (Linux, OS X, Windows) GUI tool to find duplicate files in a system. It’s written mostly in Python 3 and has the peculiarity of using multiple GUI toolkits, all using the same core Python code......dupeGuru is good with music. It has a special Music mode that can scan tags and shows music-specific information in the duplicate results window. "
https://dupeguru.voltaicideas.net/
I used this one regularly when I was downloading a ton of wallpapers from various subreddits. it's very good at finding duplicates. Or even just images that may be similar. Enjoy!
Edit: you'll have to copy the image you're looking for into its own folder and then make that your reference file/folder.
A couple years ago on Windows I needed to clean out duplicates and DupeGuru is what I ended up using. I was happy with it.
https://dupeguru.voltaicideas.net/
I don't remember if it was cross-platform then, but it seems to be now.
Sorry for your loss. I had a similar use case where I was dumping data from multiple sources and ended up with thousands of duplicates.
Here are the options you have:
The DSM Storage Analyzer report - this is very good in terms of finding duplicates, but not good for cleaning up, meaning you can't take bulk actions, among other limitations.
Third-party products - the one I use is https://dupeguru.voltaicideas.net/. The issue with these products is that they don't run natively on the Synology, so you would run them on another machine on the network, which takes up some of the bandwidth. dupeGuru is very good. Read the docs on how to use it.
Custom scripts - if you are a techie (you know Python and how to modify it), there are a bunch of custom script options. Here is one I wrote for myself: https://www.reddit.com/r/synology/comments/f09w5w/duplicate_file_finder_python_script_with_great/ It runs locally on the Synology and I was able to eliminate most of the duplicates. I had trouble with images that were scaled down when being copied to different devices or downloaded from Google Photos. For those, I narrowed down the directories where the potential duplicates were and then used dupeGuru to identify the duplicates.
Hope this helps!
Also, remember the strategy for data backup. 3-2-1. At least 3 total copies of your data, 2 of which are local but on different mediums (devices), and at least 1 copy offsite.
dupeGuru gets recommended again and again.
It takes a little getting used to. The principle is that the files should ideally sit in different folders; you then make one folder the reference and delete or move everything that duplicates the reference. That's fine for some people, but not useful for me because my duplicates are mostly in the same folder.
Years ago I bought https://www.duplicatecleaner.com/.
The Free version also works without problems for true duplicates, i.e. 1:1 copies. The Pro version can also find duplicates that have been scaled down or look very similar.
It's available as a Windows Store app: https://www.microsoft.com/en-us/p/duplicate-cleaner-free/9nblggh4rrr3
Or download the Pro version from the official site; I believe it turns into a Free version after the trial period.
I used to use Gemini on Mac. Switched to DupeGuru because it's free and does pretty much the same as Gemini 2, including finding duplicates that aren't 100% the same (for example different dimensions or filesize). I know it's not iOS but maybe it's still useful to someone.
How safe is it to use dupeGuru?
Very safe. dupeGuru has been designed to make sure you don’t delete files you didn’t mean to delete. First, there is the reference folder system that lets you define folders where you absolutely don’t want dupeGuru to let you delete files there, and then there is the group reference system that makes sure that you will always keep at least one member of the duplicate group.
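The two safeguards described above can be sketched roughly like this. It is my own reading of the description, not dupeGuru's actual logic; `choose_deletable` and its parameters are hypothetical, and keeping the first file of an unprotected group is an arbitrary choice made for the example.

```python
from pathlib import Path


def choose_deletable(group, reference_dirs):
    """Return the files in one duplicate group that may be deleted."""
    refs = [Path(d) for d in reference_dirs]

    def is_protected(path):
        # A file is protected if it sits inside any reference folder
        return any(ref in path.parents for ref in refs)

    paths = [Path(p) for p in group]
    protected = [p for p in paths if is_protected(p)]
    candidates = [p for p in paths if not is_protected(p)]

    if protected:
        # A protected copy survives, so every other copy may go
        return candidates
    # No protected copy: keep the first file so the group never ends up empty
    return candidates[1:]
```

Either way, at least one copy from every group is left on disk, which is the property that makes a bulk "delete all marked" less scary.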
Now even though you didn't say it, I feel like it would still be expected that none of the images are duplicates. I only saw the one though when I first zoomed in directly to the left of her ear.
I don't know how you made this but if it involved you having photos on your computer that you uploaded I'd suggest the program DupeGuru It's what I use to find duplicate images in my collection.
dupeGuru! I use it daily for this exact sort of thing. Use the standard mode, not the picture mode (which can identify duplicate or similar pictures when the file size or resolution differs). Set folder one as the reference.
I downloaded all of the files to a hard drive big enough to hold them and have been using dupeGuru to deduplicate hundreds of gigs of duplicated files. It can be a bit finicky, but I've run its picture mode successfully on folders with 900,000+ images in them, as well as the standard mode on document folders.
Just ran across a program called dupeGuru. Haven't tried it out yet, so I have no opinion:
> dupeGuru is a tool to find duplicate files on your computer. It can scan either filenames or contents. The filename scan features a fuzzy matching algorithm that can find duplicate filenames even when they are not exactly the same.
> dupeGuru is good with pictures. It has a special Picture mode that can scan pictures fuzzily, allowing you to find pictures that are similar, but not exactly the same.
https://dupeguru.voltaicideas.net/ https://github.com/arsenetar/dupeguru/
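As a rough idea of what fuzzy filename matching can look like, here is a small sketch using Python's standard `difflib`. I have not checked how dupeGuru's own filename scan works, so treat this as a stand-in technique; `similar_names` and the 0.8 threshold are assumptions for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations
from pathlib import Path


def similar_names(paths, threshold=0.8):
    """Return (path_a, path_b, ratio) for every pair whose names look alike."""
    pairs = []
    for a, b in combinations(paths, 2):
        # Compare names without extension, ignoring case
        name_a = Path(a).stem.lower()
        name_b = Path(b).stem.lower()
        ratio = SequenceMatcher(None, name_a, name_b).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return pairs
```

Comparing every pair gets slow on large libraries, so this only makes sense as a demonstration or on a folder you have already narrowed down.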
DupeGuru does this. https://dupeguru.voltaicideas.net/
dupeGuru is good with pictures. It has a special Picture mode that can scan pictures fuzzily, allowing you to find pictures that are similar, but not exactly the same.
I've used it in the past and it works well.
Same boat as yours, OP. Looks like LIRE is our best bet.
I think we need to set up a community of art data hoarders somewhere eventually. Maybe set up a DC (Direct Connect) server for that.
Regarding dupes, I highly recommend using DupeGuru which works really well and is multi platform: https://dupeguru.voltaicideas.net/
I will try that as well. I also found this one.
https://dupeguru.voltaicideas.net/
Do you know if finding duplicate pictures would work the same for duplicate videos?
In an attempt to centralise all my drives I’ve acquired a DS918, and I'm in the process of copying all my data to it. I know for a fact that there will be a disgusting amount of dupes. Reading through the Synology documentation I’ve come across SystemAnalyzer.
Among other features it does report on dupes:
> Duplicates: Displays duplicated files found in the system for the convenience of organizing or deleting unnecessary copies. Please note that the greater the maximum number of duplicated files you set, the longer it will take to process.
If this works I’d love to find an equivalent for my N40L.
What OS are you using? If you’re using Linux or OSX, dupeGuru might be what you’re after.
Edit: dupeGuru is also on Windows apparently.
Dupeguru is pretty good.
Duplicate filenames don't seem to be your issue but it can also scan contents, which is one better than filesize if a little slower.
It also has music and picture modes that do intelligent detection to compare files that are not the same in filename or size but have similar content, i.e. sound or look similar.
It is possible to select how similar you want the matches to be too.
As for usability, it's advanced beginner to intermediate. It doesn't hold your hand and there are plenty of bells and whistles to amuse or confuse.
I can't remember whether I found it in the Software Manager or whether I got it elsewhere (PPA or .deb file), but check Software Manager first.