The link to youtube is someone running fdupes. Took me a minute to figure that out. I want my minute back.
[mike@orion ~]$ dnf info fdupes
Using metadata from Fri Apr 10 06:04:55 2015
Available Packages
Name        : fdupes
Arch        : i686
Epoch       : 0
Version     : 1.51
Release     : 8.fc21
Size        : 30 k
Repo        : fedora
Summary     : Finds duplicate files in a given set of directories
URL         : https://code.google.com/p/fdupes/
License     : MIT
Description : FDUPES is a program for identifying duplicate files residing
            : within specified directories.
If you're using Linux or FreeBSD, take a look at fdupes. It compares files by size first, then hashes files of identical size to see whether they really are the same.
If you compile it from source you can also have it make hard-links for all your duplicates. That won't sort out your organisation problems, but will save a tonne of space!
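For reference, a few typical invocations (the path is a placeholder; see fdupes --help for the full list of modes):

fdupes -r /path/to/photos     # recurse and list each set of duplicates
fdupes -rS /path/to/photos    # same, but also print file sizes
fdupes -rd /path/to/photos    # interactively choose which copy to keep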
fdupes is an excellent file de-duplication tool.
Adrian's repository lives here: https://github.com/adrianlopezroche/fdupes, but there might be other ways to get it too. I have it on my Linux and Windows boxes. On Linux I run it natively; on Windows I use it under Cygwin. It works flawlessly and has many modes.
You can use rsync in dry-run mode to give you "comparison" results instead of actually syncing. Both can be automated using cron (Linux) or Task Scheduler (Windows). But neither of these approaches is going to eliminate the decryption cost.
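As a sketch, something like this (paths are placeholders) reports what would change without copying anything; drop the -n (--dry-run) flag to do the real sync:

rsync -avn --delete /path/to/source/ /path/to/mirror/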
As for a continuous, automated way to index changes, sorry, I haven't looked into that. I use these as a manual, one-off comparison method to eliminate duplicate files.
I'm sure there must be products out there, especially on Windows, that provide what you're looking for, so if you find something that works, please share it with me as well.
Look into using Radarr. I only just started using it myself, but you could probably create a profile to do that.
I've also used this Linux program in the past. It'll search for duplicates and delete them. As always, make a backup and test it out before you apply it to actual data.
Nice. Too bad it doesn't support Python 3.
It looks like fdupes does something similar. Specifically, if I read the code correctly, it first compares the size, then the MD5 hash of the first 4 KiB, and finally the hash of the entire file.
My command above obviously doesn't do that, but again, I only used that when I couldn't install something better.
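For anyone curious, here's a rough shell sketch of that size-first idea, assuming the GNU versions of find, awk, xargs and uniq; it skips the 4 KiB partial-hash pass, hashes every size collision in full, and breaks on file names containing tabs or newlines:

# print size<TAB>path, keep only paths whose size occurs more than once,
# then hash the survivors and print groups with identical MD5s
find . -type f -printf '%s\t%p\n' \
    | awk -F'\t' '++seen[$1] == 2 { print first[$1] } seen[$1] >= 2 { print $2 } { first[$1] = $2 }' \
    | xargs -r -d '\n' md5sum \
    | sort | uniq -w32 --all-repeated=separate

Only files whose size collides ever get hashed, which is where most of the speed of this approach comes from.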
Create a few test dirs and files and see if you can reproduce it, then file a bug report at https://github.com/adrianlopezroche/fdupes.
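Something minimal like this would do it (names are made up):

mkdir -p test/a test/b
echo "same content" > test/a/one.txt
echo "same content" > test/b/two.txt
fdupes -r test/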
I've never used it myself, but considering this tool has been around for many years and has been featured in many magazines, tutorials and so on, I'm inclined to think it's not just careless programming.
My thinking is that I'm not entirely sure what ./ expands to here. I guess that also depends on what shell you use. I mean, ./ could include . again. If you do ls -a ./, the output includes . and .. See what I mean?
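For what it's worth, with default bash settings a plain glob on ./ excludes hidden entries, while a dot glob matches . and .. themselves (file names below are hypothetical):

echo ./*      # ./file1 ./file2        (no hidden entries)
echo ./.*     # ./. ./.. ./.hidden     (includes . and .. again)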
You will need to wipe the drives at some point, though you don't need to wipe them all at the same time.
With different sized disks your only real choices are ZFS or Btrfs. Your larger drives will have wasted space just because of "reasons". I'd try to sell your older drives to fund larger ones. You will need a drive to park your data on while you build your ZFS raidz.
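Once the data is parked, the raidz build itself is a one-liner (pool and device names here are placeholders; with mixed sizes, each member contributes only as much as the smallest drive in the vdev):

# create a single-parity raidz pool named "tank" from three drives
zpool create tank raidz /dev/sdb /dev/sdc /dev/sdd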
You can try fdupes. I haven't used it myself, but it should work: https://github.com/adrianlopezroche/fdupes If you are like me, though, you'd just write a script that does SHA-512 checksums.
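A sketch of that script approach, assuming GNU coreutils (a SHA-512 hex digest is 128 characters, which is what -w128 groups on); note it hashes every file regardless of size, so it's slower than size-first tools like fdupes:

# hash everything, then print groups of identical digests
find /path/to/files -type f -print0 \
    | xargs -0 sha512sum \
    | sort | uniq -w128 --all-repeated=separate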
More RAM won't help you.
Edit: forgot to mention Btrfs. If I were you I would get rid of the IDE HDDs, and also disassemble the USB HDDs and mount them directly in your PC. They are just normal HDDs wrapped in a box.