I've been experimenting with rmlint lately, with an eye toward automation. So far things are looking promising, but I have to spend some more time with it. Check it out if you're looking to do things in an automated way.
I find rmlint pretty powerful.
Especially because you can flag original directories, which can be very convenient if you have a master copy/multiple drives.
Check out this page, and especially this example:
https://rmlint.readthedocs.io/en/latest/tutorial.html#flagging-original-directories
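If you end up scripting this, here is a rough Python sketch of how the command line from that tutorial section could be put together. The `//` separator and the `--keep-all-tagged` / `--must-match-tagged` flags are how I read the rmlint docs; the function name and the example paths are made up for illustration, so check `man rmlint` before relying on it.

```python
import shlex


def build_tagged_command(search_dirs, original_dirs, protect_originals=True):
    """Return an argv list that runs rmlint with original_dirs tagged.

    Paths listed after the `//` separator are treated as "tagged",
    i.e. preferred as originals. All paths used here are hypothetical.
    """
    argv = ["rmlint", *search_dirs, "//", *original_dirs]
    if protect_originals:
        # Keep every file in the tagged dirs, and only report duplicates
        # that have a counterpart inside a tagged dir.
        argv += ["--keep-all-tagged", "--must-match-tagged"]
    return argv


if __name__ == "__main__":
    cmd = build_tagged_command(["/mnt/backup"], ["/mnt/master"])
    # Print the command for review instead of running it.
    print(shlex.join(cmd))
```

Building an argv list rather than a single string keeps paths with spaces intact if you later hand it to `subprocess.run`.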
I don't have a bunch of time atm (Christmas), but I just wanted to link you to rmlint.
I used this to find duplicates across 60TB. rmlint itself did not delete or modify anything; it just wrote its results to a bash script and a JSON file, which I then scripted around to fit my needs.
Could be useful. Feel free to PM me if you need help writing something; for some reason I like doing stuff like this.
You've reminded me this is something I wanted to figure out again. I'm trying to use rmlint on my Solus Linux desktop to search for dupes on a shared folder that's mounted over SMB.
I just removed 10TB of dupes last weekend. Well, I replaced them with hardlinks. I used rmlint:
https://github.com/sahib/rmlint https://rmlint.readthedocs.io/en/latest/tutorial.html
rmlint won't remove anything itself, but it will tell you what's duplicated, and you can run it with a crazy number of options to get what you need. Or just tear apart its output to get what you need.
I had rmlint output to a bash (.sh) script which included hardlinking. After sitting with that file for like 2 days, randomly throwing regex searches at it to see whether it had caught something it shouldn't have, I ran it and it did exactly what I wanted.
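That kind of pre-flight check can also be done against the JSON output instead of regexing the script. A minimal sketch, assuming each duplicate entry carries `type`, `path` and `is_original` fields as in the examples in the rmlint docs (verify against your own file); the function name and the `/mnt/master` path are hypothetical.

```python
from pathlib import PurePosixPath


def flagged_inside(entries, protected_dir):
    """Return paths of non-original duplicates that live under protected_dir.

    An empty result means nothing under protected_dir is marked for
    removal or replacement.
    """
    protected = PurePosixPath(protected_dir)
    hits = []
    for entry in entries:
        # Skip the header, the footer, other lint types, and originals.
        if entry.get("type") != "duplicate_file" or entry.get("is_original"):
            continue
        path = PurePosixPath(entry["path"])
        if protected in path.parents:
            hits.append(str(path))
    return hits


if __name__ == "__main__":
    # Hand-written sample in the assumed shape; in practice load the
    # real file with json.load(open("rmlint.json")).
    sample = [
        {"description": "rmlint json-dump of lint files"},
        {"type": "duplicate_file", "path": "/mnt/master/a.iso", "is_original": True},
        {"type": "duplicate_file", "path": "/mnt/backup/a.iso", "is_original": False},
        {"type": "duplicate_file", "path": "/mnt/master/old/a.iso", "is_original": False},
        {"aborted": False},
    ]
    for path in flagged_inside(sample, "/mnt/master"):
        print("flagged inside protected dir:", path)
```

Comparing path components rather than string prefixes avoids treating something like `/mnt/master2` as being inside `/mnt/master`.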
It also outputs a JSON file you could query to get what you want. First time using it and it was gud.
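As an idea of what querying that JSON could look like, here is a small Python sketch that groups duplicates by checksum and adds up how much space the non-original copies take. It assumes the file is a list of dicts with a header first, a footer last, and `type`, `checksum`, `path`, `size` and `is_original` on each duplicate entry, which is how the rmlint docs show it; the helper names and sample data are mine.

```python
import json
from collections import defaultdict


def group_duplicates(entries):
    """Map checksum -> {"original": path or None, "duplicates": [...], "size": bytes}."""
    groups = defaultdict(lambda: {"original": None, "duplicates": [], "size": 0})
    for entry in entries:
        if entry.get("type") != "duplicate_file":
            continue  # skips the header, the footer and other lint types
        group = groups[entry["checksum"]]
        group["size"] = entry["size"]
        if entry.get("is_original"):
            group["original"] = entry["path"]
        else:
            group["duplicates"].append(entry["path"])
    return dict(groups)


def reclaimable_bytes(groups):
    """Bytes occupied by the non-original copies across all groups."""
    return sum(g["size"] * len(g["duplicates"]) for g in groups.values())


if __name__ == "__main__":
    # Hand-written sample in the assumed shape, not real rmlint output.
    sample = json.loads("""[
      {"description": "rmlint json-dump of lint files"},
      {"type": "duplicate_file", "checksum": "abc", "path": "/data/a", "size": 1024, "is_original": true},
      {"type": "duplicate_file", "checksum": "abc", "path": "/data/b", "size": 1024, "is_original": false},
      {"type": "duplicate_file", "checksum": "abc", "path": "/data/c", "size": 1024, "is_original": false},
      {"aborted": false}
    ]""")
    groups = group_duplicates(sample)
    print(groups["abc"]["original"], groups["abc"]["duplicates"])
    print(reclaimable_bytes(groups))
```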
It's here, it can do (in order starting with what would be best for most users if the filesystem/kernel can do it): clone, reflink, hardlink, symlink.
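For reference, my understanding from the rmlint man page is that this is picked through the shell formatter's config: `-c sh:link` for the full clone, reflink, hardlink, symlink chain, with `-c sh:hardlink` and `-c sh:symlink` as narrower variants. Treat that as an assumption and confirm it with `man rmlint`. A minimal Python sketch that builds such a command (the function name and path are hypothetical):

```python
def build_link_command(paths, mode="link"):
    """Return an rmlint argv whose generated script links instead of deleting.

    mode is assumed to map onto the sh formatter options:
      "link"     -> clone, reflink, hardlink, symlink (first that works)
      "hardlink" -> hardlink, falling back to symlink
      "symlink"  -> symlink only
    """
    allowed = {"link", "hardlink", "symlink"}
    if mode not in allowed:
        raise ValueError(f"mode must be one of {sorted(allowed)}, got {mode!r}")
    return ["rmlint", "-c", f"sh:{mode}", *paths]


if __name__ == "__main__":
    # Only builds and prints the command; nothing is executed.
    print(" ".join(build_link_command(["/mnt/share"], mode="symlink")))
```

Either way, rmlint still only writes the script; nothing gets linked until you run that script yourself.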
So I am unsure due to what I suspect are mistypes but it seems like you are trying to insult me? Saying I have not written 5 lines of code? Not sure but it sounds like you are angry. I am not sure why my previous post got you upset but I apologize if it did.
Looking at rmlint's docs here (https://rmlint.readthedocs.io/en/latest/), I don't see an option for it to replace duplicate files with symlinks to the original. I could easily be missing it because I only skimmed this page. This was a requirement from OP.
IIRC, the order in which you specify the directories to search defines which file is the “original” and which gets deleted.
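To make that concrete, here is a toy Python model of "the first-listed directory wins". It is not rmlint's actual code, just an illustration of the rule; if I read the man page right, the real behaviour is controlled by `--rank-by` (`-S`), whose default criteria start with path order. The function name and paths are invented.

```python
from pathlib import PurePosixPath


def pick_original(duplicate_paths, search_dirs):
    """Pick the copy that lives under the earliest-listed search directory."""

    def rank(path):
        p = PurePosixPath(path)
        for index, directory in enumerate(search_dirs):
            if PurePosixPath(directory) in p.parents:
                return index
        # Paths outside every search dir sort last.
        return len(search_dirs)

    # min() keeps the first path on ties, so input order breaks ties.
    return min(duplicate_paths, key=rank)


if __name__ == "__main__":
    copies = ["/mnt/backup/photo.jpg", "/mnt/master/photo.jpg"]
    # Listing /mnt/master first makes its copy the "original".
    print(pick_original(copies, ["/mnt/master", "/mnt/backup"]))
```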