For file transfers where network speed is the limiting factor, use <code>rsync</code>. It optimizes compression for block-wise transfers and does per-block incremental updates, so only the parts of files that have changed get transmitted. Rsync also transmits over <code>ssh</code>, so it's as secure as (crude) utilities such as <code>scp</code>.
To speed up local transfers over <code>ssh</code>, change the cipher. Either <code>arcfour</code> or <code>blowfish</code> should be faster than the network. Some CPUs have onboard AES support and are therefore damn fast with <code>aes</code>-based ciphers.
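For example, a rough sketch (the host and paths are made up); on a CPU with AES instructions, an <code>aes</code> cipher such as <code>aes128-ctr</code> is usually a fast choice:

rsync -az -e "ssh -c aes128-ctr" /data/project/ user@fileserver:/backup/project/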
However, speed and security are often inversely proportional. Caveat emptor.
Backup solutions are more about matching the technology to the scale you need to back up than about being bioinfo-focused. For example, we have about 25 petabytes of data right now, so our automated nightly tape backups and RAID in striped-parity mode are going to be overkill for you.
For just a few disks that need backing up (and probably just some of the folders on those disks), a typical solution is an rsync cron job.
The first example at http://rsync.samba.org/examples.html shows how to keep 7 days of backups. (I'm not sure if you meant 7 days of history or just one backup a week.)
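A rough sketch of that 7-day scheme (all the paths here are hypothetical), using --backup-dir to rotate changed files into a per-weekday directory:

#!/bin/sh
# mirror /home into /backup/current, and move anything that changed or was
# deleted today into a per-weekday increments directory (7 days of history)
DAY=$(date +%A)
rm -rf /backup/increments/$DAY
rsync -a --delete --backup --backup-dir=/backup/increments/$DAY /home/ /backup/current/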
rsync is nice because it attempts to only transmit the differences since last time you backed up, so it should save lots of transfer time.
Here is one of the first Google results on using cron: http://www.unixgeeks.org/security/newbie/unix/cron-1.html
You can get an external hard drive and dump to that, or to any other networked computer whose IP or DNS name you know.
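A hedged example of the cron side (the schedule, script path, and log path are made up); this runs a backup script like the one above at 02:00 every night:

0 2 * * * /home/user/bin/nightly-backup.sh >> /var/log/nightly-backup.log 2>&1

Add that line via crontab -e and cron takes care of the rest.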
Check out: www.rsnapshot.org.
Rsnapshot does rolling, incremental backups using <code>rsync</code>: it copies the files that have changed and makes hard links to previously archived copies of the ones that haven't.
Also check out “Make your own Wayback Machine in GNU/Linux with rsnapshot” for hints on how to automate the whole process.
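To give a rough idea of the configuration (the paths and retention counts are placeholders; note that rsnapshot.conf wants tabs, not spaces, between fields):

snapshot_root	/backup/snapshots/
retain	daily	7
retain	weekly	4
backup	/home/	localhost/

Pair that with cron jobs running rsnapshot daily and rsnapshot weekly and you get the rolling, hard-linked history described above.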
You can back up permissions too:
Some of the additional features of rsync are:
o support for copying links, devices, owners, groups, and permissions
[…]
rsync -avz foo:src/bar /data/tmp
This would recursively transfer all files from the directory src/bar on the machine foo into the /data/tmp/bar directory on the local machine. The files are transferred in "archive" mode, which ensures that symbolic links, devices, attributes, permissions, ownerships, etc. are preserved in the transfer. Additionally, compression will be used to reduce the size of data portions of the transfer.
But AFAIK OS X's default <code>rsync</code> (the outdated version 2.6.9) doesn't support ACLs, which do work in newer versions, for example mine from MacPorts:
rsync  version 3.1.1  protocol version 31
Copyright (C) 1996-2014 by Andrew Tridgell, Wayne Davison, and others.
Web site: http://rsync.samba.org/
Capabilities:
    64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
    socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace, append,
    ACLs, xattrs, iconv, symtimes, no prealloc, file-flags, HFS-compression
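With a 3.x build like that, a hedged example (the paths are invented) that carries ACLs and extended attributes along with the usual archive-mode metadata:

rsync -aAX --progress /Users/me/Documents/ /Volumes/Backup/Documents/

(-A is --acls and -X is --xattrs; both need rsync 3.x, on both ends if you're copying over the network.)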
And yet, from the man page:
>Note that rsync always verifies that each transferred file was correctly reconstructed on the receiving side by checking a whole-file checksum that is generated as the file is transferred, but that automatic after-the-transfer verification has nothing to do with this option's before-the-transfer "Does this file need to be updated?" check.
Also,
>... in my research I found this: https://unix.stackexchange.com/a/66702
The supposedly damning evidence given on that page is: “Both commands take about the same amount of time, therefore rsync cannot possibly be doing the checksum—since that would involve re-reading the destination file off the slow disk.”
But the ‘How Rsync Works’ page explains: “The file's checksum is generated as the temp-file is built. At the end of the file, this checksum is compared with the file checksum from the sender. If the file checksums do not match the temp-file is deleted. If the file fails once it will be reprocessed in a second phase, and if it fails twice an error is reported.”
So, <code>rsync</code> doesn't have to re-read the file to calculate the checksum, since the receiver has been checksumming it as it arrived.
Or not. I'm wondering if I can suss something out of the source code.
By default, Time Machine excludes all external drives from its backups. Supposedly, you can remove that exclusion, and then the contents of your external drive will be included in your Time Machine backups. (Caveat: the external drive must be formatted with an HFS+ filesystem.)
If you're the more hands-on type, the obvious choice would be to just use <code>rsync</code> to copy things from the external drive to the internal drive. That's what it's made for. (Having it run automatically would apparently involve <code>launchctl</code>, which I have yet to learn anything about except that it exists and does <code>crontab</code> kind of things.)
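If you want a starting point, here's a rough, untested LaunchAgent sketch (the label, paths, and schedule are all made up) that would run such a copy every night at 03:00; save it under ~/Library/LaunchAgents/ and load it with launchctl load:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>local.backup.external</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/bin/rsync</string>
        <string>-a</string>
        <string>/Volumes/External/</string>
        <string>/Users/me/Backups/External/</string>
    </array>
    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key>
        <integer>3</integer>
        <key>Minute</key>
        <integer>0</integer>
    </dict>
</dict>
</plist>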
A simple solution would be to use something like rsync. All you'd have to do is run a shell script when you wanted to copy over the files.
#!/bin/sh
DEV_WEB_ROOT="/var/www/dev"
PROD_WEB_ROOT="/var/www/prod"

rsync --progress --stats \
    --recursive --times --perms --delete \
    --exclude 'configfile1.php' \
    --exclude 'configfile2.php' \
    $DEV_WEB_ROOT/* $PROD_WEB_ROOT
This is assuming your server is running on Linux and has rsync installed.
I'd recommend using an actual rsync daemon; tunneling rsync through ssh adds quite a bit of overhead (though it's still a huge improvement over a normal copy or FTP).
rsync -avu --progress --stats file1 rsync://username@server/rsync_module/directory/file1
Or, for an entire directory (recursively):
rsync -avu --progress --stats directory/ rsync://username@server/rsync_module/directory/
If you want to keep multiple copies on the remote end, check out --backup and --backup-dir (--fuzzy is related, but it's about reusing a similar existing file as a transfer basis): http://rsync.samba.org/ftp/rsync/rsync.html
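For instance, a hedged sketch (the module and path names are the invented ones from above) that stashes the previous version of every changed file in a dated directory on the remote side:

rsync -avu --backup --backup-dir=old-$(date +%F) --progress --stats directory/ rsync://username@server/rsync_module/directory/

A relative --backup-dir like that ends up inside the destination directory on the server.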
Just the one from apt on Ubuntu 16.04.2:
station@Station:~$ rsync --version
rsync  version 3.1.1  protocol version 31
Copyright (C) 1996-2014 by Andrew Tridgell, Wayne Davison, and others.
Web site: http://rsync.samba.org/
Capabilities:
    64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
    socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace, append,
    ACLs, xattrs, iconv, symtimes, prealloc
rsync comes with ABSOLUTELY NO WARRANTY. This is free software, and you
are welcome to redistribute it under certain conditions. See the GNU
General Public Licence for details.
That's an actual command-line computer program for copying files.
How are you transferring the files to the NFS export? If you use <code>rsync</code>, it creates a temporary file in the destination folder and writes to that file. When the transfer is done, it renames the temporary file to the target filename. (Re: samba.org/how-rsync-works.) There's also an option to set a directory other than the final destination for the temporary file: "<code>-T</code>" (or "<code>--temp-dir=DIR</code>").
I'm guessing that if you transfer the files with <code>rsync</code>, you can convince the host to ignore the temporary files, and therefore only move the ones that are done transferring.
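As a rough sketch (the paths are made up, and the temp directory has to exist already): keep the in-flight temp files in a sibling directory the consuming host is told to ignore, so only fully transferred files ever show up under the watched folder:

rsync -a --temp-dir=/srv/nfs/.rsync-tmp /data/outgoing/ /srv/nfs/export/incoming/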
>I've looked into bittorrent, but it seems to always be a many-step process (set up tracker, create and announce torrent, add torrent on target machines, etc.)
Is that bittorrent-based file-syncing, or just standard file sharing?
>I could just copy it sequentially to every machine, but the uploads are pretty bad, so it'd take a long time.
Ye olde <code>rsync</code> is optimized for low-bandwidth transfers. It does block-wise incremental updates, and on-the-fly compression to minimize the amount of data transfer. (At the expense of CPU load; TINSTAAFL.)
However, <code>rsync</code> doesn't do "swarm" transfers; that's a bittorrent thing. You'd have to sync each machine individually. (Though I suppose you could use <code>rsync</code> to build a list of updated files and feed that to some swarming-protocol-based sync software.)
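A hedged sketch of that idea (host and paths invented, and it assumes a reasonably recent rsync for --out-format): a dry run that just prints the names of the files rsync would update, which you could then feed to whatever swarm-capable tool you settle on:

rsync -an --out-format='%n' /data/release/ user@seedbox:/data/release/ > changed-files.txt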
Go read the <code>rsync</code> documentation and stop trying to re-invent the wheel. Every obscure scenario for data lossage you can come up with has already been accounted for in <code>rsync</code>.
Is this external drive using a USB connection? Why don't you just connect it to the RaspberryPi, and leave it there?
Then you can just create your <code>cron</code> job to run <code>rsync</code> on the RaspberryPi itself, and let it do its thing irrespective of where your laptop is or what it's doing.
If you do this, you probably want to add the flag <code>-e "ssh -c blowfish"</code> to your <code>rsync</code> command. That specifies the encryption algorithm that <code>openssh</code> uses, and <code>blowfish</code> is a lot less abusive to the CPU than <code>3DES</code> or whatever the default is.
For local transfers, say from your laptop to the RaspberryPi, use <code>rsyncd</code> to bypass <code>openssh</code> entirely. Just use two colons instead of one:
rsync remotehost::module/path/file /local/path/
You define the module in <code>rsyncd.conf</code>. See the man page for details.
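A minimal sketch of such a module (the path, names, and uid are placeholders):

# /etc/rsyncd.conf
uid = backup
gid = backup
use chroot = yes

[module]
    path = /srv/backups
    comment = laptop backups
    read only = no

Start the daemon with rsync --daemon (or via your distro's rsync service) and the two-colon syntax above will find the module by the name in brackets.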