Today, one of my tasks was identifying and transferring approx 403k MP3 recordings from an office in the States over a VPN to a local NFS mount here in London. Each file is approx 4MB (128kbps files), so a total of ~1.6TB of data.
I tried a few test copies with rsync and scp, but both gave me ridiculous transfer rates: ~3mins to transfer 10 test files, even though each file individually was achieving around 2.65MB/s. At that rate I should have been able to copy 10 files (about 40MB) in about 15 seconds. I attribute the extra time to the connection setup and teardown for each file coming through. No way that's gonna work.
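To put a number on that per-file overhead, here is a rough sketch of the sums using the figures from the test run above. The helper name overhead_report is made up for illustration, and it assumes the ~3 minutes was exactly 180 seconds.

```shell
#!/bin/sh
# overhead_report FILES FILE_MB RATE_MB_S OBSERVED_S
# Hypothetical helper: compares the ideal copy time with what was observed.
overhead_report() {
    awk -v n="$1" -v mb="$2" -v rate="$3" -v obs="$4" 'BEGIN {
        ideal = n * mb / rate                      # time if only raw throughput mattered
        printf "ideal: %.1f s\n", ideal
        printf "overhead per file: %.1f s\n", (obs - ideal) / n
        printf "effective rate: %.2f MB/s\n", n * mb / obs
    }'
}

# 10 files of ~4MB at ~2.65MB/s, observed ~3 minutes (assumed to be 180 s)
overhead_report 10 4 2.65 180
```

On those assumptions each file costs far more in waiting around than in actual copying, which is why bundling the files looks attractive.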
I’ve had several bosses over the years who’ve been complete Unix masters. One of them showed me the wonders of DD, whose name is usually traced back to “Data Definition” but which is also known as “Disk Destroyer” because, if you get your input and output wrong, it can easily (and quietly) destroy your data. (info here).
DD is used for low-level block copying of data, but like many Unix tools it is quite versatile. My old boss used it to create files of any specified size, then timed how long they took to copy across a network link, giving a measurement of transfer rate, which is how I was using it today.
To create a file of roughly 1GB in size (1024000 blocks of 1024 bytes, i.e. 1000MiB):
dd if=/dev/zero of=./1000MB bs=1024 count=1024000
(/dev/zero is a special Unix file which will return as many null characters as you ask it for. It’s of the same family as /dev/null and /dev/full)
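A small variation, as a sketch only: a larger block size means far fewer read/write calls for the same file, so the test file is usually created faster. The helper name make_test_file is hypothetical, and the block size is written out in bytes because the 1M style suffix is spelled differently on GNU and BSD dd.

```shell
#!/bin/sh
# make_test_file PATH SIZE_MIB
# Hypothetical helper: writes SIZE_MIB blocks of 1MiB (1048576 bytes) of zeros to PATH.
make_test_file() {
    dd if=/dev/zero of="$1" bs=1048576 count="$2" 2>/dev/null
}

# Same size as the command above, but in 1000 writes instead of 1024000:
# make_test_file ./1000MB 1000
```

As with any dd command, double check the of= path before running it.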
Back then, we were using FTP, but these days SCP does actually tell you the transfer rate:
thorsten@invincible:~$ time scp 1000MB foreignhost:
Scp tells me the transfer rate was 10.0MB/s, however real time says it took 1m54sec (114 seconds for 1000MB), so my calculations actually say 8.77MB/s.
So, 403k x 4MB = 1612000MB. With a transfer rate of 8.77MB/s it should take 183808.43 seconds = 3063mins = 2.13 days.
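The same estimate can be redone for other file counts or test timings with a few lines of awk. This is only a sketch, and estimate is a made-up helper name. It uses the unrounded rate (1000MB / 114s) rather than 8.77, so the seconds figure comes out slightly lower than the one above.

```shell
#!/bin/sh
# estimate FILE_COUNT FILE_MB TEST_MB TEST_SECONDS
# Hypothetical helper: derives a rate from a timed test copy and scales it up.
estimate() {
    awk -v n="$1" -v mb="$2" -v tmb="$3" -v ts="$4" 'BEGIN {
        rate = tmb / ts                 # measured throughput in MB/s
        secs = n * mb / rate            # time for the whole data set
        printf "rate: %.2f MB/s\n", rate
        printf "total: %d MB\n", n * mb
        printf "time: %.0f s = %.0f min = %.2f days\n", secs, secs / 60, secs / 86400
    }'
}

# 403k files of ~4MB, based on the 1000MB test copy that took 1m54s
estimate 403000 4 1000 114
```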
Just over 2 days is acceptable, it is a decent amount of data, and certainly better than 3mins per 10 MP3s, which would be, what, roughly 121,000 mins = 84 days? Phew! Thankfully in this case there’s a decent amount of space on both servers, so I can tar up the required files and do a single rsync, and hopefully should be able to use the full bandwidth.
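For what it's worth, the tar-then-rsync plan might look something like the sketch below. The helper names, wanted.txt and recordings.tar are all placeholders I've made up; it assumes the list of required files has already been written out one path per line. There is no compression flag because MP3s are already compressed, so gzip would mostly just burn CPU.

```shell
#!/bin/sh
# bundle_recordings LIST_FILE ARCHIVE
# Hypothetical helper: -T makes tar read the names to archive from LIST_FILE.
bundle_recordings() {
    tar -cf "$2" -T "$1"
}

# send_archive ARCHIVE DEST
# Hypothetical helper: --partial keeps a partly transferred file so an
# interrupted run can pick up again; --progress shows the transfer rate.
send_archive() {
    rsync --partial --progress "$1" "$2"
}

# Example with placeholder names:
# bundle_recordings wanted.txt recordings.tar
# send_archive recordings.tar foreignhost:
```

One big file means one connection and one long stream, so the per-file setup cost is paid once rather than 403k times.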