Franco Jan 25 7:20PM 2018 CLI
I'm trying Duplicacy with the final goal to perform backups on a remote sftp server. For testing purposes, I've created an encrypted backup using duplicacy CLI on a Linux x64 machine with a local USB drive as storage.
In order to test various environments, I've then attached the USB drive to a Windows 10 x64 machine and created here a repository using again the USB drive as storage, then I attempted to restore the previously created backup. What happens is that I'm asked for the encryption password, I type it in, then the restore process tries to start but suddenly stops with the message "Failed to decrypt the chunk f69617df28e40f624fadd7ce7efbf085043b0e8d5d6ac91eeee7cc21c00b0b51: cipher: message authentication failed; retrying", it retries four times then aborts.
I'm sure the password I've entered is correct, if I try to type an incorrect password I get immediately this message "Failed to download the configuration file from the storage: Failed to retrieve the config file: cipher: message authentication failed", which is similar (the password isn't recognized), but here it fails immediately, while giving the correct password it fails when attempting to restore the contents of my backup (while decrypting the chunks).
So my guess is that backups created under Linux are incompatible with Windows (and vice versa). Is this true? If not, what else can be the cause of my issue? Can it be solved?
Franco Jan 25 7:23PM 2018
As a side note, if I search for a file named "f69617df28e40f624fadd7ce7efbf085043b0e8d5d6ac91eeee7cc21c00b0b51" in the chunks folder on the storage drive, I can't find any file with this name. But there's a file named "9617df28e40f624fadd7ce7efbf085043b0e8d5d6ac91eeee7cc21c00b0b51" into the folder "f6" (which concatenated gives the name seen on the error message). Just trying to make some hypothesis, maybe it's a problem with directory nesting differences between platforms (i.e. the use of slash vs backslash)?
gchen Jan 25 10:12PM 2018
Failed to decrypt the chunk f69617df28e40f624fadd7ce7efbf085043b0e8d5d6ac91eeee7cc21c00b0b51: cipher: message authentication failed; retrying
This is a clear indication of a corrupted file. What is the disk format? The Windows disk driver may be the one to blame.
If you're running 2.0.10, the path "f6/9617df28e40f624fadd7ce7efbf085043b0e8d5d6ac91eeee7cc21c00b0b51" is right.
Franco Jan 26 2:48AM 2018
Yes, I'm running 2.0.10. You seem to be right, I didn't tried to restore the backup on the Linux machine from which it was created, now I've tried and it fails even there. So you think the problem is the Windows disk driver on Linux?
Franco Jan 26 11:14AM 2018
Well, I've successfully tested a relatively small backup (30GB) made on Windows and restored on Windows.
So, taken for good that the problem I had yesterday was caused by the Linux fuse/ntfs3g drivers (well, I'm using it for years now almost without problems, but sometimes something strange occurs, so yes maybe these drivers are somewhat faulty and maybe some stressful conditions can push them to the limits...), I'm now going to create the initial full backup of my data directly on Windows, using the USB drive formatted in NTFS as storage.
Started with a clean condition, the backup has gone well for about 30 mins, then this happened: "Failed to upload the chunk 4b4396347f70dbc40af123eca155b2b2de3dfc5115e35d3fe3f03fac1f59060d: write g:\/chunks/4b/4396347f70dbc40af123eca155b2b2de3dfc5115e35d3fe3f03fac1f59060d.njzpygyh.tmp: Impossibile trovare il file specificato. Incomplete snapshot saved to u:\Foto/.duplicacy/incomplete" ("impossibile trovare il file specificato" translates to "the specified file could not be found").
The file actually isn't there... so now what's going wrong? NTFS disk on Windows 10 x64... I'm stalled...
gchen Jan 26 9:01PM 2018
When uploading a chunk, Duplicacy first writes to a temporary file and then rename the temporary file with the correct name. This looks like the temporary file was created successfully but then disappeared when Duplicacy attempted to write to it. Could it be that an anti-virus program removed the file? What if you run the backup again?
Franco Jan 29 5:41AM 2018
I don't have antivirus programs on that PC, not even Windows Defender (forcedly disabled through local group policies). I tried to run the backup three times, each time restarting from a clean condition.
I made two attempts backing up a folder (about 350GB of data) and it failed both times in different places (I mean, first attempt after about 30 mins / 70GB of data - the one I reported above - and second attempt has gone further, failing after about 50 mins / 120GB). In order to exclude a problem on the source, for the third attempt I started from a different folder (different source drive, smaller content), but even then it failed after less that 20 mins the same way as before.
Then, as a last chance before throwing all away, I decided to reformat the storage drive using exFAT (remember, it was NTFS before), then retried again with the first source folder and it finished without errors! Backed up a total of 4 folders (about 900GB of total data) so far without problems...
So, have I to blame NTFS itself, or what? :D
gchen Jan 29 1:39PM 2018
This is weird. I guess because you used this USB drive for both Windows and Linux, some incompatibility issues somehow corrupted the file system (more likely the Linux NFTS drive's fault). If you reformat it with NFTS it might work fine but I agree exFAT is adequate and may have a better compatibility for this purpose.
Franco Jan 30 5:10AM 2018
It's weird, yes. I can assure you that the NTFS filesystem was OK, for each test I restarted from a clean condition, reformatting each time the storage drive (actually between the second and third test I also recreated the whole partition - paranoia, you know? :D ).
So, well, can it be something in the way Go accesses the filesystem? Maybe there's some timing problem affecting very fast operations on the filesystem. This is not pure fantasy... I remember years ago I was testing a program that was writing a lot of files to disk (really a lot in a very short time) and sometimes there were problems (I can't remeber what problem exactly). At the end, I solved introducing a short delay between the creation of each file. I thought it was stupid, but it worked... BTW, this program is still running here where I work :) ...it would be curious to test what will happen now if I remove that delay, after years of operating system and hardware upgrades :D
gchen Jan 30 12:58PM 2018
The code there is so simple I can't imagine there can be anything wrong:
When I searched for "impossibile trovare il file specificato" this came up: https://github.com/pbatard/rufus/issues/690
Could it be a bad sector for you too?
Franco Feb 1 4:58AM 2018
I examined the code and yes, it's quite straightforward. My program which failed occasionally was as simple as yours... there's something out of the programmer's control, I'd say some timing problem in the filesystem (unlikely) or in the USB driver or controller (more likely).
Yesterday I made a new backup on an NTFS disk, same source as the previous tests, but this time the storage disk was an internal SATA one: 300GB of data without problems.
The variables here are: destination disk and disk's bus. I'm now completing a full surface test on the USB disk (btw it's a new unit, so I hope it has no bad sectors...). Once the surface test is complete, I'll run a stress test that mimics what duplicacy does, let's see...
Franco Feb 1 6:39PM 2018
Ok, I'll expose my last experiences just to let you know. It's quite clear now that the issue is not in duplicacy.
Surface test was OK as expected. So, new partition, new NTFS filesystem, and here we go.
Tried another backup and it failed three times in a row. First and third time like my preceding report ("the specified file could not be found"), and the second time with a new entry: "GetFileAttributesEx: access denied", well, yes, it was attempting to read attributes for a file in a folder that doesn't existed on the disk.
This sounds definitely as a buffering+bus driver/controller problem (my guess is that windows thinks the file was created because the driver reported so, but actually the request is still buffered and something went wrong when flushing the buffer to disk). Probably NTFS is more demanding in terms of performances, in fact the transfer rate is almost double compared to exFAT, and this probably hits some limits (or bugs) in the USB drivers/controller.
As a confirmation of this, I taken a look at the windows' system events and there are a lot of errors related to the external usb drive just some seconds before Duplicacy fails. This only occurs if the filesystem is NTFS, no errors using exFAT (at the cost of almost 2x backup time).
I'll try with another USB drive (different brand) just to exclude a design or production flaw in this particular drive model (btw, it's a 4TB Maxtor M3 portable external USB 3 drive).
Thanks for the support!
gchen Feb 1 8:25PM 2018
Thank you for sharing the details and taking the effort to get to the bottom of this! What you found can be very useful to other Duplicacy users.
towerbr Feb 2 6:14AM 2018
I also like to thank you Franco, very interesting and useful tests.
Franco Feb 6 5:45PM 2018
To complete my tests, today I received a new drive, this time it's a 4TB WD My Passport external USB 3, so same size and same host connection as before, just different brand. It arrives NTFS formatted from the factory.
I completed a backup of about 240GB (the same source as the previous tests) using this new drive as storage without any problems, neither from duplicacy itself nor from the windows' system events.
This confirms that the Maxtor drive was somewhat faulty... and since it's ok when using exFAT and that all SMART indexes are ok too, I think it's a design flaw, probably something to do with its USB controller.
So DON'T use Duplicacy with an NTFS formatted M3 drive as storage! ;)