Lots of missing chunks

Christoph     Jan 24 4:34PM 2018

I wrongly included a number of directories in my backup and so I deleted the first snapshot so as not to waste storage space. But this is what I got:

PS C:\> duplicacy prune -r 1
Storage set to P:\Backup\duplicacy\
Deleting snapshot PC_C at revision 1
The chunk fa032a705d77485e3bb723a6b4c70031fb317a3bb2108d1d4fd8dc2546bda4b1 referenced by snapshot PC_C revision 1 does not exist
The chunk e94da0eaa7348446db8b9e09110589e28cf4a5f7b6201675e96e794746451c34 referenced by snapshot PC_C revision 1 does not exist
The chunk db923a3b24bef6ebc5c5147444ab89f54ab93f6aa053616566b47f785b342c1d referenced by snapshot PC_C revision 1 does not exist
The chunk db5d3655f9540b6f237871514c3e4df8dabfb4882a4408e3b3e6531540774043 referenced by snapshot PC_C revision 1 does not exist
The chunk 00a4cb18ae8cd0a545478d3a712ce834e75a5e00f9c4362224b09bb2964908f2 referenced by snapshot PC_C revision 1 does not exist
The chunk 613ba29088a47433b6b087fce1f3b50b1267bc34d0fea49bfc766c41f49dcdd4 referenced by snapshot PC_C revision 1 does not exist
The chunk d578de0434934facca975f339d148d730d35bbc7fe1227564e721b78d19d6193 referenced by snapshot PC_C revision 1 does not exist
The chunk 1957ee5b96c1605d63a14f323f09af228e5ec1cab5086a68290a9050dbba9c6d referenced by snapshot PC_C revision 1 does not exist
The chunk 164946d9c06633c354859d205bb15387967bf336760f4aa00b5a5ce5bacd8102 referenced by snapshot PC_C revision 1 does not exist
The chunk 044371779e783dd993f79e1385b666f434f3485dd3064b1080019b8622c73d7a referenced by snapshot PC_C revision 1 does not exist
The chunk a53a7fd55951ab4f096a683ee9bda79593425c008df625e14d9f975eae14cfad referenced by snapshot PC_C revision 1 does not exist
The chunk a53a7fd55951ab4f096a683ee9bda79593425c008df625e14d9f975eae14cfad referenced by snapshot PC_C revision 1 does not exist
[...]

There were hundreds, if not thousands, of missing chunks.

Doesn't that mean that if I had relied on that first revision to restore something, chances are it would not have been possible? Is this the kind of failure that Duplicati's database is supposed to protect against?


gchen    Jan 24 10:55PM 2018

Here is a detailed explanation of what to do when there are missing chunks: https://github.com/gilbertchen/duplicacy/wiki/Missing-Chunks


Christoph    Jan 25 3:43PM 2018

Thanks! I take that as a "yes" regarding the last question ;-)

Unfortunately, I can't check directly on the storage whether the chunks ever existed there, because if they were there, they would have been deleted by that very same prune action (actually, not exactly the one in the OP but I ran the same in exclusive mode shortly after).

What I did do, however, is check whether the chunks mentioned in the OP appear in the pruning log, and they do not. So I suppose that means they were indeed missing. I also checked directly on the storage whether they are there, but they are not, at least not in the folders where they should be.
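
For reference, this is roughly how such a search can be done (a minimal sketch, run from the repository root; it assumes the prune logs are kept under .duplicacy\logs and uses the first hash from the OP):

# Search every prune log for one of the reported chunk hashes
$chunk = "fa032a705d77485e3bb723a6b4c70031fb317a3bb2108d1d4fd8dc2546bda4b1"
Select-String -Path .\.duplicacy\logs\* -Pattern $chunk |
    ForEach-Object { "$($_.Filename): $($_.Line)" }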

I can also say that the prune command in the OP was the very first time I ever ran a prune command, so there is no way that previous pruning actions could have deleted those chunks. Or wait: that pertains to the CLI. I did have pruning enabled at some point when testing the GUI, but that was for a different repository. Plus, that repository has never completed its initial backup, which means that it has never actually pruned either.

So what does that leave us with? Ah, the two edge cases:

If there is a backup taking more than 7 days that started before the chunk was marked as a fossil, then the prune command will think that repository has become inactive and will exclude it from the criteria for determining which fossils are safe to delete. The other case happens when an initial backup from a newly recreated repository also started before the chunk was marked as a fossil. Since the prune command doesn't know about the existence of such a repository at fossil deletion time, it may think the fossil isn't needed any more by any snapshot and thus delete it permanently.

Well, there are certainly longer time periods involved in my testing and the repository is indeed newly created, but if I understand those edge cases correctly, they require that the two repositories have chunks in common. In my case, I don't believe they have files in common. While that doesn't exclude the possibility that they do have chunks in common, it is highly unlikely that they would share hundreds of them. Plus, as mentioned above, the other repo did not do a prune anyway.

So what's the conclusion?

What if the missing chunk can't be found in any of these prune logs? We may not be able to track down who the culprit was. It could be a bug in Duplicacy, or a bug in the cloud storage service, or it could be a user error. If you do not want to see this happen again, you may need to run a check command after every backup or before every prune.

Maybe I can check with my cloud storage service (pCloud); their support is actually quite good. But what exactly should I ask them? Like: "Do you have any limitations regarding file names or the number of files in a folder?" Or should I just report to them that I believe they lost quite a number of my files? Seems kinda unlikely, though, that that will lead to anything...

I certainly do not want to see this happen again (what good is a backup program that turns out not to have my files, just when I need them?). So I think this is actually very good advice:

you may need to run a check command after every backup

So good, in fact, that I'm wondering: why doesn't backup do that by itself? Does the GUI do that? If this is the only way to avoid this kind of missing chunks problem, then I suppose it should at least be an option...

So how about a -check option for the backup command?
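
In the meantime, something like the following minimal wrapper would probably do (just a sketch that chains the existing backup and check commands, run from the repository root):

# Back up, then immediately verify that every chunk referenced by the snapshots still exists on the storage
duplicacy backup -stats
if ($LASTEXITCODE -ne 0) { throw "backup failed" }
duplicacy check
if ($LASTEXITCODE -ne 0) { throw "check reported problems (e.g. missing chunks)" }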


gchen    Jan 25 9:44PM 2018

Unfortunately, I can't check directly on the storage whether the chunks ever existed there, because if they were there, they would have been deleted by that very same prune action (actually, not exactly the one in the OP but I ran the same in exclusive mode shortly after).

I'm not sure about this. If the prune command reported a chunk is missing, it would not be able to delete it.

Well, there are certainly longer time periods involved in my testing and the repository is indeed newly created, but if I understand those edge cases correctly, they require that the two repositories have chunks in common. In my case, I don't believe they have files in common.

This is right. If the prune command deletes a chunk that belongs to the new backup, then when you check the new backup it will complain about the missing chunk.

Maybe I can check with my cloud storage service (pCloud), their support is actually quite good.

Wait, does Duplicacy already work with pCloud? There is a feature request for it: https://github.com/gilbertchen/duplicacy/issues/295

So good, in fact, that I'm wondering: why doesn't backup do that by itself? Does the GUI do that? If this is the only way to avoid this kind of missing chunks problem, then I suppose it should at least be an option...

I'm thinking of a warning message at the end of the backup if it is an initial backup or if the backup takes more than 7 days.


Christoph    Jan 26 1:13AM 2018

Wait, does Duplicacy already work with pCloud?

pcloud comes with its own client which "mounts" the remote drive as a local one. So I'm technically backing up to a local drive until WebDAV is available in duplicacy for direct cloud access.
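
For context, the setup itself is nothing special; the storage on the mounted drive is initialized like any other local disk. A sketch using the snapshot id and storage path from the log above:

# Run once from the repository root (C:\ here); P:\ is the drive mounted by the pCloud client
duplicacy init PC_C P:\Backup\duplicacy
# After that, ordinary backups go to the mounted drive
duplicacy backup -stats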


towerbr    Jan 26 6:38AM 2018

pcloud comes with its own client which "mounts" the remote drive as a local one. So I'm technically backing up to a local drive until WebDAV is available in duplicacy for direct cloud access.

I'm still not familiar with pCloud (I intend to test it), but in this case are the files also stored locally (using space), or are they sent "directly" to the cloud, "through" the local pCloud drive?


towerbr    Jan 26 6:50AM 2018

Oh, I see now... You can choose the folder to sync... interesting...


Christoph    Jan 26 4:30PM 2018

Yes, but don't let that distract you from the plan of adding support for WebDAV, because using the pcloud client is really only a temporary solution...

As for the main issue in this thread, what can we conclude? When you're testing a backup program and one of the first things that happens is that thousands of chunks go missing, that is not very encouraging... I'd at least like to narrow down the possible sources of failure a bit more... But how?


gchen    Jan 26 9:46PM 2018

I wonder if it has something to do with the pcloud client, especially when there is a cache between Duplicacy and pcloud. The files Duplicacy saves to pcloud may be kept in the local cache for a while, and other computers won't be able to see them until they are uploaded.

PS C:\> duplicacy prune -r 1
Storage set to P:\Backup\duplicacy\
Deleting snapshot PC_C at revision 1
The chunk fa032a705d77485e3bb723a6b4c70031fb317a3bb2108d1d4fd8dc2546bda4b1 referenced by snapshot PC_C revision 1 does not exist

If there weren't multiple computers involved, can you check if the file is at P:\Backup\duplicacy\chunks\fa\032a705d77485e3bb723a6b4c70031fb317a3bb2108d1d4fd8dc2546bda4b1 or P:\Backup\duplicacy\chunks\fa\03\2a705d77485e3bb723a6b4c70031fb317a3bb2108d1d4fd8dc2546bda4b1? If this was the first time you ran the prune command, I don't know how it can be missing.
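
A quick way to test both layouts from PowerShell (just a sketch, using the first hash from your log):

# Check both possible chunk layouts: one or two levels of directory nesting
$hash = "fa032a705d77485e3bb723a6b4c70031fb317a3bb2108d1d4fd8dc2546bda4b1"
Test-Path ("P:\Backup\duplicacy\chunks\" + $hash.Substring(0, 2) + "\" + $hash.Substring(2))
Test-Path ("P:\Backup\duplicacy\chunks\" + $hash.Substring(0, 2) + "\" + $hash.Substring(2, 2) + "\" + $hash.Substring(4))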


Christoph    Jan 27 2:13AM 2018

There is only a single computer involved. But, yes, maybe we can somehow blame it on pcloud. I'd have to take it up with their support.

But I should also say that I've been a bit "rough" with duplicacy, i.e. I didn't worry about killing the task and I have used both the GUI and the CLI on the same repositories, including, I think, this one with the missing chunks. Might this be a possible source of error?


gchen    Jan 27 9:49AM 2018

But I should also say that I've been a bit "rough" with duplicacy, i.e. I didn't worry about killing the task and I have used both the GUI and the CLI on the same repositories, including, I think, this one with the missing chunks. Might this be a possible source of error?

Unlikely. When you kill a backup process, Duplicacy will leave many unreferenced chunks on the storage. Every chunk that has been uploaded will be complete and can be skipped when you run the backup again. The chunk that is being uploaded when you kill the process will not be corrupted either, because Duplicacy always uploads a chunk into a temporary file (for local disks and SFTP servers) and renames the file after the upload is complete.
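
To illustrate the idea (a conceptual PowerShell sketch only, not Duplicacy's actual code, and the paths are made up): data is written under a temporary name first, and only a complete file is renamed to the final chunk name, so an interrupted upload leaves at most a leftover temporary file.

# Write to a temporary name, then rename; killing the process mid-write never
# leaves a truncated file under the final chunk name
$dir   = Join-Path $env:TEMP "chunks\fa"                 # stand-in for a chunk directory
New-Item -ItemType Directory -Force -Path $dir | Out-Null
$final = Join-Path $dir "example-chunk"                  # hypothetical final chunk name
$tmp   = "$final.tmp"
[System.IO.File]::WriteAllBytes($tmp, [byte[]](1..16))   # an interrupted upload stops here
Move-Item -LiteralPath $tmp -Destination $final -Force   # the rename happens only after the write completes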


Christoph    Jan 31 3:00PM 2018

Today I found another two missing chunks in a newly created backup:

PS D:\christoph\work\Dropbox> duplicacy check -storage pcloudtest
Storage set to P:\Backup\duplicacy-test\
Listing all chunks
Chunk 1daf0aa39f4f4add3fdbf2b0c8c751a6fef04ecf525b4417c772626006ed8271 referenced by snapshot dropboxtest at revision 1 does not exist
Chunk 33763fd682cb2bd4957d3a2702a778df0c26c49bab778475de3a12ac229e7ff8 referenced by snapshot dropboxtest at revision 1 does not exist
Some chunks referenced by snapshot dropboxtest at revision 1 are missing
PS D:\christoph\work\Dropbox>

Besides, I have had the pcloud client crash on me several times during the day (but the missing chunks are not from today's backup; they are from yesterday's initial backup). So things are quite clearly pointing to the pcloud client as the source of the problem. I just contacted pcloud support about that.

But I'm looking forward to duplicacy supporting WebDAV, as it will allow me to circumvent the pcloud client altogether.