checking integrity of snapshots

kevinvinv     Nov 27 9:27PM 2017 GUI

This is probably not possible but I thought I'd ask.

1) Is there any way to check the integrity of the backup on the server side (in the case of SFTP storage)?

2) Is there any way to check the integrity of the backup from the GUI?

Thanks very much!!!


gchen    Nov 27 9:53PM 2017

You can run the check command on the server side. You just need to create an empty repository on the server and add the storage as a local one.
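Something like this should work (the storage path and repository id below are just placeholders):

    # on the server, pointing at the storage directory as a local path
    mkdir check-repo && cd check-repo
    duplicacy init dummy-id /backup/duplicacy-storage   # dummy-id can be any repository id; nothing is backed up from here
    duplicacy check -all                                 # confirms every chunk referenced by each snapshot exists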

The GUI version doesn't support the check command unfortunately.


kevinvinv    Nov 28 9:29AM 2017

Oh cool- I'll see if I can give it a shot.

Thanks!!!


kevinvinv    Dec 3 12:53PM 2017

I am struggling a bit with understanding how to check the integrity of my backups.

In another thread I saw a mention of -verify but I am not seeing that command spelled out in the wiki documentation.

I now understand that the check command just checks that each chunk referenced by a snapshot actually exists.

What is the most robust way to try and verify that a particular snapshot (and its chunks) is not corrupted in any way?

Would I run the check command with the -files option? Again, I am not sure where -verify is or how to use it.

So sorry if these are irritating or annoying questions. Thanks for your hard work.


kevinvinv    Dec 3 6:50PM 2017

UPDATE:

Here is what I am trying:

I want to verify the backup integrity of one of my users on the server side.

I have created an empty repository and run duplicacy init to point this new empty repository at the storage.

Next I ran duplicacy check -all and, sure enough, all the snapshots are quickly listed and no missing chunks are reported.

Now I wish to check that each chunk is valid. I think what this means is that the chunk needs to be downloaded and a new hash calculated and compared to the chunk's filename, which, as I understand it, is the chunk hash. Is this right?

So I have run duplicacy check -all -files to do this.

After entering the storage password I see "Listing all chunks" and then nothing seems to be happening.

Am I doing something wrong?

Thanks!!!!


gchen    Dec 3 8:09PM 2017

duplicacy check -all -files is the right command to run if you want to verify all backups. Listing all chunks may take a while, especially when it runs against an SFTP server. Running this command on the server itself should be much faster (by initializing an empty repository on the server against the storage as a local path).
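For instance, run from the same empty repository created on the server (just a sketch; the timing will vary a lot with storage size):

    duplicacy check -all -files   # downloads and verifies the contents of every file in every snapshot, so expect a long run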


kevinvinv    Dec 4 12:34AM 2017

Looks like it worked... it just took a while. No problem.


kevinvinv    Dec 4 9:22AM 2017

1) So if corruption is detected by duplicacy check -all -files, what should a person do? Just delete the associated chunk and let it re-upload next time?


gchen    Dec 4 2:25PM 2017

That could be caused by a corrupted chunk, or a Duplicacy bug, or even a hash collision. If that happens, please let me know and I'll help you get to the bottom of it.


kevinvinv    Dec 4 6:20PM 2017

Do you think it is a good idea to periodically run check -files to verify that the data in the storage is not corrupted?

I was planning on trying to run that every week or so (server side) for my 10 users... is that a good idea?

The main issue with doing it is that I have to have their storage passwords, so that might be a problem.


gchen    Dec 4 7:51PM 2017

check -files basically downloads every file, so it may be slow and the traffic may cost a lot, but if it is your own server then it should be fine to run once a week.

Yes, you need to have the storage passwords for that. However, it may be possible to write a simple script that checks that the hashes of the chunk files do not change after they have been uploaded; that should be enough to detect most corruption and does not require the storage passwords.
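As a rough sketch of that idea (the storage path is a placeholder, and chunks added by later backups will show up as extra lines, so only changed lines matter):

    # record a baseline of chunk checksums shortly after they are uploaded
    find /backup/duplicacy-storage/chunks -type f -exec sha256sum {} + | sort -k 2 > chunks.baseline

    # later, recompute and compare; a changed checksum for an existing chunk suggests corruption
    find /backup/duplicacy-storage/chunks -type f -exec sha256sum {} + | sort -k 2 | diff chunks.baseline -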


kevinvinv    Dec 4 8:51PM 2017

OK I can think through that.

What I was thinking would be ideal is to run check -files and then auto-delete any bogus chunks (and report them) so they would get re-uploaded, if needed, by the next backup.

I'll see what I can think of.

Thanks again!


kevinvinv    Dec 8 8:31AM 2017

Update and question:

I can definitely see the benefit of a -verify command. It has taken 5 days with -files to verify one snapshot for my largest-volume user. This is running on a quad-core Intel QNAP... This is not a complaint, just a statement :)

One thing that would be nice is to have duplicacy print out a little status message saying it is actually "checking" things. What it says is "Listing all chunks" and then for 5 days it says nothing else :) Maybe a message saying "This might take a while" would reduce anxiety :)

Finally, a question on the password thing: is there any way to provide the password on the command line and not have to type it in interactively? I read the password management section and wasn't quite sure what to make of all that yet.

Thanks very very much!


gchen    Dec 8 12:18PM 2017

You can always store passwords in environment variables to avoid entering them every time. The password management section lists the name of the environment variable for each type of password. If you run the CLI version on Windows or macOS, you only need to type the password the first time; it will then be saved in encrypted storage and retrieved automatically afterward.
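For example, for the default storage the variable is DUPLICACY_PASSWORD (named storages have their own variables, as listed in the password management section):

    export DUPLICACY_PASSWORD='the-storage-password'   # supplies the storage password non-interactively
    duplicacy check -all -files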