Prevent multiple instances of duplicacy from running at the same time

SkyLinx     Dec 17 10:58AM 2017 CLI

Hi! First of all I wanted to say a huge THANK YOU for this nice piece of software. I am testing it because, like many, I will be switching from Crashplan for reasons you know.

So I am currently backing up my (linux) desktop computer at home to Backblaze B2, and so far it seems to work great. One thing I was wondering though is what happens if I cron/schedule a duplicacy backup to run for example hourly or every few hours. Say one backup is taking longer than expected and the next scheduled backup is triggered before the first backup is done, what happens?

I know that duplicity handles this situation and prevents two instances from running at the same time (I've read it somewhere while researching Crashplan alternatives). Does duplicacy also handle this situation? If not, what happens if two backups with same parameters from the same repo and the same storage run at the same time?

Many, many thanks in advance for your help and for the tool :)


gchen    Dec 17 4:32PM 2017

Duplicacy doesn't handle this situation. However, it won't lead to any error or corruption, at most some unreferenced chunks. Both backups will have the same revision number, since they are based on the same last revision. Uploading the same chunk twice won't be a problem; the second upload will be skipped because of the existence check before each upload. The only issue is in the last step of the backup operation, when the snapshot file is uploaded. Since both backups have the same revision number and the snapshot file is named by the revision number, one snapshot file will be overwritten by the other. Thus, some chunks referenced by the earlier backup, whose snapshot file gets overwritten, will become unreferenced. However, if files do not change between these two backups, there won't be any such chunks.
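
To illustrate with made-up names (the snapshot id "mydocs" and revision number 42 below are just examples; the layout sketched is the snapshots/<snapshot id>/<revision> plus content-addressed chunks/ structure used on the storage):

    snapshots/mydocs/42    <- both concurrent backups compute revision 42, so the
                              snapshot file uploaded second overwrites the first
    chunks/ab/cdef01...    <- chunks are named by their content hash, so the
                              existence check skips re-uploading identical chunks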


SkyLinx    Dec 18 12:34PM 2017

Hi gchen, thanks a lot for the explanation. Hopefully it won't happen anyway. Thanks!


EssexBoyRacer    Jan 8 5:31PM 2018

If it helps, here’s a script I modified to work with duplicacy that does just this.

I just call the script from cron once per day, but you can run it as often as you like; it should only ever allow one duplicacy process to run.

You’ll need to update a couple of lines to reflect your configuration.

#!/bin/sh

# Allow only one duplicacy backup to run at a time, using a PID file as a lock.
PIDFILE=/root/duplicacy.pid

if [ -f "$PIDFILE" ]; then
        PID=$(cat "$PIDFILE")
        if ps -p "$PID" > /dev/null 2>&1; then
                # A previous run is still going; do not start a second backup.
                echo "Process already running"
                exit 1
        else
                # Stale PID file: the process is gone, so claim the lock.
                echo $$ > "$PIDFILE"
                if [ $? -ne 0 ]; then
                        echo "Could not create PID file"
                        exit 1
                fi
        fi
else
        echo $$ > "$PIDFILE"
        if [ $? -ne 0 ]; then
                echo "Could not create PID file"
                exit 1
        fi
fi

cd /mnt
/bin/duplicacy backup -hash -stats -threads 50
rm "$PIDFILE"
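
If your system has flock(1) from util-linux, a shorter sketch along the same lines should also work (the lock file path is just an example; the backup command mirrors the one above):

#!/bin/sh
# flock -n exits straight away if another run still holds the lock,
# instead of waiting for it. Adjust the lock file and repository paths.
exec flock -n /var/lock/duplicacy.lock /bin/sh -c '
        cd /mnt || exit 1
        exec /bin/duplicacy backup -hash -stats -threads 50
'

Either way, an overlapping cron run just exits instead of starting a second backup.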


gchen    Jan 9 11:31AM 2018

Thanks. Your script can be very useful. Do you mind creating a wiki page under https://github.com/gilbertchen/duplicacy/wiki to put your script there?


Christoph    Jan 21 9:06AM 2018

So one way of handling multiple instances is to prevent them via a script (and it would be great if someone could share an equivalent script for Windows).

But let me explore the other option: not to care about multiple instances. (I like that option, because it simplifies things quite a bit and it can also speed things up if one instance isn't able to use all your upload bandwidth.) It seems to me that what gchen says can be interpreted as "it doesn't usually matter how many instances are running", right?

If that is the case, we still need to take care of what the risks are (and see if those can be avoided other than preventing multiple instances). So: what are the potential problems?

To start with, let me clarify that the scenario in the OP is just one specific case of multiple instances:

two backups with same parameters from the same repo and the same storage run at the same time

Other possible cases of multiple instances (and my tentative interpretation):

  • same repo, different storage: no problems whatsoever
  • different repo, different storage: no problem whatsoever
  • different repo, same storage: tricky. No problem whatsoever if there are no shared chunks between the two repositories. Chunks can be shared either because the repositories overlap (i.e. some folders are included in both repositories) or because identical files (or file parts?) happen to exist in both repositories. The case of overlap can be treated as identical to the OP scenario (i.e. same repo, same storage), because we definitely know that chunks are shared. I'm not sure about the case of haphazardly shared chunks, but to be on the safe side, let's also treat it as identical to "same repo, same storage".

If the above is correct, we can note that multiple instances of duplicacy are only a matter of concern if they are backing up to the same storage. (BTW: what about duplicacy instances running something other than the backup command? I will leave that aside for the time being).

So now, what are the risks? gchen says:

some chunks referenced by the earlier backup whose snapshot file gets overwritten will become unreferenced. However, if files do not change between these two backups, there won't be any such chunks.

So there is another huge scenario, for which we can say multiple instances are no problem whatsoever: when none of the files (or file parts?) shared by the two backup jobs changes while those instances are running.

Now, what if they do? Do we lose data? Well, the data is there (the chunks have been uploaded), but it is "invisible" to any restore process, because it is not referenced by any snapshot.

That leaves us with two questions:

  1. Can the data from the unreferenced chunks somehow be restored, assuming that the files in the repository have been destroyed forever by a nuclear disaster?
  2. Under what circumstances will those unreferenced chunks disappear?

I cannot answer question 1, but I think the answer to question 2 is: only when duplicacy prune -exhaustive is run. If that is correct, and if the answer to question 1 is yes, then I am tempted to conclude that we can safely not care about running multiple instances of duplicacy, provided that we use duplicacy prune -exhaustive carefully, i.e. only when we know (but how to know?) that it will not delete potentially needed orphaned chunks.

Phew! Forgive me for thinking aloud at length; I thought it was the best way to clarify these questions and to identify and correct mistakes I might have made in the above.


gchen    Jan 21 9:51PM 2018

Can the data from the unreferenced chunks somehow be restored, assuming that the files in the repository have been destroyed forever by a nuclear disaster?

Technically, it is possible. Here is the actual content of a snapshot file:

{
    "chunks": ["dded7f716811a1d2adfc22fa783803ae000af62a83c20fea65a29b4936b1159c",
               "0733f59a89445c4314175575e349aec6f56b61876b3d818c418826afcd2ef971"],
    "end_time": 1516586444,
    "file_size": 22174676071,
    "files": ["851a248538a6630dab277ccaa1d4b02ef1a78ebf23028c4f9945d94d07c1919a",
              "5a461a4790b14bf920833ebfebd5dd8a7ce3b6d35a83ee7e8abd2d8b20717004"],
    "id": "mac-mini-chgang-zincbox",
    "lengths": ["53c1fdb0409ce5ddf586f70bce4777e1becd18d95e525b4c015f111d186cccb6",
                "8f20f882655e792f2db8075be44b729e6d24d19486e932b27c82be01e0640d6c"],
    "number_of_files": 135998,
    "options": "",
    "revision": 5979,
    "start_time": 1516586406,
    "tag": ""
}

All that matters is the three arrays: chunks, files, and lengths. If you can find out which chunks should be in these arrays, then you can completely recover the backup. But I believe it is very unlikely that you'll have to do this.

Under what circumstances will those unreferenced chunks disappear?

You're right, duplicacy prune -exhaustive will be able to garbage-collect these unreferenced chunks.
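
If you want to preview what that would remove before actually deleting anything, prune also takes a -dry-run option, which only reports what would be deleted, e.g.:

    duplicacy prune -exhaustive -dry-run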

Still, I wouldn't recommend running multiple instances of Duplicacy from the same repository. Scripts like the one supplied by @EssexBoyRacer should be used to prevent this from happening.


Christoph    Jan 23 3:51PM 2018

Technically, it is possible.

Okay. And practically?


gchen    Jan 23 8:01PM 2018

Practically, it is not worth the effort to do this, unless you happen to have the only copy of a very important file in the revision that had been overwritten, in which case I can help you recover the file.