SkyLinx Dec 17 10:58AM 2017 CLI
Hi! First of all I wanted to say a huge THANK YOU for this nice piece of software. I am testing it because, like many, I will be switching from Crashplan for reasons you know.
So I am currently backing up my (linux) desktop computer at home to Backblaze B2, and so far it seems to work great. One thing I was wondering though is what happens if I cron/schedule a duplicacy backup to run for example hourly or every few hours. Say one backup is taking longer than expected and the next scheduled backup is triggered before the first backup is done, what happens?
I know that duplicity handles this situation and prevents two instances from running at the same time (I've read it somewhere while researching Crashplan alternatives). Does duplicacy also handle this situation? If not, what happens if two backups with same parameters from the same repo and the same storage run at the same time?
Many, many thanks in advance for your help and for the tool :)
gchen Dec 17 4:32PM 2017
Duplicacy doesn't handle this situation. However, it won't lead to any error/corruption, maybe only some unreferenced chunks. Both backups will have the same revision number, since they are based on the same last revision. Uploading the same chunk twice won't be a problem, and the second one will be skipped because of the existence check before each upload. The only issue is in the last step of the backup operation when the snapshot file is to be uploaded. Since they have the same revision number and the snapshot file is named by the revision number, one snapshot file will be overwritten by the other. Thus, some chunks referenced by the earlier backup whose snapshot file gets overwritten will become unreferenced. However, if files do not change between these two backups, there won't be any such chunks.
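To make the overwrite concrete, here is a toy simulation in plain shell. It is only a sketch: empty files stand in for chunks, a text file listing chunk names stands in for the snapshot file, and the directory names and the snapshot id `demo` are made up for illustration. It does not use Duplicacy or its real file formats.

```shell
#!/bin/sh
# Toy simulation only: plain files stand in for a storage (not Duplicacy's real format).
storage=$(mktemp -d)
mkdir -p "$storage/chunks" "$storage/snapshots/demo"

# Both runs are based on the same last revision, so both want to write revision 2.
next_rev=2

# Run A uploads chunks "a1" and "shared".
for c in a1 shared; do : > "$storage/chunks/$c"; done

# Run B uploads "b1"; "shared" already exists, so the existence check skips it.
for c in b1 shared; do
    [ -e "$storage/chunks/$c" ] || : > "$storage/chunks/$c"
done

# A writes its snapshot file first, then B overwrites the same file name.
printf '%s\n' a1 shared > "$storage/snapshots/demo/$next_rev"
printf '%s\n' b1 shared > "$storage/snapshots/demo/$next_rev"

# Chunks present in the storage but listed in no snapshot file are unreferenced.
unreferenced=$(for c in "$storage"/chunks/*; do
    name=$(basename "$c")
    grep -qx "$name" "$storage"/snapshots/demo/* || echo "$name"
done)
echo "unreferenced: $unreferenced"
rm -rf "$storage"
```

In this toy run only the chunk that run A alone referenced ends up unreferenced. If both runs had listed exactly the same chunks, nothing would be left over.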
SkyLinx Dec 18 12:34PM 2017
Hi gchen, thanks a lot for the explanation. Hopefully it won't happen anyway. Thanks!
EssexBoyRacer Jan 8 5:31PM 2018
If it helps here’s a script I modified to work with duplicacy that does just this.
I just call the script from Cron once per day but you can run it as often as you like and it should only ever allow one process of duplicacy to run.
You’ll need to update a couple of lines to reflect your configuration.
#!/bin/sh
# Allow only one duplicacy backup at a time, using a PID file.
PIDFILE=/root/duplicacy.pid

if [ -f "$PIDFILE" ]
then
    PID=$(cat "$PIDFILE")
    # If the recorded process is still alive, a backup is in progress.
    if ps -p "$PID" > /dev/null 2>&1
    then
        echo "Process already running"
        exit 1
    fi
    # Process not found: the PID file is stale, so fall through and overwrite it.
fi

# Record our own PID.
if ! echo $$ > "$PIDFILE"
then
    echo "Could not create PID file"
    exit 1
fi

# Change to the repository directory (adjust to your configuration).
cd /mnt || { rm "$PIDFILE"; exit 1; }
/bin/duplicacy backup -hash -stats -threads 50
rm "$PIDFILE"
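A variant of the same idea, sketched here with a lock directory instead of a PID file: `mkdir` either creates the directory or fails in one step, so two runs starting at the same moment cannot both pass the check. The lock path and the `run_locked` helper name are hypothetical, chosen only for this example.

```shell
#!/bin/sh
# Sketch only: LOCKDIR and run_locked are illustrative names, not part of duplicacy.
LOCKDIR="${LOCKDIR:-${TMPDIR:-/tmp}/duplicacy.lock}"

run_locked() {
    # mkdir is atomic: exactly one caller can create the lock directory.
    if ! mkdir "$LOCKDIR" 2>/dev/null; then
        echo "Process already running" >&2
        return 1
    fi
    # Run the given command, then release the lock and pass on its exit status.
    "$@"
    status=$?
    rmdir "$LOCKDIR"
    return $status
}

# Example call (adjust path and options to your configuration):
# cd /mnt && run_locked /bin/duplicacy backup -stats
```

Unlike the PID file version, this sketch does not detect a stale lock: if the machine crashes mid-backup, the lock directory has to be removed by hand before the next run.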
gchen Jan 9 11:31AM 2018
Thanks. Your script can be very useful. Do you mind creating a wiki page under https://github.com/gilbertchen/duplicacy/wiki to put your script there?
Christoph Jan 21 9:06AM 2018
So one way of handling multiple instances is to prevent them via a script (and it would be great if someone could share an equivalent script for windows).
But let me explore the other option: not to care about multiple instances. (I like that option, because it simplifies things quite a bit and it can also speed things up if one instance isn't able to use all your upload bandwidth.) It seems to me that what gchen says can be interpreted as "it doesn't usually matter how many instances are running", right?
If that is the case, we still need to take care of what the risks are (and see if those can be avoided other than preventing multiple instances). So: what are the potential problems?
To start with, let me clarify that the scenario in the OP is just one specific case of multiple instances:
two backups with same parameters from the same repo and the same storage run at the same time
Other possible cases of multiple instances (and my tentative interpretation)
If the above is correct, we can note that multiple instances of duplicacy are only a matter of concern if they are backing up to the same storage. (BTW: what about duplicacy instances running with something other than the backup command? I will leave that aside for the time being).
So now, what are the risks? gchen says:
some chunks referenced by the earlier backup whose snapshot file gets overwritten will become unreferenced. However, if files do not change between these two backups, there won't be any such chunks.
So there is another huge scenario, for which we can say multiple instances are no problem whatsoever: when none of the files (or file parts?) shared by the two backup jobs changes while those instances are running.
Now, what if they do? Do we lose data? Well, the data is there (the chunks have been uploaded), but they are "invisible" to any restore process, because they are not referenced by any snapshot.
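That leaves us with two questions:
1. Can the data from the unreferenced chunks somehow be restored, assuming that the files in the repository have been destroyed for ever by a nuclear disaster?
2. Under what circumstances will those unreferenced chunks disappear?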
I cannot answer question 1, but I think the answer to question 2 is: only when duplicacy prune -exhaustive is run. If that is correct and if there is a yes answer to question 1, then I am tempted to conclude that we can safely not care about running multiple instances of duplicacy, provided that we use duplicacy prune -exhaustive carefully, i.e. only when we know (but how to know?) that it will not delete potentially needed orphaned chunks.
Phew! Forgive me for thinking aloud at length; I thought it was the best way to clarify these questions and to identify and correct any mistakes I might have made in the above.
gchen Jan 21 9:51PM 2018
Can the data from the unreferenced chunks somehow be restored, assuming that the files in the repository have been destroyed for ever by a nuclear disaster?
Technically, it is possible. Here is the actual content of a snapshot file:
{
"chunks":
["dded7f716811a1d2adfc22fa783803ae000af62a83c20fea65a29b4936b1159c", "0733f59a89445c4314175575e349aec6f56b61876b3d818c418826afcd2ef971"],
"end_time":1516586444,
"file_size":22174676071,
"files":["851a248538a6630dab277ccaa1d4b02ef1a78ebf23028c4f9945d94d07c1919a","5a461a4790b14bf920833ebfebd5dd8a7ce3b6d35a83ee7e8abd2d8b20717004"],
"id":"mac-mini-chgang-zincbox",
"lengths":["53c1fdb0409ce5ddf586f70bce4777e1becd18d95e525b4c015f111d186cccb6", "8f20f882655e792f2db8075be44b729e6d24d19486e932b27c82be01e0640d6c"],
"number_of_files":135998,
"options":"",
"revision":5979,
"start_time":1516586406,
"tag":""}
All that matters are the three arrays: chunks, files, and lengths. If you can find out which chunks should be in these arrays, then you can completely recover the backup. But I believe it is very unlikely that you'll ever have to do this.
Under what circumstances will those unreferenced chunks disappear?
You're right, duplicacy prune -exhaustive will be able to garbage-collect these unreferenced chunks.
Still, I wouldn't recommend running multiple instances of Duplicacy from the same repository. Scripts like the one supplied by @EssexBoyRacer should be used to prevent this from happening.
Christoph Jan 23 3:51PM 2018
Technically, it is possible.
Okay. And practically?
gchen Jan 23 8:01PM 2018
Practically it is not worth the effort to do this, unless you happen to have the only copy of a very important file in the revision that had been overwritten, in which case I can help you recover the file.