Duplicacy Issue: Software comparison, why I'm using Duplicacy

Charles Apr 28 8:40AM 2017 GUI

I've seen several people ask how this software compares to others, and since I have been testing for months now, I wanted to chime in for anyone who finds this page. Please note that this is all my own experience: I have been looking for what best suits my needs and have not given thought to how these products might suit others' needs.

*skip to the bottom to hear my thoughts on Duplicacy.

I started with no backup even though I knew better. I finally kicked myself one day and went with the free CrashPlan app for site-to-site backup plus unlimited cloud storage. It seemed to do fine at first, but the deduplication would choke once the dataset grew, and backups of new data would take weeks to months. I dealt with this for a long time and tried to manage it with multiple backup sets.

Duplicati looked nice, but on the first day of testing it also choked after the first few hundred GB or so of data, possibly due to the deduplication algorithms. I kept tweaking the settings, but I was never completely satisfied with any of the setups I was able to achieve. This might be fine for some, but it wasn't for me.

Arq worked and is even multithreaded, but I hardly noticed any deduplication (I've had similar results with just compression). I could live with that, but there were no options to manage how many versions to keep or for how long, the Windows UI was clunky, and there was no Linux version. From what I can tell, it was developed for Mac, then ported to Windows. Overall it didn't seem like a good fit and I didn't feel like I had good control over the data. The pricing was pretty nice for personal use though. My experience with their support team for things that weren't working wasn't great. They had some documentation about the implementation.

CloudBacko gave me hope for a bit since it posts documentation about how it works. However, it kept failing, the logging system was a nightmare, and having to switch between the many different screens was a pain. One of the craziest things with this software was that you needed a backup of your backup settings in order to restore to another computer. It had no way of recognizing existing backups without that, other than recreating the backup set on the other computer. I particularly liked the pricing structure of CloudBacko. I could buy the simple file backup now and purchase other modules as I needed them. Even though there were so many options, it wasn't confusing the way these things often are when everything is split apart like that.

CloudBerry Backup did everything I needed. It supports many storage options, local encryption, versioning, deletion policies, and easy control over all of the settings for each backup set, and boy was it fast. There were no deduplication options unless you wanted to purchase separate software and set up a dedupe server. It does have block-level backup, but that was documented as a feature for diff comparisons, not dedupe. My major problem was that the pricing structure didn't really seem to support advanced home users. There were all of these hard-coded restrictions preventing you from running certain versions of their software on certain machines, plus there was the data cap. Why in the world are they charging extra based on how much data I back up when they aren't the ones storing it? A sales rep said he would give me a deal for some social media activity and I said sure. I tested the software and decided to buy it. All of a sudden he came back with a $200 + maintenance plan to keep the software updated. I tried explaining to him that I'm a personal home user and that I was led to believe he was going to give me pricing somewhere between the personal edition and the server edition. This was just $100 shy of the most expensive package. All of a sudden he started playing dumb and asking things like "so do you want the home edition". I suppose they may be a fine choice for a business, but the hidden fees don't make sense for personal use. They had documentation, but the organization made much of it hard to find.

Richard mentioned qBackup. I am not familiar with it, but I like the promises it makes. I also like that it claims the same UI across all platforms and the ability to restore across platforms. I never confirmed that CloudBerry could restore across platforms, but it is one of my requirements. I will not test qBackup though, since its FAQ clearly states that it does not support VSS. This is a deal breaker: I have had a mountain of issues come from backup services not using VSS while I am actively using the data.

After some initial tests, I am choosing to implement Duplicacy as my backup software. Why? It seems to excel at everything I've previously mentioned. There is plenty of design documentation which outlines what I believe to be a pretty clever implementation. It achieves deduplication (as far as I can tell) at a linear/constant speed regardless of data size, and does so across multiple backup sets and computers without a deduplication server as an intermediary (which I have seen as a solution in a few dedup tools now). It is moderately fast with single-threaded uploads and, from what I hear, will support multi-threading for all storages soon. Support has been fantastic even though I haven't purchased anything yet. The pricing model is easy to understand and reasonable. The licensing is awesome and there are plans to release the source code.

Minor annoyances that may improve with time mainly come from the GUI. The GUI is nice and simple, but additional backup sets would be nice. The option to restore from additional backup repositories without switching the storage location would be nice, though now that I trust the software more after testing and researching, maybe I will combine my backup repositories into the same location. It seemed odd that there was no folder selection tool at first, but considering I normally select the root folder then add a lot of excludes for what I don't want in order to be sure that new folders will get added, it wasn't really that bad. Not for a data drive anyway. For a desktop computer with multiple root level directories, I ended up setting up a pseudo repository folder with symlinks in it.
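The symlink workaround described above can be sketched in a few lines of Python. This is only an illustration, assuming the backup tool follows symlinks placed at the top level of the repository folder (as this setup implies); the function name and the example paths are hypothetical, and on Windows creating symlinks may require elevated privileges.

```python
import os

def build_symlink_repository(repo_dir, targets):
    """Create repo_dir and fill it with one symlink per entry in targets.

    targets maps a link name to the real directory it should point to.
    Returns the list of link paths that now exist.
    """
    os.makedirs(repo_dir, exist_ok=True)
    links = []
    for name, target in targets.items():
        link = os.path.join(repo_dir, name)
        # Skip links that already exist so the function can be re-run safely.
        if not os.path.islink(link):
            os.symlink(target, link, target_is_directory=True)
        links.append(link)
    return links

# Hypothetical usage: one repository folder that "contains" several roots.
# build_symlink_repository(r"C:\BackupRepo", {"Users": r"C:\Users", "Data": r"D:\Data"})
```

Excludes can then be applied under each link name just as they would be for a normal subfolder.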

Charles Apr 28 2:29PM 2017

Duplicacy's Clever Implementation Explained without the Lock-Free Deduplication jargon

gchen mentioned that "any backup tool that does not follow this paradigm will have some flaws here and there" and I have to agree. Duplicacy at its core seems immune to most of the limitations and possible points of failure that most backup software has to keep up with and work around. Where other backup services seem to tack on features like deduplication, versioning, and encryption after developing the backup code, Duplicacy's backup algorithms natively support these things.

On top of that, it's done in a way that's simple to understand. With such a solid core design, it seems to me like this will be a highly maintainable piece of software with plenty of room for development of new features and usability enhancements, without having to worry about introducing bugs in the backup process.

I'm not bashing other backup software in this respect since many of them have found a way to make it work. This observation is simply the clear advantage of Duplicacy imo.

gchen Apr 28 9:33PM 2017

You summarized it very well. Yes, those essential backup features came about naturally in Duplicacy, and this greatly simplifies the implementation and makes the software less error-prone.

Here is the short-term development plan I copied from the other thread:

Multi-threaded uploading and downloading (should be ready in a week or two)
A new backend for Google Cloud Storage based on the official Google client
Fair Source License
Rewrite the GUI version with a Go GUI library so it can run backup/restore without inter-process communication

Arq Support May 30 7:22PM 2017

Charles, I'm sorry to hear that we weren't super helpful. I try really hard to be, since that's basically our main marketing strategy -- making people happy. If you want to give us another try, send me email at stefan@arqbackup.com directly and I'll respond promptly.

There are some options for controlling how many backup records Arq keeps. You can set a budget: https://www.arqbackup.com/documentation/pages/budget.html and you can check the "thin backups" option which causes Arq to keep hourly backups for the past 24 hours, daily backups for the past month, and weekly backups beyond that, very similar to Time Machine's retention strategy.
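A thinning policy of this shape (hourly for the past 24 hours, daily for the past month, weekly beyond that) can be sketched roughly as follows. This is not Arq's code: the function name, the 30-day definition of "a month", and the choice to keep the newest backup in each bucket are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

def thin_backups(timestamps, now):
    """Return the backup timestamps to keep, oldest first."""
    kept = {}
    for ts in sorted(timestamps, reverse=True):      # newest first
        age = now - ts
        if age <= timedelta(hours=24):
            bucket = ("hour", ts.strftime("%Y-%m-%d %H"))
        elif age <= timedelta(days=30):              # assumed length of "a month"
            bucket = ("day", ts.strftime("%Y-%m-%d"))
        else:
            iso = ts.isocalendar()
            bucket = ("week", iso[0], iso[1])        # ISO year and week number
        # The first timestamp seen for a bucket is its newest one; keep only that.
        kept.setdefault(bucket, ts)
    return sorted(kept.values())

now = datetime(2017, 6, 1, 12, 0)
backups = [
    now - timedelta(minutes=10), now - timedelta(minutes=20),      # same hour
    now - timedelta(days=3), now - timedelta(days=3, hours=1),     # same day
    now - timedelta(days=100), now - timedelta(days=101),          # same ISO week
]
for ts in thin_backups(backups, now):
    print(ts)
```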

Arq does multi-threaded backing up and restoring. It de-duplicates by storing data in a content-addressable fashion, similar to Duplicacy I believe. It also keeps a local database of what objects are stored at the destination, so that it's not constantly querying the destination. The Mac version faithfully restores all Mac-related metadata (extended attributes, Finder flags, Finder labels, etc).
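As a rough illustration of content-addressable storage with a local index (a generic sketch, not Arq's or Duplicacy's actual implementation; the class and attribute names are made up, and SHA-256 is an arbitrary choice of hash):

```python
import hashlib

class ContentStore:
    """Toy content-addressable store: an object's name is the hash of its bytes."""

    def __init__(self):
        self.remote = {}          # stands in for the backup destination
        self.local_index = set()  # names already known to exist at the destination
        self.uploads = 0          # how many objects were actually sent

    def put(self, data):
        name = hashlib.sha256(data).hexdigest()
        # Consult the local index instead of querying the destination each time.
        if name not in self.local_index:
            self.remote[name] = data
            self.local_index.add(name)
            self.uploads += 1
        return name

store = ContentStore()
first = store.put(b"same file contents")
second = store.put(b"same file contents")   # identical data, so no second upload
print(first == second, store.uploads)
```

Because the name is derived purely from the content, storing the same bytes twice costs one upload and one stored object.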

If you have other questions please get in touch and I'll get you an answer.

gchen Jun 1 10:08PM 2017

Although I don't have any first-hand experience with Arq, I would like to chime in on the differences between Arq and Duplicacy, based on my read of Arq's design document.

Like Duplicacy, Arq naturally supports deduplication by saving chunks using the hashes as the file names (but this only applies to large files; more on this later). Unfortunately, the names of the chunk files contain the UUID of the computer as the prefix, which limits deduplication to files residing on the same computer (two computers having the same set of files will have two distinct sets of chunks stored in the storage due to this UUID prefix). Therefore, Arq does not support cross-computer deduplication and is only suited for backing up a single computer.
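The effect of a per-computer prefix on deduplication can be shown with a small sketch. The naming format below (`uuid/hash`) and the helper names are hypothetical and chosen only to illustrate the point; they are not taken from either product.

```python
import hashlib

def chunk_name(data, computer_uuid=None):
    """Name a chunk by its hash, optionally prefixed with a computer UUID."""
    digest = hashlib.sha256(data).hexdigest()
    return f"{computer_uuid}/{digest}" if computer_uuid else digest

def objects_in_storage(computers, per_computer_prefix):
    """Return the set of distinct object names the storage would end up holding."""
    names = set()
    for uuid, chunks in computers.items():
        for data in chunks:
            names.add(chunk_name(data, uuid if per_computer_prefix else None))
    return names

# Two computers holding exactly the same three chunks.
shared_chunks = [b"photo-1", b"photo-2", b"photo-3"]
computers = {"uuid-laptop": shared_chunks, "uuid-desktop": shared_chunks}

print(len(objects_in_storage(computers, per_computer_prefix=True)))
print(len(objects_in_storage(computers, per_computer_prefix=False)))
```

With the prefix, every chunk is stored once per computer; without it, identical chunks from different computers collapse into a single stored object.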

Another issue I can see is the handling of small files (<64KB). From Arq's design document:

A packset is a set of "packs". When Arq is backing up a folder, it combines small files into a single larger packfile; when the packfile reaches 10MB, it is stored at the destination. Also, when Arq finishes backing up a folder it stores its unsaved packfiles no matter their sizes.

I don't see how deduplication can work for smaller files when they are packed into packfiles with a hard limit of 10MB (no rolling checksum?) and hashes are stored in a separate index file (I suspect this is why Charles noticed little deduplication). In contrast, Duplicacy treats small and large files the same way, by packing them together (into an imaginary huge tar file) and then splitting them into chunks using the variable-sized chunking algorithm. This guarantees that moving a directory full of small files to a different place (or to a different computer) will not change most of the chunks. Modifying or removing a small file may invalidate a number of existing chunks, but this number is under control because of the variable-sized chunking algorithm.
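To make the idea of variable-sized (content-defined) chunking concrete, here is a minimal sketch. It uses a simple Gear-style rolling hash chosen for brevity, which is an assumption and not Duplicacy's actual algorithm or parameters; real implementations also enforce minimum and maximum chunk sizes, which this sketch omits. The helper names and the fake data are hypothetical.

```python
import hashlib

# One pseudo-random 32-bit value per byte value, for the rolling hash.
GEAR = [int.from_bytes(hashlib.sha256(bytes([b])).digest()[:4], "big") for b in range(256)]

def split_chunks(stream, bits=8):
    """Cut the stream wherever the top `bits` bits of the rolling hash are zero."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(stream):
        # 32-bit state: a byte's influence is shifted out after 32 steps,
        # so each cut decision depends only on the most recent 32 bytes.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if h >> (32 - bits) == 0:
            chunks.append(stream[start:i + 1])
            start = i + 1
    if start < len(stream):
        chunks.append(stream[start:])
    return chunks

def fake_file(n, blocks=64):
    """Deterministic pseudo-random 2KB 'small file' used only for the demo."""
    return b"".join(hashlib.sha256(f"{n}:{k}".encode()).digest() for k in range(blocks))

# Pack many small files into one stream, then chunk the stream as a whole.
stream = b"".join(fake_file(n) for n in range(32))
old = split_chunks(stream)
new = split_chunks(b"one new small file" + stream)   # data inserted at the front

reused = set(old) & set(new)
print(len(old), len(new), len(reused))
```

Because cut points depend only on nearby content and not on absolute offsets, inserting data at the front shifts everything but leaves the later chunk boundaries, and therefore the later chunks, unchanged.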

The choice of 64KB looks somewhat problematic to me -- it may not be large enough (the default chunk size in Duplicacy is 4MB). Uploading 64KB files with an average residential internet connection (mine is 1MB/s up) may still be too slow. In addition, if there are many small directories, since each directory has its own packfile and index file, you will have many small files to upload, which will significantly degrade the performance.
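A back-of-the-envelope estimate shows why object size matters when each upload carries a fixed cost. The model below assumes strictly sequential uploads and a per-request overhead of 0.2 seconds; that overhead is a hypothetical figure picked for illustration, not a measurement, and the function name is made up.

```python
import math

def upload_seconds(total_kb, object_kb, bandwidth_kb_s, overhead_s):
    """Estimate sequential upload time: transfer time plus a fixed cost per object."""
    requests = math.ceil(total_kb / object_kb)
    return total_kb / bandwidth_kb_s + requests * overhead_s

TOTAL_KB = 1024 * 1024    # 1GB of data to back up
BANDWIDTH_KB_S = 1024     # the 1MB/s uplink mentioned above
OVERHEAD_S = 0.2          # hypothetical per-request overhead, not a measured value

small = upload_seconds(TOTAL_KB, 64, BANDWIDTH_KB_S, OVERHEAD_S)      # 64KB objects
large = upload_seconds(TOTAL_KB, 4096, BANDWIDTH_KB_S, OVERHEAD_S)    # 4MB objects
print(f"64KB objects: {small:.0f}s, 4MB objects: {large:.0f}s")
```

Under these assumptions the raw transfer time is identical in both cases; the difference comes entirely from the number of requests (16384 versus 256).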

Richard Jun 20 12:30PM 2017

I use Arq at home and would love to use it at work, but they declined to consider adding Backblaze B2 as a storage choice. I'm not sure why and would love to have Arq Support explain.

Joe S Sep 17 9:27AM 2017

Just for the record, Arq did just announce B2 (Backblaze) support as a free upgrade. I've had similar experiences with Arq where the dedup features don't seem to be optimal, for the reasons described above. They are certainly not the "unlocked" dedup features of Duplicacy for multi-device backup, which is my use case, so I'm looking forward to a change and appreciate Charles' experience.

borgqueenx Oct 20 10:49AM 2017

I would like to add that I tried Arq, Duplicati, and Duplicacy with Google Drive. Arq is SUPER, MEGA, INCREDIBLY -SLOW- at scanning files. Scanning runs at about the same rate as uploading (about 8-10MB/s), so I guess that makes it semi-OK because you still reach the max upload speed. However, there is a flaw with Arq: if you get the Google error "file limit rate exceeded", Arq does not retry uploading that chunk/file, leaving you with a broken backup. On a positive note, I used Arq with Amazon Cloud Drive without any errors. However, browsing the Arq restore folders takes a lot of time. It's the same for Duplicacy, though: Duplicacy takes 25 minutes to list all files and folders, whereas Arq takes 10-15 seconds to load a single folder every time. Duplicati is untrustworthy to me. I kept getting errors without being able to find out what exactly they were about, and three times I woke up to see an error and the whole backup simply cancelled.

Duplicacy, using Windows symbolic-linked folders, has worked amazingly so far. If I may add suggestions, they would be a restore ETA and speed indication for restores as well, and the ability to remove individual files from snapshots. Now that I have symbolic links set up and a 12-thread limit for Google Drive, it all basically works fine :)

Speed tests: 8-10MB/s to Google Drive using Duplicacy, out of my allowed 18.75MB/s ISP speed. Restore speed: 11-13MB/s from Google Drive using Duplicacy, out of my allowed 18.75MB/s ISP speed. I did not use Google Drive with Arq so I cannot test this out.