Memory Usage

Carl Nasal     Jul 31 5:01AM 2017 CLI

I am testing Duplicacy and Backblaze for Linux server backup to replace IDrive for Linux. It's working well on smaller servers, but I have a server with 185GB of storage used and 12GB of memory. When I run a backup, it often gets killed by the OOM killer because it's using so much memory. For example, I ran it last night and it peaked at 3.5GB of memory usage before it was killed. Is there anything I can do to reduce memory usage?

FYI, the storage is encrypted, and I'm running the backup with "duplicacy backup -stats"
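
In case it helps, here's roughly how I'm checking the peak memory, using GNU time's verbose output (the exact path to the time binary may vary by distro):

    # Peak memory is reported as "Maximum resident set size (kbytes)"
    /usr/bin/time -v duplicacy backup -stats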

Thanks, Carl


Alex JOST    Jul 31 8:53AM 2017

I'm having issues too when trying to back up about 1 TB of data. Memory usage climbs to 11-12 GB of RAM plus 8 GB of swap.


gchen    Jul 31 7:56PM 2017

The memory usage is largely determined by the number of files to be backed up. Duplicacy loads the entire file list into memory during the indexing phase, so you may run out of memory if there are too many files. After the indexing phase, however, the memory usage should stay flat.

Another factor that may dramatically increase memory usage is extended attributes. When building the file list, the extended attributes are also read into memory at the beginning. However, once the number of files exceeds a certain threshold (controlled by the environment variable DUPLICACY_ATTRIBUTE_THRESHOLD, which defaults to 1 million), Duplicacy stops loading extended attributes during the indexing phase and instead reads and uploads them only when preparing the final snapshot file.

So maybe setting DUPLICACY_ATTRIBUTE_THRESHOLD to a really small number, like 1, will help.
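
On a typical Linux shell, something like this should do it; the variable only needs to be set for the backup command itself, and any value smaller than the number of files has the same effect as 1:

    # Skip loading extended attributes during the indexing phase
    DUPLICACY_ATTRIBUTE_THRESHOLD=1 duplicacy backup -stats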


Carl Nasal    Aug 2 8:36AM 2017

I tried setting DUPLICACY_ATTRIBUTE_THRESHOLD to 1, and it did allow the process to finish, but it only used slightly less memory (the peak was around 3.2GB). Do you have plans to find ways to reduce memory usage when there is a large number of files to back up?

Thanks, Carl


gchen    Aug 2 8:32PM 2017

Definitely. There is no need to load the entire file list into memory at once. My plan is to construct the file list on the fly and upload file list chunks as soon as they have been generated. This will add significant complexity to the main backup loop, but in the long run it should be worth it.


Carl Nasal    Aug 3 3:14PM 2017

That's great to hear. Thanks for that information. I look forward to that update.

Carl


whereisaaron    Aug 3 10:04PM 2017

After the file list phase, do any of the chunk size settings or other preferences control memory use?

Testing with default settings, I was seeing about 100MB of RAM allocated for the file list phase, and then pretty flat at around 450MB of RAM during the backup itself. That is more RAM than I was hoping to use on some clients.


gchen    Aug 4 12:16PM 2017

The default average chunk size is 4MB, but it is the maximum chunk size (16MB) that determines the size of buffers to be allocated, and there could be multiple buffers. If you set the average chunk size to 1MB when initializing the storage, the default maximum size will be 4MB, and that could reduce the memory footprint a bit.
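
For example, when setting up a new storage it would look something like this (the repository id and bucket name are placeholders, and you should check "duplicacy init -help" for the exact options in your version). Keep in mind the chunk size can only be chosen when the storage is initialized:

    # -e enables encryption; -c 1M sets the average chunk size to 1MB,
    # so the maximum chunk size defaults to 4MB
    duplicacy init -e -c 1M my-repo-id b2://my-bucket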


Harnser    Oct 21 4:29AM 2017

What's the status of this issue? I have a 1.7TB backup via another provider's software and want to switch to Duplicacy, but memory usage is definitely going to be important.


gchen    Oct 21 10:02PM 2017

I haven't had a chance to work on this yet. However, a 1.7 TB backup may not consume too much memory if the number of files isn't huge. I know several customers who back up millions of files totaling more than 10 TB.