towerbr Feb 6 7:53PM 2018 CLI
I finished the first test with the mbox files (soon I will post the results).
From the Evernote test we saw that fixed 1M chunks work well for databases, but variable chunks seem to work better for everything else.
I'm now setting up a new test to back up into two separate sets/jobs. I have already executed the init and add commands for the two storages.
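For context, the two-storage setup might have been created along these lines (storage URLs and snapshot IDs are made up; setting -min and -max equal to the average chunk size is what produces fixed-size chunks):

```shell
cd /path/to/repository

# Storage 1: fixed 1M chunks for the database job.
duplicacy init -c 1M -min 1M -max 1M db-job b2://db-bucket

# Storage 2: variable chunks (1M average) for everything else.
duplicacy add -c 1M other-storage other-job b2://other-bucket
```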
I thought of doing an "include" for the database in the first job and an "exclude" for the same database in the second. BUT, I realized that I have only one filters file...
How do I make a "conditional filter"? Or how do I use two filter files?
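For reference, Duplicacy's filters file takes one pattern per line, + to include and - to exclude, with the first match winning. The two per-job pattern sets in question might look like this (paths are illustrative), but they cannot coexist in the single filters file a repository has:

```text
# Job 1: back up only the database
# (the parent directory must be included so matching can descend into it)
+profile/
+profile/mail.sqlite
-*

# Job 2: back up everything except the database
-profile/mail.sqlite
```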
gchen Feb 6 10:32PM 2018
You can create a new repository for the subdirectory where the SQLite database is. If the SQLite database is right under the root of the repository, then your best bet is to make a new repository in a different directory and symlink the current repository as a subdirectory.
A -filters option as required by https://github.com/gilbertchen/duplicacy/issues/314 would have been handy...
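A sketch of the symlink workaround described above, with illustrative paths and names:

```shell
# The database sits at the root of the existing repository, so create a
# new top-level directory and symlink the old repository into it:
mkdir /backups/wrapper
ln -s /path/to/current-repo /backups/wrapper/data

# Initialize the wrapper as its own repository; Duplicacy follows
# first-level symlinks, so data/ is backed up as a subdirectory:
cd /backups/wrapper
duplicacy init main-job b2://my-bucket
```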
towerbr Feb 7 4:23PM 2018
The "subfolder's repository solution" worked fine.
I'm running test #6 (with some very interesting initial results) and I will post the results in a few days.
towerbr Feb 7 6:16PM 2018
I posted test #5 results: Thunderbird (mbox + SQLite files) with 1M variable chunks (link).
It presented some interesting results, but I'm seeing more interesting things in test #6, which I'm running now. Christoph, I think it will answer some of your questions about "merged settings".
Christoph Feb 8 5:40AM 2018
<shameless advertising for a feature request>
A nice example for why we need this feature: https://github.com/gilbertchen/duplicacy/issues/337
</shameless advertising for a feature request>
Edit: no, sorry. What nonsense! Issue 337 won't help at all here; 314, the one Gilbert mentioned, will.
@towerbr: could I suggest that you label your tests on GitHub not just with a number but with something like "#4 - mixed files fixed 1M chunks" or so? I'm finding it increasingly difficult to grasp what is actually being tested in each test and to navigate the tests accordingly.
It also took me a while to grasp the point of the first two charts (I guess because combined they show four data series, but really it's only three, as one appears in both charts). Why not combine them into a single chart?
I also miss information on if and when you ran the prune command.
Your conclusion that
Deleting messages also represents an increase in storage, since a lot of new chunks seem to be generated.
is not surprising but I cannot relate it to this chart:
I assume the "duplicati" bit is a typo, so doesn't this show a decrease in storage use? The labels for the green and blue data series seem to be reversed, either in the graph or in the table at the end.
for each increase in the repository, storage increases (on average) 14 times the increment of the repository.
Yes, we are paying a high price compared with just uploading the encrypted files to, say, B2 and using the built-in versioning there. But I guess there is no other way.
gchen Feb 8 12:29PM 2018
It just occurred to me that a SQLite database can't be directly backed up if it is still open by a different process: https://stackoverflow.com/questions/25675314/how-to-backup-sqlite-database
If you use -vss it might work, provided the owner process implements a VSS writer and presents a copy of the database in a consistent state...
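Outside Windows/VSS, SQLite's own online backup API can take a consistent snapshot while the database is open; the snapshot file can then be backed up normally. A minimal sketch in Python (file names are illustrative):

```python
import os
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
live_path = os.path.join(workdir, "live.db")
snap_path = os.path.join(workdir, "snapshot.db")

# Stand-in for an application's live database.
live = sqlite3.connect(live_path)
live.execute("CREATE TABLE msgs (id INTEGER PRIMARY KEY, body TEXT)")
live.execute("INSERT INTO msgs (body) VALUES ('hello')")
live.commit()

# The online backup API copies a consistent snapshot even while other
# connections keep writing; hand snapshot.db to the backup tool.
snap = sqlite3.connect(snap_path)
with snap:
    live.backup(snap)

count = snap.execute("SELECT COUNT(*) FROM msgs").fetchone()[0]
print(count)  # -> 1

live.close()
snap.close()
```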
towerbr Feb 8 12:45PM 2018
could I suggest that you label your tests on github not just with a number but with something like "#4 - mixed files fixed 1M chunks" or so.
The description on the readme page is not clear? Give me an example of how you think the description could look.
I'm finding it increasingly difficult to grasp what is actually being tested in each test and to navigate the tests accordingly.
In fact I'm testing more than one thing in each test. In the 5th I tested:
- if a 1M chunk size is suitable for a mixed repository (mbox and SQLite files)
- what happens when deleting a large number of files (the account on day 3)
- what is the impact of include / exclude patterns?
is not surprising but I cannot relate it to this chart
day 2: 22.374.000 => day 3: 24.324.000
Why not combine them into a single chart?
I thought the look would be confusing, but it might be a good idea. In test 4 it was fine.
The purpose of the first graph is to show that even with a decreasing repository size (after the removal of files on the third day), the storage space does not immediately shrink without a prune, which is obvious at first, but not everyone realizes it.
The purpose of the second chart is to show that the actual size (reported by Rclone) is not the same as that reported by the Duplicacy log, which is associated with chunks.
But you're right: I could be clearer. I'll put these texts there.
I also miss information on if and when you ran the prune command.
No prune so far...
I assume the "duplicati" bit is a typo
Yes, the same as the other time; I will correct it, thank you!
Christoph Feb 8 2:28PM 2018
The description on the readme page is not clear?
I meant the title, where it now just says "Test #5", which doesn't tell you anything except that it's a test and apparently the fifth one.
As for the ReadMe file: even if it is perfect, that won't help a user who, for example, clicks on your link above and ends up directly on the test #5 page.
Give me an example of how you think the description could look. Did you not like the one I gave: something like "#4 - mixed files fixed 1M chunks"? Or, even better: "duplicacy backup test #4 - mixed files, fixed 1M chunks".
Anyway, you get the idea.
In fact I'm testing more than one thing on each test. In the 5th I tested:
- if a 1M chunk size is suitable for a mixed repository (mbox and SQLite files)
- what happens when deleting a large number of files (the account on day 3)
- what is the impact of include / exclude patterns?
I don't see it like that: these three don't stand next to each other. Rather, I would say your test was about the first one and the other two were some of the means of testing the first. So that's what I'd like to see in the title.
is not surprising but I cannot relate it to this chart day 2: 22.374.000 => day 3: 24.324.000
But after the third day it just decreases...
But you're right: I could be clearer. I'll put these texts there. Yes, that would help. Readers want guidance. Tell me what I'm supposed to see in this graph, and I'll see it.
towerbr Feb 8 5:07PM 2018
But after the third day it just decreases...
Because on the 4th day I took out the big account.
I would say your test was about the first one and the other two were some of the means of testing the first.
I see, that's indeed a better way to see the tests. In fact, testing more than one aspect at the same time is not good test practice (I was just trying to save time...).
I'll reorganize the repository when I post test 6, including the file names.
Christoph Feb 9 4:14AM 2018
Because on the 4th day I took out the big account
Yes, which is why I didn't understand how it fits with your conclusion that
Deleting messages also represents an increase in storage, since a lot of new chunks seem to be generated.
Surely deleting a big account means deleting many messages, right?
BTW: If you didn't do any pruning, how can the storage decrease at all?
towerbr Feb 9 8:47AM 2018
Surely deleting a big account means deleting many messages, right?
The operations are different. When you delete messages, they are deleted "inside" the mbox files, but the files remain there.
When you delete an account, all its mbox files are deleted.
BTW: If you didn't do any pruning, how can the storage decrease at all?
See the text that I put there yesterday, about the second chart:
"This second chart shows that the actual size (reported by Rclone) is not the same as that reported by the Duplicacy log, which is associated with chunks."
And thank you for your contributions. They are helping me to improve the descriptions.
I just put this there:
"It's worth noting that deleting the messages and deleting the account are not equivalent operations. When the messages are deleted, they are deleted "inside" the mbox files, but they remain there, only the index changes. When the account is deleted, all related mbox files are effectively deleted."
towerbr Feb 11 7:13PM 2018
I published test #6 on GitHub... I think you'll find it interesting: (link)
Christoph Feb 12 4:53AM 2018
Finally! (I was really looking forward to this!) Thanks for all the testing work. Really well done!
Regarding your conclusion:
It is clear that for normal daily use it is better to have separate jobs / settings for database files (with fixed chunks) and for other files (with variable chunks).
While it is not wrong, I think it doesn't entirely reflect the results. What is missing (and what is most important for me) is that there is no difference in terms of storage use. So I would suggest presenting two conclusion points: 1. regarding storage use and 2. regarding speed and bandwidth use (or perhaps make it three points, whatever).
So, for me personally the conclusion is that I can continue my initial backup with 1M variable chunks. But for Gilbert, I think it is also worth considering (at least somewhere down the road) whether duplicacy could/should be made to internally handle variable and fixed chunks within the same repository/storage, i.e. that you can specify (probably in the filters file) which files should be backed up with fixed chunks and which ones with variable chunks (instead of having to set up separate backups for each).
And for TheBestPessimist: how about integrating that into your scripts (i.e. to let the scripts split up what looks like one backup job into two)?
towerbr Feb 12 7:54AM 2018
So I would suggest presenting two conclusion points
Good idea, I'll split it into separate points.
towerbr Feb 12 7:56AM 2018
Gilbert, I still have a very basic question about the logs.
In the line referring to "all chunks", as for example:
All chunks: 7428 total, 9,162M bytes; 401 new, 586,478K bytes, 295,944K bytes uploaded
If there are 586 Mbytes of new chunks, why was the upload only 295 Mbytes? Does new refer to the uncompressed size, with the difference due to compression?
gchen Feb 12 12:29PM 2018
Yes, the original size of these 401 new chunks is 586M, but after compression it is only 295M.
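The two figures in the quoted log line work out to roughly 2:1 compression, which is easy to verify:

```python
# Figures taken from the log line quoted above.
new_chunk_kbytes = 586_478   # "586,478K bytes" of new chunk data
uploaded_kbytes = 295_944    # "295,944K bytes uploaded" after compression

ratio = uploaded_kbytes / new_chunk_kbytes
print(f"uploaded {ratio:.0%} of the original chunk bytes")  # -> uploaded 50% of the original chunk bytes
```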
towerbr Feb 16 9:16PM 2018
I published the results of the two new tests:
test_07_Thunderbird_kairasku_branches and test_08_Evernote_kairasku_branches
There's a lot of data there. ;-)