JarnoP Sep 13 12:59AM 2017
Hi,
To my understanding, Duplicacy relies solely on the size of a chunk and its SHA-256 hash to determine whether two chunks are the same, right? I know I am splitting hairs here, but in theory it is possible for two different chunks to have the same size and hash. I am thinking here of HUGE backups meant to be stored for tens of years. Eventually there might be a hash collision.
Does the Duplicacy algorithm somehow detect a collision and fall back to a byte-by-byte comparison when the size and hash match? If not, are there plans to adopt a longer hash? The computational load might not be any higher.
Jarno
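A minimal sketch of the scheme described in the question, assuming a chunk is identified only by its (size, SHA-256) pair and treated as a duplicate without any byte-by-byte comparison. ChunkStore, chunk_key and collision_probability are hypothetical names for illustration, not Duplicacy's actual code. The last function uses the standard birthday-bound approximation to estimate how likely any collision is among n chunks.

```python
import hashlib
import math


def chunk_key(data: bytes) -> tuple[int, str]:
    # Hypothetical identity of a chunk: its size plus its SHA-256 hex digest.
    return (len(data), hashlib.sha256(data).hexdigest())


class ChunkStore:
    """Toy deduplicating store: equal keys are assumed to mean equal chunks."""

    def __init__(self) -> None:
        self._chunks: dict[tuple[int, str], bytes] = {}

    def add(self, data: bytes) -> bool:
        """Store the chunk unless its key is already present; return True if new."""
        key = chunk_key(data)
        if key in self._chunks:
            # No byte-by-byte check: a colliding chunk would be silently skipped here.
            return False
        self._chunks[key] = data
        return True


def collision_probability(n_chunks: int, hash_bits: int = 256) -> float:
    # Birthday bound: p ~= 1 - exp(-n(n-1) / 2^(b+1)).
    # expm1 keeps precision when the exponent is tiny.
    return -math.expm1(-n_chunks * (n_chunks - 1) / 2 ** (hash_bits + 1))
```

For scale: with 10**15 chunks and a 256-bit hash the estimate is about 4e-48, whereas a 32-bit hash reaches roughly even odds at around 77,000 chunks.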
gchen Sep 13 10:01AM 2017
Hash collisions will be detected during restore or check -files. This is because we keep file hashes in the snapshot, so an incorrect chunk will very likely cause a different file hash. There is no way to detect hash collisions during the backup command when uploading chunks -- there doesn't seem to be an efficient way to do that other than using a longer hash.
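To illustrate why a per-file hash catches a bad chunk at restore time, here is a minimal sketch, assuming the snapshot records an ordered list of chunk keys plus one hash per file. The function name restore_file, the dict-based chunk store, and the use of SHA-256 for the file hash are assumptions for illustration only, not Duplicacy's actual snapshot format.

```python
import hashlib


def restore_file(chunk_ids, chunk_store, expected_file_hash):
    """Reassemble a file from its chunks and verify it against the snapshot hash.

    chunk_ids: ordered chunk keys recorded in the (hypothetical) snapshot.
    chunk_store: mapping from chunk key to chunk bytes.
    expected_file_hash: hex digest of the whole file, recorded at backup time.
    """
    hasher = hashlib.sha256()  # assumed file-hash algorithm
    parts = []
    for cid in chunk_ids:
        data = chunk_store[cid]
        hasher.update(data)
        parts.append(data)
    if hasher.hexdigest() != expected_file_hash:
        # A wrong chunk (corruption or a collision at backup time) changes the file hash.
        raise ValueError("file hash mismatch: a chunk may be corrupt or collided")
    return b"".join(parts)


# Simulated collision: a different chunk already held key "k2", so the real one was never uploaded.
good_store = {"k1": b"hello ", "k2": b"world"}
collided_store = {"k1": b"hello ", "k2": b"WORLD"}
file_hash = hashlib.sha256(b"hello world").hexdigest()
```

Restoring from good_store returns the original bytes, while restoring the same file from collided_store raises ValueError. The check costs one extra hash pass over the restored data, which is why it fits naturally into restore and check -files rather than into the upload path.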
Currently there is no plan to use a longer hash, although it is not hard to implement.