Prevent uploading of files that are already present
complete
Jendrik
It would be great to implement a check for whether a file is already present in a given folder structure, and to skip uploading files that have already been uploaded.
When does this happen?
For example, on vacation you might copy all raw photos from the SD card to your phone and simply select them all to upload and back up. As of today, however, we need to remember the exact last file we successfully backed up, because the system will upload all files again, irrespective of which files have already been uploaded. Especially for photography with big raw files, this is a slow process.
I'm thinking about some hash check ;)
Jendrik
Thanks! Great work!
Tom Raganowicz
complete
Tom Raganowicz
We've implemented this in the 1.8.1 release.
We check the file's last-modified date and size; if these two parameters match, the file transfer is skipped.
It also works on the Web (except Firefox, where we have issues getting the correct last-modified date).
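Roughly, the check boils down to something like the sketch below (simplified, not the actual implementation; RemoteEntry is a stand-in for our real listing type):

```typescript
// Simplified sketch of the skip-if-already-uploaded check.
// RemoteEntry is a stand-in type, not the real storage API.
interface RemoteEntry {
  name: string;
  size: number;          // bytes
  lastModified: number;  // epoch milliseconds
}

// `local` is a browser File (it exposes name, size and lastModified).
function shouldSkipUpload(local: File, remoteFiles: RemoteEntry[]): boolean {
  return remoteFiles.some(
    (r) =>
      r.name === local.name &&
      r.size === local.size &&
      // On Firefox the reported lastModified can be unreliable,
      // which is why the Web check is not enabled there yet.
      r.lastModified === local.lastModified
  );
}
```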
Tom Raganowicz
We still haven't implemented this feature. As a temporary alternative, users could use the Sync feature (with e.g. Copy mode) from a specific folder to a chosen remote location. Sync will skip files that are already uploaded.
We will aim to increase the priority of this feature, given that it's gaining more and more interest.
Tom Raganowicz
in progress
Tom Raganowicz
planned
Tom Raganowicz
Hi Jendrik, thank you for adding this feature request.
We're aware of this issue; however, we haven't agreed on the exact solution just yet.
The issue with a hash check is that it's not compatible with the E2E model that we use. In other words, the hash of the encrypted media file on the S3 server is different from the hash of the raw unencrypted file on the mobile device. In order to compare hashes, we would need to download the encrypted file, decrypt it locally, generate the hash, and compare it with the local one. That's a more intensive process than just re-uploading the same file again.
Technically we could store the "unencrypted" hash as file metadata on the S3 side, so we could skip the above-mentioned process. This doesn't come without drawbacks. The main issue is that by doing so we would leak some information to the S3 provider, which by our model's definition isn't trusted.
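If we ever went that route, it would look roughly like this (hypothetical sketch only; sha256Hex uses the Web Crypto API, while getRemoteMetadata and uploadEncrypted are placeholders for our storage layer, not real APIs):

```typescript
// Hypothetical sketch of storing a plaintext hash as object metadata
// and comparing it before upload. getRemoteMetadata() and
// uploadEncrypted() are placeholders for the storage layer.
// Note the drawback discussed above: the plaintext hash itself is
// visible to the (untrusted) storage provider.

async function sha256Hex(data: ArrayBuffer): Promise<string> {
  const digest = await crypto.subtle.digest("SHA-256", data);
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}

async function uploadIfChanged(
  file: File,
  getRemoteMetadata: (key: string) => Promise<{ plaintextSha256?: string } | null>,
  uploadEncrypted: (file: File, metadata: Record<string, string>) => Promise<void>
): Promise<void> {
  const localHash = await sha256Hex(await file.arrayBuffer());
  const remote = await getRemoteMetadata(file.name);

  // If the remote object already carries the same plaintext hash,
  // the file is identical and the upload can be skipped.
  if (remote?.plaintextSha256 === localHash) return;

  await uploadEncrypted(file, { plaintextSha256: localHash });
}
```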
Quick question: if we "dedup" files by name + size initially, would that help in your opinion?
Given that the default upload path is "Automatic uploads", would you like the "dedup" function to run within that folder or on the bucket globally?
Some files have relatively unique naming, e.g. 2023-10-06 23-33-14.jpg; on the other hand, on iOS we've noticed names like IMG_1774.HEIC. A name matcher alone might lead to some false positives, but if we combined it with a file size check it might prove effective. Thanks again for mentioning this; we're keen on making progress here!
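For reference, the name + size matcher we have in mind is roughly this (sketch only; RemoteFile and the folder/bucket scope flag are illustrative, not a final design):

```typescript
// Illustrative name + size duplicate check, with the
// folder vs. bucket-wide scope question left as a parameter.
interface RemoteFile {
  folder: string;
  name: string;
  size: number; // bytes
}

function isDuplicate(
  local: { folder: string; name: string; size: number },
  remote: RemoteFile[],
  scope: "folder" | "bucket"
): boolean {
  return remote.some(
    (r) =>
      r.name === local.name &&
      r.size === local.size &&
      (scope === "bucket" || r.folder === local.folder)
  );
}
```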
lei mix
@Tom Raganowicz Name + size does work for finding duplicate files; I just wish this filtering process could be done faster and more easily, and that we could then choose to delete either the old or the new duplicates!