[GH-ISSUE #263] Calculate hash to prevent duplicates from being imported #213
Originally created by @croontje on GitHub (Dec 20, 2020).
Original GitHub issue: https://github.com/ciur/papermerge/issues/263
I was testing a little bit today, and I noticed that I can upload the same PDF over and over again.
I think it would be better if you calculated a hash (e.g. MD5, SHA, ...) and checked on import whether it already exists.
It would be even better to detect two versions of the same document, but I think that's almost impossible...
For example, if you scan the same document twice, flag the two scans as duplicates. But as I said, this seems almost impossible to me :)
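As an editorial sketch of the request (not Papermerge code): on import, hash the file contents and skip any file whose digest has already been seen. The `file_digest` and `import_document` helpers and the in-memory `known_hashes` set are assumptions made for this example; a real implementation would persist the hash next to the document record and query it on upload.

```python
# Minimal sketch, assuming nothing about Papermerge internals.
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def import_document(path: Path, known_hashes: set[str]) -> bool:
    """Import the file unless an identical one was imported before.

    Returns True if imported, False if skipped as a duplicate.
    """
    digest = file_digest(path)
    if digest in known_hashes:
        print(f"skipping duplicate: {path} ({digest[:12]}...)")
        return False
    known_hashes.add(digest)
    print(f"imported: {path} ({digest[:12]}...)")
    return True


if __name__ == "__main__":
    # Tiny demo: the same file is imported once and rejected the second time.
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
        tmp.write(b"%PDF-1.4 dummy content")
        sample = Path(tmp.name)

    seen: set[str] = set()
    import_document(sample, seen)  # imported
    import_document(sample, seen)  # skipped as duplicate
```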
@ciur commented on GitHub (Dec 21, 2020):
@croontje, whether this feature is useful or not is a matter of debate.
There is a duplicate issue #167 on this topic.
For me personally, including this feature in core would make development mode more complex: many times while fixing bugs, or just during development, I upload the same 2-3 documents a couple of times to "simulate" a multitude of documents. With this hashing feature I would need to adjust the defaults in order to avoid the duplicate check.
Proper de-duplication (I mean one where similar documents are detected, which do not necessarily need to be 100% exact matches) is a good thing to have, and it will be added later.
Starting with Papermerge 2.0, you will be able to create a separate app (app == extension == plugin) which will provide this functionality. Actually, I will create a hashing app as an example of how to write external apps that extend core functionality.
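For a sense of what such an external "hashing app" might look like, here is a purely hypothetical Django-style sketch. The `Document` model, its `file_path` and `checksum` fields, and the `hashing_app` module path are invented for illustration and are not Papermerge's actual plugin API or data model; only the Django signal machinery (`post_save`, `receiver`) is real.

```python
# Hypothetical sketch -- model, fields and module path are invented for
# illustration and are NOT Papermerge's real API.
import hashlib

from django.db.models.signals import post_save
from django.dispatch import receiver

from hashing_app.models import Document  # hypothetical model


@receiver(post_save, sender=Document)
def record_checksum(sender, instance, created, **kwargs):
    """After a new document is saved, store its SHA-256 and flag duplicates."""
    if not created:
        return

    hasher = hashlib.sha256()
    with open(instance.file_path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            hasher.update(chunk)
    digest = hasher.hexdigest()

    duplicate_exists = (
        Document.objects.exclude(pk=instance.pk)
        .filter(checksum=digest)
        .exists()
    )

    # Persist the checksum without re-triggering post_save.
    Document.objects.filter(pk=instance.pk).update(checksum=digest)

    if duplicate_exists:
        # A real app could warn the user, merge, or reject the upload here.
        print(f"Document {instance.pk} duplicates an existing file.")
```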