mirror of
https://github.com/ciur/papermerge.git
synced 2026-04-25 12:05:58 +03:00
[GH-ISSUE #205] Automatically determine creation date via OCR #166
Originally created by @guillaume-u on GitHub (Nov 11, 2020).
Original GitHub issue: https://github.com/ciur/papermerge/issues/205
Originally assigned to: @ciur on GitHub.
I would like to upload a lot of scans with different date formats. Ideally, as in paperless, the creation date (which paperless keeps separate from the added date) should be determined via OCR without having to set the date format.
Is there a way to do that in the current version? I did not find this capability. If not, could it be added in a new release?
Thanks.
@ciur commented on GitHub (Nov 11, 2020):
Hi @guillaume-u. In Papermerge there is only a created_at field (creation date). One way to approach your problem would be to add "added date" metadata. If you add metadata at the folder level, all documents added to that folder will inherit this metadata field. You will need to fill in the metadata value manually, though.
@guillaume-u commented on GitHub (Nov 11, 2020):
Thanks @ciur, just one more question: the created_at field depends only on the creation time (not on the content of the doc), correct?
@ciur commented on GitHub (Nov 11, 2020):
@guillaume-u, correct, the created_at field is set to the date your document was uploaded.
To be technically correct, created_at is of the so-called datetime type (date + time), so it stores the date and time when your document was uploaded to Papermerge, and has nothing to do with the content of the document itself.
@guillaume-u commented on GitHub (Nov 11, 2020):
Thanks again @ciur.
Before closing this ticket, do you think this feature (adding a date determined by OCR) is a good one and could be added to the roadmap?
@ciur commented on GitHub (Nov 11, 2020):
@guillaume-u, it is definitely a good feature. Including an "added date" sounds very reasonable. The problem is that if someone else then requests a "sent date" (many physical letters include a field indicating when the letter was sent), or maybe a "paid date", and so on, it would not be reasonable to include all those fields.
One solution would be to use metadata.
Yet another solution, which will be possible starting with the next release, is to add a very small app (I will document step by step how to do it) which extends the document model with whatever date/integer field you need for your particular case. That would be as simple as:
The latter approach has the disadvantage that you need to know a little bit of programming. On the other hand, if your feature (an added-date field) is in popular demand, I or anyone else can write a reusable app so that those who need an added date can include it in their Papermerge instance.
@guillaume-u commented on GitHub (Nov 11, 2020):
Thanks !
As an ugly hack (for my usage only, because it's not clean at all and it modifies created_at instead of creating a new attribute), I copy/pasted the paperless regex into core/automate.py. It does the job for my import.
Thanks again for your project. Like paperless, it's a very good one.
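For readers wondering what a regex-based date extraction like the hack described above might look like, here is a minimal sketch. It is not the paperless regex (which is not reproduced in this thread) and not Papermerge code: the pattern `DATE_RE` and the function `extract_date` are hypothetical names, and the sketch assumes numeric dates only, read day-first unless the year comes first.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical pattern (an assumption, not the paperless regex): matches
# numeric dates written year-first (2020-11-11) or day-first (11.11.2020,
# 11/11/2020, 11-11-2020).
DATE_RE = re.compile(
    r"\b(?:(?P<y1>\d{4})[-/.](?P<m1>\d{1,2})[-/.](?P<d1>\d{1,2})"
    r"|(?P<d2>\d{1,2})[-/.](?P<m2>\d{1,2})[-/.](?P<y2>\d{4}))\b"
)


def extract_date(text: str) -> Optional[date]:
    """Return the first valid calendar date found in OCR text, or None."""
    for match in DATE_RE.finditer(text):
        if match.group("y1"):
            year, month, day = match.group("y1", "m1", "d1")
        else:
            year, month, day = match.group("y2", "m2", "d2")
        try:
            return date(int(year), int(month), int(day))
        except ValueError:
            # The digits matched but do not form a real date
            # (for example 31.02.2020), so try the next candidate.
            continue
    return None


if __name__ == "__main__":
    print(extract_date("Invoice date: 11.11.2020, due in 30 days"))
```

The returned value could then be stored in a metadata field or a separate attribute rather than overwriting created_at, which would avoid the drawback mentioned above. Ambiguous formats such as 03/04/2020 are read here as day/month/year; that choice is an assumption and would need adjusting for month-first documents.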
@amo13 commented on GitHub (Nov 13, 2020):
I also really like the idea of automatic creation-date extraction. Would you mind sharing the regex from paperless here?
(duplicate of #71)