mirror of
https://github.com/ciur/papermerge.git
synced 2026-04-25 12:05:58 +03:00
[GH-ISSUE #635] Quick question about OCR support #501
Originally created by @deajan on GitHub (Nov 11, 2024).
Original GitHub issue: https://github.com/ciur/papermerge/issues/635
Originally assigned to: @ciur on GitHub.
Hello,
I recently tried paperless-ngx and found that it doesn't fit my use cases.
Mostly, I've spent some time developing EasyOCR support for paperless-ngx, only to find out that the developers aren't fond of supporting alternative OCR engines.
As far as I understand, Papermerge uses Tesseract?
Is there any plugin system, or some other way to plug in another OCR engine, as long as it handles hOCR?
Thanks.
Side question: does Papermerge handle user/group permissions? If so, can they be assigned automagically to new documents, according to tags or something similar?
@ciur commented on GitHub (Nov 12, 2024):
Yes, Papermerge uses Tesseract.
Well, not really. There is no official "plug-in system".
But the coupling with Tesseract is very thin, and it is easy to add support for almost any OCR engine.
Basically, the OCR part is a separate application, called OCR-worker, which communicates with the main app only via Celery messages.
The whole dependency on the OCR engine is just this module: https://github.com/papermerge/ocr-worker/blob/main/ocrworker/ocr.py (of course, I don't count system dependencies, which are assumed to be present in the Docker image).
The OCR entry points are in the tasks.py module.
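The thin coupling described above can be illustrated with a small engine registry. This is a minimal sketch, not the actual ocr-worker code: `register_engine`, `ocr_page`, and the `dummy` engine are hypothetical names, and every engine is assumed to take an image path plus a language code and return hOCR markup as a string.

```python
from typing import Callable, Dict

# Hypothetical engine signature: (image_path, lang) -> hOCR markup as a string.
OcrEngine = Callable[[str, str], str]

# Registry of available engines, keyed by a short name.
_ENGINES: Dict[str, OcrEngine] = {}


def register_engine(name: str) -> Callable[[OcrEngine], OcrEngine]:
    """Decorator that records an OCR engine function under `name`."""
    def decorator(func: OcrEngine) -> OcrEngine:
        _ENGINES[name] = func
        return func
    return decorator


@register_engine("dummy")
def dummy_engine(image_path: str, lang: str) -> str:
    # Stand-in for a real engine (Tesseract, EasyOCR, ...): returns a fixed,
    # minimal hOCR fragment so the dispatch logic runs without any OCR installed.
    return (
        '<div class="ocr_page">'
        f'<span class="ocrx_word" lang="{lang}">hello</span>'
        "</div>"
    )


def ocr_page(image_path: str, lang: str = "eng", engine: str = "dummy") -> str:
    """Single entry point; callers never import a specific engine directly."""
    try:
        run = _ENGINES[engine]
    except KeyError:
        raise ValueError(f"unknown OCR engine: {engine!r}") from None
    return run(image_path, lang)
```

Under this assumption, supporting another engine means registering one more function; in a worker, `ocr_page` could then be wrapped in a Celery task (for example with Celery's `@app.task` decorator) without the rest of the app knowing which engine ran.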
Well, yes and no.
Yes. Papermerge handles users/groups/permissions, but not in the sense you probably mean.
Your question, I guess, is about permissions per object/resource (in this sense, a specific document or folder).
No. Per object/resource/folder/document permissions are not there yet.
I will add them at the beginning of 2025.
@deajan commented on GitHub (Nov 12, 2024):
Thank you for your quick reply.
I've worked with OCRmyPDF to make EasyOCR work headless and under Celery.
I guess this work would make it compatible with Papermerge.
Would you mind briefly explaining the permission system in Papermerge?
My use case is sharing documents with my family:
Is that something I can achieve easily with Papermerge?
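Plugging EasyOCR in this way would mean handing the main app hOCR. A minimal sketch of that conversion, assuming EasyOCR's default `readtext` output shape (a list of `(corner_points, text, confidence)` tuples with four `[x, y]` corners and a 0.0 to 1.0 confidence). `detections_to_hocr` is a hypothetical helper, and the output is deliberately flat: real Tesseract hOCR also nests words inside `ocr_line`/`ocr_par` elements.

```python
from html import escape
from typing import List, Sequence, Tuple

Point = Sequence[float]
# Assumed shape of one EasyOCR result: (four corner points, text, confidence).
Detection = Tuple[Sequence[Point], str, float]


def detections_to_hocr(detections: List[Detection], width: int, height: int) -> str:
    """Build a minimal single-page hOCR document from EasyOCR-style detections."""
    words = []
    for i, (corners, text, conf) in enumerate(detections, start=1):
        xs = [int(round(p[0])) for p in corners]
        ys = [int(round(p[1])) for p in corners]
        # hOCR bounding boxes are axis-aligned: x0 y0 x1 y1.
        bbox = f"bbox {min(xs)} {min(ys)} {max(xs)} {max(ys)}"
        # x_wconf is a 0-100 word confidence; EasyOCR reports 0.0-1.0.
        wconf = int(round(conf * 100))
        words.append(
            f'<span class="ocrx_word" id="word_1_{i}" '
            f'title="{bbox}; x_wconf {wconf}">{escape(text)}</span>'
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<html xmlns="http://www.w3.org/1999/xhtml"><head>'
        '<meta name="ocr-system" content="easyocr"/></head><body>'
        f'<div class="ocr_page" id="page_1" title="bbox 0 0 {width} {height}">'
        + " ".join(words)
        + "</div></body></html>"
    )
```

Usage would look like `detections_to_hocr(easyocr.Reader(["en"]).readtext("page.png"), width, height)`, with the page size taken from the image itself.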
@ciur commented on GitHub (Nov 12, 2024):
No. Not now. Currently, permissions are there to limit users to specific URLs (the technical term is "endpoints").
In other words, currently you can say: "user coco does not have permission to access GET /groups/, POST /groups/, GET /groups/", but coco has access to "GET /nodes/, GET /documents/"...
When you define permissions, there is no concept of a specific document.
You can either grant a user access to all documents or to none, to all groups or none, to all folders or none.
As I mentioned above, per-object permissions (your case, where you try to grant access to a specific folder or document) will come soon, at the beginning of 2025 (I think it will be February 2025).
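To make the endpoint-scoped model concrete, here is a minimal sketch. `User`, `can_access`, and the `(method, path)` encoding are hypothetical, not Papermerge's actual code, and route templates such as `/documents/{id}` are ignored. Nothing in the check refers to a particular document, which is exactly the limitation described above.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Tuple

# A permission is an (HTTP method, endpoint path) pair; it never names an object.
Endpoint = Tuple[str, str]


@dataclass(frozen=True)
class User:
    username: str
    allowed: FrozenSet[Endpoint] = field(default_factory=frozenset)


def can_access(user: User, method: str, path: str) -> bool:
    """Endpoint-level check: the result is identical for every document."""
    return (method.upper(), path) in user.allowed


# coco may list nodes and documents but cannot touch groups.
coco = User("coco", frozenset({("GET", "/nodes/"), ("GET", "/documents/")}))
```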
@deajan commented on GitHub (Nov 12, 2024):
Thank you for the insight :)
I'll see if I can chip in a bit of time to integrate EasyOCR into Papermerge, since its results are generally superior to Tesseract's.
@deajan commented on GitHub (Dec 4, 2025):
@ciur Did per-object permissions ever make it into Papermerge?
@ciur commented on GitHub (Dec 5, 2025):
Yes, it is there in 3.5:
https://docs.papermerge.io/3.5/user/sharing/
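The thread doesn't describe how the 3.5 sharing feature is modelled, so the following is only a hypothetical sketch of how a per-object check differs from the endpoint-level one: access is decided per folder or document, and sharing a folder is assumed to cover everything inside it. `Node` and `can_view` are illustrative names, not Papermerge's API.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Node:
    """A folder or document in a hypothetical tree (not Papermerge's real model)."""
    name: str
    owner: str
    parent: Optional["Node"] = None
    # Usernames this node was explicitly shared with.
    shared_with: Set[str] = field(default_factory=set)


def can_view(node: Node, username: str) -> bool:
    """Per-object check: allowed if the user owns or was granted this node
    or any ancestor folder (sharing a folder covers its contents)."""
    current: Optional[Node] = node
    while current is not None:
        if username == current.owner or username in current.shared_with:
            return True
        current = current.parent
    return False
```

For the family use case, sharing one folder with each family member would then grant access to every document filed under it, while documents elsewhere stay private.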