[GH-ISSUE #635] Quick question about OCR support #501

Closed
opened 2026-02-25 21:32:03 +03:00 by kerem · 6 comments

Originally created by @deajan on GitHub (Nov 11, 2024).
Original GitHub issue: https://github.com/ciur/papermerge/issues/635

Originally assigned to: @ciur on GitHub.

Hello,

I recently tried paperless-ngx and found that it doesn't fit my use cases.
In particular, I spent some time developing support for EasyOCR for paperless-ngx, only to find out that the developers aren't keen on supporting alternative OCR engines.

As far as I understand, papermerge uses Tesseract?
Is there a plugin system, or something similar, to "plug in" another OCR engine, given that the engine handles hOCR?

Thanks.
Side question: does papermerge handle user/group permissions? If so, can they be assigned automagically to new documents, based on tags or something similar?


@ciur commented on GitHub (Nov 12, 2024):

Yes, papermerge uses Tesseract.

> Is there any plugin system, or something else, to "plug in" another OCR engine, given that it handles hOCR?

Well, not really. There is no official "plug-in system".
But the coupling with Tesseract is very thin, and it is easy to add support for almost any OCR engine.

Basically, the OCR part is a separate application, called OCR-worker (https://github.com/papermerge/ocr-worker), which communicates with the main app only via Celery messages.

The entire dependency on the OCR engine is just this module: https://github.com/papermerge/ocr-worker/blob/main/ocrworker/ocr.py (not counting system dependencies, which are assumed to be present in the Docker image).
The OCR entry points are in the tasks.py module (https://github.com/papermerge/ocr-worker/blob/main/ocrworker/tasks.py).
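Since the question hinges on the alternative engine producing hOCR, here is a minimal sketch of that adapter step for EasyOCR. It assumes detections in the shape EasyOCR's `Reader.readtext()` returns by default, `(corner_points, text, confidence)`, and emits a simplified hOCR fragment: real hOCR normally nests words inside `ocr_line`/`ocr_par` elements within a full XHTML document, and `x_wconf` is a Tesseract-specific confidence property. The function name `detections_to_hocr` is hypothetical.

```python
import math
from html import escape
from typing import List, Sequence, Tuple

# (four corner points, text, confidence in [0, 1]): the default shape of
# EasyOCR's Reader.readtext() results. Points may be ints or floats.
Detection = Tuple[Sequence[Sequence[float]], str, float]


def detections_to_hocr(detections: List[Detection], width: int, height: int) -> str:
    """Render word detections as a simplified, single-page hOCR fragment."""
    out = [f'<div class="ocr_page" id="page_1" title="bbox 0 0 {width} {height}">']
    for i, (points, text, conf) in enumerate(detections, start=1):
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        # hOCR boxes are axis-aligned "x0 y0 x1 y1"; enclose the (possibly
        # rotated) quadrilateral by flooring the minimum and ceiling the maximum.
        x0, y0 = math.floor(min(xs)), math.floor(min(ys))
        x1, y1 = math.ceil(max(xs)), math.ceil(max(ys))
        out.append(
            f'  <span class="ocrx_word" id="word_1_{i}" '
            f'title="bbox {x0} {y0} {x1} {y1}; x_wconf {round(conf * 100)}">'
            f"{escape(text)}</span>"
        )
    out.append("</div>")
    return "\n".join(out)
```

Text is HTML-escaped so recognized characters such as `&` or `<` cannot break the markup.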

> Side question: does papermerge handle user/group permissions? If so, can they be assigned automagically to new documents, based on tags or something similar?

Well, yes and no.

Yes. Papermerge handles users/groups/permissions, but not in the sense you probably mean.

Your question, I guess, is about permissions per object/resource (in this context, a specific document or folder).
No. Per-object/resource/folder/document permissions are not there yet.
I will add them at the beginning of 2025.


@deajan commented on GitHub (Nov 12, 2024):

Thank you for your quick reply.
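I've worked with OCRmyPDF to make EasyOCR work under Celery (https://github.com/ocrmypdf/OCRmyPDF-EasyOCR/pull/9) and headless (https://github.com/ocrmypdf/OCRmyPDF-EasyOCR/pull/6).
I guess this work would make it compatible with papermerge.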

Would you mind briefly explaining the permission system in papermerge?
My use case is sharing documents with my family:

  • Group parents have access to everything
  • Group children have access to only some documents
  • Individual users (children) have access to only their own documents

Is that something I can achieve with Papermerge easily ?


@ciur commented on GitHub (Nov 12, 2024):

> My use case is sharing documents with my family:
> ....
> Is that something I can achieve with Papermerge easily?

No. Not now. Currently permissions are there to limit users to specific URLs (the technical term is "endpoints").
In other words, currently you can say: "user coco does not have permission to access GET /groups/, POST /groups/, GET /groups/<any-id>", but coco has access to "GET /nodes/, GET /documents/"...
When you define permissions, there is no concept of a specific document.
You can grant a user access either to ALL documents or to none, to all groups or none, to all folders or none.
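The endpoint-level model described above can be sketched like this. The scope names (`group.view`, `document.view`, ...) and the route table are hypothetical, not papermerge's real identifiers; the point is that the check looks only at the HTTP method and route, never at which document id is requested, which is why access is all-or-nothing per resource type.

```python
import re
from typing import Dict, Set, Tuple

# Hypothetical (method, route regex) -> required scope table.
ENDPOINT_SCOPES: Dict[Tuple[str, str], str] = {
    ("GET", r"^/groups/$"): "group.view",
    ("POST", r"^/groups/$"): "group.create",
    ("GET", r"^/groups/[^/]+$"): "group.view",
    ("GET", r"^/nodes/$"): "node.view",
    ("GET", r"^/documents/$"): "document.view",
    ("GET", r"^/documents/[^/]+$"): "document.view",
}


def is_allowed(user_scopes: Set[str], method: str, path: str) -> bool:
    """Endpoint-level check: the resource id in the path is never consulted."""
    for (m, pattern), scope in ENDPOINT_SCOPES.items():
        if m == method and re.match(pattern, path):
            return scope in user_scopes
    return False  # unknown endpoints are denied
```

With scopes `{"node.view", "document.view"}`, a user like coco is refused every `/groups/` route but can read any document id, with no way to single out one document.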

As I mentioned above, per-object permissions (which is your case: granting access to a specific folder or document) will come soon, at the beginning of 2025 (I think in February 2025).
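For contrast, a per-object model for the family use case might attach an access list to each folder or document and let a grant on a folder apply to everything beneath it. This is a hypothetical sketch (`Node`, `User`, and `can_view` are illustrative names), not how papermerge actually implements sharing.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class User:
    name: str
    groups: Set[str] = field(default_factory=set)


@dataclass
class Node:
    """A folder or document carrying its own (hypothetical) access list."""
    title: str
    parent: Optional["Node"] = None
    shared_users: Set[str] = field(default_factory=set)
    shared_groups: Set[str] = field(default_factory=set)


def can_view(user: User, node: Optional[Node]) -> bool:
    """Per-object check: a grant on the node or any ancestor folder applies."""
    while node is not None:
        if user.name in node.shared_users or user.groups & node.shared_groups:
            return True
        node = node.parent
    return False
```

For the three rules listed earlier, the parents group would be granted the root folder, the children group a shared subfolder, and each child a personal subfolder, so a sibling's folder stays invisible.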


@deajan commented on GitHub (Nov 12, 2024):

Thank you for the insight :)
I'll see if I can chip in a bit of time to integrate EasyOCR into papermerge, since its results are generally superior to Tesseract's.


@deajan commented on GitHub (Dec 4, 2025):

@ciur Did per-object permissions ever make it into papermerge?


@ciur commented on GitHub (Dec 5, 2025):

Yes, it is there in 3.5:

https://docs.papermerge.io/3.5/user/sharing/
