[GH-ISSUE #598] Exclude document from OCR #471

Closed
opened 2026-02-25 21:31:59 +03:00 by kerem · 3 comments

Originally created by @thndrbck on GitHub (Feb 16, 2024).
Original GitHub issue: https://github.com/ciur/papermerge/issues/598

Originally assigned to: @ciur on GitHub.

Forms filled in by hand don't need Optical Character Recognition. The OCR database would fill up with form field labels. Also, disk storage will fill up with unnecessary OCR duplicates.

If you could include a check box when uploading a file so that it is marked for no OCR, that would be helpful.
A toggle to turn off OCR when batch uploading documents would also be helpful.


@ciur commented on GitHub (Feb 17, 2024):

Thank you for opening this ticket!

This feature makes perfect sense and it is relatively easy to implement.
It will be implemented as part of the next release, 3.1, which will be out in a couple of weeks.


@thndrbck commented on GitHub (Feb 19, 2024):

Re: Did you mean excluding the entire document from being OCRed, which is exactly what https://github.com/ciur/papermerge/issues/598 asks for?

Or did you really mean excluding specific pages from being OCRed?
In the latter case, i.e. excluding specific pages from OCR, it is not possible to implement. It is either the entire document (i.e. all pages in the document) or nothing.

*****

I meant not OCRing the entire document.


@ciur commented on GitHub (Feb 23, 2024):

Added in [PR#332](https://github.com/papermerge/papermerge-core/pull/332).

The feature will be part of the 3.1.0 release.