[GH-ISSUE #296] Optional "Disable OCR"-Flag for Crawler #280

Closed
opened 2026-02-27 15:55:57 +03:00 by kerem · 5 comments

Originally created by @cmprmsd on GitHub (Feb 1, 2021).
Original GitHub issue: https://github.com/RD17/ambar/issues/296

Is it possible to tell the crawler not to do OCR?
E.g. if there are many documents that I know are just machine-readable PDFs.

kerem closed this issue 2026-02-27 15:55:57 +03:00

@stale[bot] commented on GitHub (Jun 11, 2021):

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.


@cmprmsd commented on GitHub (Jun 11, 2021):

Thanks.


@iddqd-dev commented on GitHub (Jun 11, 2021):

> Is it possible to tell the crawler not to do OCR?
> E.g. if there are many documents that I know are just machine-readable PDFs.

ocrPdfMaxPageCount=0


@cmprmsd commented on GitHub (Jun 11, 2021):

Nice! Where can this option be configured?


@iddqd-dev commented on GitHub (Jun 11, 2021):

> Nice! Where can this option be configured?

In docker-compose.yml:

    environment:
      - ocrPdfMaxPageCount=0
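For context, in Compose the environment: key sits under an individual service, not at the top level of the file. The excerpt below is a minimal sketch of that placement. The service name is a hypothetical placeholder; this thread does not say which Ambar service reads ocrPdfMaxPageCount, so check your own docker-compose.yml for the right one.

```yaml
# Hypothetical docker-compose.yml excerpt: "example-ambar-service" is a
# placeholder name, not taken from Ambar's shipped compose file.
services:
  example-ambar-service:      # assumption: the service that reads ocrPdfMaxPageCount
    environment:
      # Value suggested in this thread to skip OCR for PDFs
      - ocrPdfMaxPageCount=0
```

After editing the file, running docker-compose up -d makes Compose recreate any container whose configuration changed, so the new environment value takes effect.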
