[GH-ISSUE #63] OCR_BINARY seems to be useless #47

Closed
opened 2026-02-25 21:31:05 +03:00 by kerem · 6 comments

Originally created by @mtonnie on GitHub (Aug 11, 2020).
Original GitHub issue: https://github.com/ciur/papermerge/issues/63

It looks like the setting OCR_BINARY isn't taken into account.

The path for tesseract is hardcoded in mglib, as are the paths for all other binaries.

I would really appreciate the ability to define binaries or paths in the configuration file.


@ciur commented on GitHub (Aug 11, 2020):

You are right, the OCR_BINARY setting is ignored.
I will fix this problem.


@ciur commented on GitHub (Aug 11, 2020):

Hi @mtonnie

I pushed a couple of commits:

  • changes in mglib: https://github.com/papermerge/mglib/commit/5edd196aaa1fe710ac5742048b1e252a38e3587c
  • changes in papermerge: https://github.com/ciur/papermerge/commit/f4a062ec98bb1742f49a98f2d0bcec6ca2a572ec

Also notice that OCR_BINARY was renamed to BINARY_OCR. This is to be consistent with the rest of the BINARY_ settings:

  • BINARY_FILE default value is "/usr/bin/file"
  • BINARY_CONVERT default value is "/usr/bin/convert"
  • BINARY_PDFTOPPM default value is "/usr/bin/pdftoppm"
  • BINARY_PDFINFO default value is "/usr/bin/pdfinfo"
  • BINARY_IDENTIFY default value is "/usr/bin/identify"
  • BINARY_OCR default value is "/usr/bin/tesseract"
  • BINARY_PDFTK default value is "/usr/bin/pdftk"

All of the above settings can be added to papermerge.conf.py to change the path of the respective executable.

Also notice that the mglib version was incremented.
This code is very fresh (I just pushed it). According to my couple of tests it works pretty well. Anyway, tomorrow I will update the documentation and perform a couple more tests.

To be continued...
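
As an illustration of the BINARY_ settings listed in this comment, a papermerge.conf.py that overrides two of the defaults might look like the sketch below. The setting names and default values are the ones given in the list; the replacement paths are purely hypothetical examples of a non-standard install location.

```python
# papermerge.conf.py (sketch)
# Settings not listed here presumably keep their default values.
# The paths below are hypothetical; adjust them to wherever the
# executables actually live on your system.

# default value is "/usr/bin/tesseract"
BINARY_OCR = "/usr/local/bin/tesseract"

# default value is "/usr/bin/pdftk"
BINARY_PDFTK = "/opt/bin/pdftk"
```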

@mtonnie commented on GitHub (Aug 12, 2020):

Hi @ciur,

I guess the checks in core/checks.py should also take these variables into account.
If not, these checks may fail when the location of the binary isn't included in the PATH environment variable.

What do you think?
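
A minimal sketch of such a check, assuming the configured values are available as a plain dict. The function name check_binaries and the dict layout are hypothetical and are not taken from core/checks.py. The point is only that the configured path is tested directly instead of being looked up via PATH.

```python
import os


def check_binaries(binaries):
    """Return error messages for configured binaries that are missing
    or not executable.

    `binaries` maps a setting name (e.g. "BINARY_OCR") to its configured
    path. Each path is tested directly, so the result does not depend on
    the PATH environment variable.
    """
    errors = []
    for name, path in binaries.items():
        if not os.path.isfile(path):
            errors.append(f"{name}: {path} not found")
        elif not os.access(path, os.X_OK):
            errors.append(f"{name}: {path} is not executable")
    return errors


if __name__ == "__main__":
    # Example call with one of the default values from this thread.
    for message in check_binaries({"BINARY_OCR": "/usr/bin/tesseract"}):
        print(message)
```

An empty list means every configured binary was found and is executable.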


@ciur commented on GitHub (Aug 13, 2020):

Hi @mtonnie,

Correct! I absolutely agree. Here is the fix: https://github.com/ciur/papermerge/commit/c8b9f7ed27430fdd9e0e7278fbf1c9f1b176e9bd

Documentation update: https://github.com/ciur/papermerge/commit/f4a900d06c07adae57795d6d55cbc8954a54d457

@mtonnie thank you for your great feedback!
If the above changes solved your problem with hardcoded binary paths, please close this ticket/issue.

Thank you again!

@mtonnie commented on GitHub (Aug 14, 2020):

Thanks a lot, looks good so far.
I'll test it with the Synology package soon; I have been focusing on the installation wizard for the last few days.


@ciur commented on GitHub (Aug 15, 2020):

@mtonnie, yes, I saw your packaging progress. I pinned that issue, as I consider it a very important one. Awesome work! Thank you!
