mirror of
https://github.com/ciur/papermerge.git
synced 2026-04-25 03:55:58 +03:00
[GH-ISSUE #127] French OCR fails when running after "Importer" directory #96
Originally created by @gaalcaras on GitHub (Sep 17, 2020).
Original GitHub issue: https://github.com/ciur/papermerge/issues/127
Hi there,
I've been playing around with papermerge lately, great work!
Unfortunately, I can't seem to run French OCR when using the "Importer" directory, although it works fine when uploading the file directly.
I'm using the linuxserver Docker image of papermerge.
Here are the relevant lines from papermerge.conf.py:
When I upload a file directly to the inbox, everything works fine. Here are the first lines of the log when grepping tesseract:
But when I move a file to the Importer directory, this happens (complete log this time):
Obviously, this does not work because the correct code for French is fra, not fre. But I can't figure out why it uses fre instead of fra just when I use the Importer directory instead of a direct upload. I have double checked the config files, I have used the correct code.
Any idea about how we could fix this?
@ciur commented on GitHub (Sep 17, 2020):
@gaalcaras
The difference between the two scenarios (manual upload and importer run) is that in the importer case, Papermerge gets its current settings from the "default OCR language of superuser", which is read from the database. In the case of a manual upload, the language code is read from the configuration file. This means that the value "fre" comes from the database. It might be that a previous typo was propagated to the database and is stored there.
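To make the difference concrete, here is a minimal sketch of the lookup described above. It is not Papermerge's actual code: the function name, the `source` values, and the keys `OCR_DEFAULT_LANGUAGE` and `ocr_language` are assumptions made only for illustration.

```python
from typing import Mapping


def resolve_ocr_language(
    source: str,
    config: Mapping[str, str],
    superuser_prefs: Mapping[str, str],
) -> str:
    """Pick the OCR language code depending on how the document arrived.

    All names here are hypothetical; they only mirror the behaviour
    described in the comment above.
    """
    if source == "upload":
        # Manual upload: the code comes from the configuration file.
        return config["OCR_DEFAULT_LANGUAGE"]
    if source == "importer":
        # Importer run: the code comes from the superuser's stored preference.
        return superuser_prefs["ocr_language"]
    raise ValueError(f"unknown document source: {source!r}")


# The situation from this issue: a correct config file, a stale database value.
config = {"OCR_DEFAULT_LANGUAGE": "fra"}
superuser_prefs = {"ocr_language": "fre"}

upload_lang = resolve_ocr_language("upload", config, superuser_prefs)
importer_lang = resolve_ocr_language("importer", config, superuser_prefs)
```

With these inputs the two paths disagree, which would explain why a correct configuration file is not enough to fix the importer run.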
I assume you are the only user = admin = superuser.
With your superuser/admin try this:
Saving as I mentioned above should overwrite the typo stored in your database (fre) with the correct value (fra).
Just in case: you applied your configuration in both the worker and the main app containers, right?
@gaalcaras commented on GitHub (Sep 17, 2020):
Indeed you're right, the problem was with the database. However, saving the ocr language in the settings menu did not work. It was already on French anyway. I changed it back to English, then back to French, to no effect. But I achieved the desired result by modifying the database directly. Thanks for your input!
AFAIK, the Linuxserver image does not run separate containers. I assumed it runs more like the "bare metal" approach described in the docs, thus reading from the same configuration files. I could be wrong though.
@ciur commented on GitHub (Sep 18, 2020):
Hi @gaalcaras,
great that you figured it out!
I have just checked linuxserver image 😮 ...
Those guys from Linuxserver did amazing work! 🌟
First of all, indeed, they managed to wrap everything in one single docker image!
They use a different configuration (sqlite3 instead of postgresql and uwsgi instead of apache mod_wsgi).
And yes, they followed the "bare metal" approach, but again, as I mentioned, they managed to wrap the worker and the main app in a single docker image 🎉
@gaalcaras commented on GitHub (Sep 18, 2020):
I agree, Linuxserver is an amazing project, it has made self hosting that much more enjoyable for me. I'm glad you appreciate what they did with Papermerge :)
@guim31 commented on GitHub (Oct 22, 2020):
Hi @gaalcaras!
Could you tell me how you managed to change the language in the database directly?
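The thread does not record the exact steps, so what follows is only a hedged sketch of one possible approach, assuming the SQLite database mentioned for the Linuxserver image. Since the schema is not given here, the helper scans every table for a column holding the stale value. The table `user_preferences` and its columns are hypothetical and exist only so the example runs in memory. Against a real installation you would open a backup copy of the database file first.

```python
import sqlite3


def find_text_value(conn: sqlite3.Connection, needle: str):
    """Return (table, column, match_count) for every column that stores `needle`."""
    hits = []
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    for table in tables:
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        for column in columns:
            count = conn.execute(
                f'SELECT COUNT(*) FROM "{table}" WHERE "{column}" = ?', (needle,)
            ).fetchone()[0]
            if count:
                hits.append((table, column, count))
    return hits


# Hypothetical schema, used only to demonstrate the helper.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_preferences (user_id INTEGER, name TEXT, value TEXT)")
conn.execute("INSERT INTO user_preferences VALUES (1, 'ocr_language', 'fre')")

hits_before = find_text_value(conn, "fre")

# Once the location is known, overwrite the stale code with the correct one.
conn.execute(
    "UPDATE user_preferences SET value = ? WHERE name = ? AND value = ?",
    ("fra", "ocr_language", "fre"),
)
conn.commit()

hits_after = find_text_value(conn, "fre")
```

Scanning first avoids guessing table names. The `UPDATE` statement then has to be adapted to whatever table and column the scan reports.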