mirror of
https://github.com/ciur/papermerge.git
synced 2026-04-25 03:55:58 +03:00
[GH-ISSUE #624] Russian and Kazakh OCR #487
Originally created by @Sergey-alm on GitHub (Aug 15, 2024).
Original GitHub issue: https://github.com/ciur/papermerge/issues/624
Originally assigned to: @ciur on GitHub.
Hello! I have installed the Russian and Kazakh OCR languages, but Papermerge does not work with them. The circle is still gray after processing, and search does not find Russian or Kazakh words.
Info:
@bl1nkker commented on GitHub (Apr 2, 2025):
I have implemented support for the Russian and Kazakh OCR languages in my own setup, and everything is working fine. In practice, you need to do a little more than the documentation describes, so here’s a step-by-step guide to how I achieved this.
First, you need to create your own OCR worker image that includes the necessary languages. Create a Dockerfile based on the existing papermerge/ocrworker:0.3.1 image and install the required OCR language packages:
Once the Docker image is built and the OCR worker is running, verify that the languages are installed:
docker exec -it <ocr_worker_docker_container_id> tesseract --list-langs
In the papermerge/core/features/tasks/schema.py file, add the new language codes to the LangCode type.
In the ui2/src/cconstants.ts file, add the required language names.
In the ui2/src/types.ts and ui2/src/types/ocr.ts files, extend the OCRCode type.
@bl1nkker commented on GitHub (Apr 2, 2025):
@ciur, I just wanted to point out that while the process for adding OCR languages in Papermerge is generally straightforward (which I really appreciate), it currently requires a few extra steps that aren't mentioned in the documentation.
It would be great if the documentation could be updated to include these steps.
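The verification step from the guide (running tesseract --list-langs inside the OCR worker container) could also be scripted. The following is only a minimal sketch, not part of Papermerge: the helper names parse_tesseract_langs, missing_langs, and check_container are hypothetical. It assumes the usual output format of a header line followed by one language code per line, and that rus and kaz are the codes being checked.

```python
import subprocess

# Assumption: Tesseract's codes for Russian and Kazakh.
REQUIRED_LANGS = {"rus", "kaz"}


def parse_tesseract_langs(output: str) -> set[str]:
    """Extract language codes from `tesseract --list-langs` output."""
    langs = set()
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ..." header.
        if not line or line.lower().startswith("list of available languages"):
            continue
        langs.add(line)
    return langs


def missing_langs(output: str, required: set[str] = REQUIRED_LANGS) -> set[str]:
    """Return the required codes that do not appear in the output."""
    return set(required) - parse_tesseract_langs(output)


def check_container(container_id: str) -> set[str]:
    """Hypothetical helper: run the check inside a running container."""
    result = subprocess.run(
        ["docker", "exec", container_id, "tesseract", "--list-langs"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Some Tesseract versions print the list to stderr, so inspect both streams.
    return missing_langs(result.stdout + "\n" + result.stderr)
```

Calling check_container with the OCR worker's container ID returns an empty set when both languages are installed, and otherwise the set of codes that are still missing.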
@ciur commented on GitHub (Apr 2, 2025):
@bl1nkker thank you for the nicely organized guide. I've added it to the documentation.