mirror of
https://github.com/ciur/papermerge.git
synced 2026-04-25 12:05:58 +03:00
[GH-ISSUE #587] OCR saying "unsupported format" for PDF and JPG file #458
Originally created by @Chavell3 on GitHub (Jan 31, 2024).
Original GitHub issue: https://github.com/ciur/papermerge/issues/587
Originally assigned to: @ciur on GitHub.
Hi Team,
After a number of tries, I have now successfully set up Papermerge.
By the looks of it, all connections between the instances are working... but when I upload a file and try to run OCR manually, it fails.
In the worker node's logs I see an "unsupported format" message, whether the file is a PDF or a JPG, although according to the documentation both formats are supported.
docker compose.txt
PM_worker_log.txt
Any idea what I could change to make it work?
Info:
Thanks!
@Chavell3 commented on GitHub (Jan 31, 2024):
Also interesting: if I manually trigger OCR, there is no log entry about a new task on the worker node... I would have expected a new task to show up.
BUT on the web service I see the following log... and something seems to be wrong there...
PM_web_log.txt
@ciur commented on GitHub (Jan 31, 2024):
@Chavell3
Does it happen for all PDF, JPG images you've tried? Or only for some of them? Would you mind attaching one problematic file (one pdf and one jpg) to this ticket so that I can troubleshoot it?
@thndrbck commented on GitHub (Jan 31, 2024):
Check to see if the file uploaded completely. Also check to see that the file is actually pdf or jpg. I had one file rejected because it had the wrong extension (three letters after the dot in the file name), and a few that didn't completely upload when I tried uploading 30 at a time.
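One way to check whether a file really is a PDF or JPEG, regardless of its extension, is to look at its first bytes (its "magic number"). The sketch below is a generic check, not Papermerge's own format detection; the helper names `detect_type` and `detect_file_type` are hypothetical.

```python
import sys
from typing import Optional

# Well-known file signatures: PDFs start with "%PDF-", JPEGs with FF D8 FF.
# Generic check only; this is not how Papermerge itself detects formats.
SIGNATURES = {
    b"%PDF-": "pdf",
    b"\xff\xd8\xff": "jpeg",
}


def detect_type(header: bytes) -> Optional[str]:
    """Return 'pdf' or 'jpeg' if the header matches a known signature, else None."""
    for magic, kind in SIGNATURES.items():
        if header.startswith(magic):
            return kind
    return None


def detect_file_type(path: str) -> Optional[str]:
    # Only the first few bytes are needed to recognise the signature.
    with open(path, "rb") as f:
        return detect_type(f.read(8))


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(f"{name}: {detect_file_type(name) or 'unknown'}")
```

A file named `.pdf` that reports `unknown` here is either not a PDF or was cut off before its header was written.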
@Chavell3 commented on GitHub (Jan 31, 2024):
I don't think it's a matter of a specific file; I have now uploaded about 6 additional files (PDFs and JPEGs) and none of them was scanned...
Any idea what else I could do to troubleshoot this?
Small side note: although I specified volumes for media and the database in the compose file... those folders stay empty. I added a picture of that.
If you would still like some files, just give me a shout... but I don't think it's file-related...
Thanks for the help.
@ciur commented on GitHub (Feb 1, 2024):
Run the following command in the worker container:
e.g.
and post the result here.
@Chavell3 commented on GitHub (Feb 1, 2024):
Okay... the folder "media" does not even exist under /core_app.
But that's my fault... wait, let me test something...
@Chavell3 commented on GitHub (Feb 1, 2024):
I have now added the docker volumes manually to mount them to the folders I want.
But it still doesn't seem to mount those volumes correctly.
I did find the issue, which is that the worker and web nodes are somehow not mounting the media volume correctly, although it is listed when running "df".
Worker-Node:

Web-Node:

Somehow the web node has access to that volume but the worker node does not...
Interestingly, /dev/md0 is my RAID device where I want the files to be saved, but I want to use a subfolder, DMS/papermerge/..
The storage configuration within docker compose configuration looks like that:

But there too, /dev/md0 itself is never defined... it's always a subfolder (either "docker" or "DMS")...
@Chavell3 commented on GitHub (Feb 1, 2024):
It seems "/dev/md0" is just the device name shown; the volume is correctly mounted to my subfolder.
When I browse the container's filesystem and compare it with the host folder where the files should be located, the files match.
That may mean the worker node cannot mount the "MEDIA" volume because of some permission issue... and the same goes for the web node, because it does not create any files in that folder...
@Chavell3 commented on GitHub (Feb 1, 2024):
I think the issue is that the folder "media" under /core_apps does not exist, so the volume cannot be mounted under that directory.
When I start the web node and log into the container, the "media" folder does not exist there either.
BUT the difference is that the web node seems to create that folder when the first file is uploaded, while the worker node tries to read from a directory that simply does not exist, because it was never created or correctly mounted...
I created the "media" folder for the web and worker nodes and stopped and started them again, but unfortunately it still did not mount the volume by the looks of it, because I still can't see the data that has since been created under /core_apps/media
@Chavell3 commented on GitHub (Feb 1, 2024):
But even if I create that folder, build a new image from that running container (with the "media" folder) and rebuild my whole Papermerge environment, it still does not seem to work: the files created in those volumes are still not visible on the host...
@Chavell3 commented on GitHub (Feb 1, 2024):
OKAY, shame on me... it was all my fault...
First I tried to mount the host folders directly and messed up the config there.
After I fixed that, I had written "/core_apps" instead of "/core_app"...
After I corrected that, now everything works as expected.
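A quick way to catch this kind of path typo is to check, from inside each container, whether the media directory is actually a mount point. This is a minimal sketch, assuming Python is available in the container and that the media volume is meant to be mounted at /core_app/media (the path from this thread); the helper names are hypothetical. `os.path.ismount` compares device numbers, so as a fallback the sketch also looks the path up in /proc/mounts (Linux only).

```python
import os
import sys

# Assumption: the media volume should be mounted here in both the web and
# worker containers (path taken from this thread; adjust for your setup).
EXPECTED_MEDIA_DIR = "/core_app/media"


def listed_in_proc_mounts(path: str) -> bool:
    """Return True if `path` appears as a mount point in /proc/mounts (Linux only)."""
    try:
        with open("/proc/mounts") as f:
            # The second field of each line is the mount point. Spaces are
            # escaped as \040, so this simple comparison assumes no spaces.
            return any(line.split()[1] == path for line in f if line.strip())
    except OSError:
        return False


def check_media_mount(path: str = EXPECTED_MEDIA_DIR) -> str:
    """Describe whether `path` exists and whether a volume is mounted on it."""
    path = os.path.abspath(path)
    if not os.path.isdir(path):
        return f"MISSING: {path} does not exist"
    if os.path.ismount(path) or listed_in_proc_mounts(path):
        return f"OK: {path} is a mount point"
    # The directory exists but belongs to the container's own filesystem,
    # so files written here never reach the host folder.
    return f"NOT MOUNTED: {path} exists but nothing is mounted on it"


if __name__ == "__main__":
    print(check_media_mount(sys.argv[1] if len(sys.argv) > 1 else EXPECTED_MEDIA_DIR))
```

Run it inside each container (for example via `docker compose exec`); a "MISSING" or "NOT MOUNTED" result on one service but "OK" on the other points to a mismatch in the volume target paths like the one found here.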