[GH-ISSUE #199] [Docker] Can't get upload OCR to work simultaneously with import dir OCR #161

Open
opened 2026-02-25 21:31:20 +03:00 by kerem · 6 comments

Originally created by @maspiter on GitHub (Oct 28, 2020).
Original GitHub issue: https://github.com/ciur/papermerge/issues/199

Originally assigned to: @ciur on GitHub.

I am building from the Docker files. I cannot get both upload OCR and import-directory OCR to work in the same installation.

It appears to be either one or the other.


@maspiter commented on GitHub (Oct 28, 2020):

Apparently the worker does not pick up the uploaded files in the other container?

As a workaround I added the following to the app.startup.sh before starting the webserver:

nohup python manage.py worker > /dev/null 2>&1 &

This starts the worker service in the app container as well.

Of course you need to build the image with tesseract support too.

Which leaves the question: why 2 separate containers? Apart from simultaneous workloads.


@ciur commented on GitHub (Oct 29, 2020):

> why 2 separate containers? Apart from simultaneous workloads.

Because [you should be able to start any number of workers](https://papermerge.readthedocs.io/en/latest/developers_guide/design.html). This way the application scales with the number of documents/users by adding workers, while a single web/main app suffices to handle the HTTP part.
I absolutely agree that for simple deployments (1 worker + 1 main app) they can happily live in the same container.
This is what [linuxserver](https://github.com/linuxserver/docker-papermerge) actually does (they provide a single container for both the worker and the main app).
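
The scaling idea described above (one web app enqueues work, any number of workers consume it) can be illustrated with a minimal sketch. This is not Papermerge's code: it uses Python's standard `queue` and `threading` modules as a stand-in for a real task broker, and `run_pool` is a hypothetical name introduced only for this example. The point it shows is that when all workers pull from one shared queue, each document is handled exactly once regardless of how many workers run.

```python
import queue
import threading


def run_pool(documents, num_workers):
    """Hand each document to exactly one of `num_workers` worker threads.

    Hypothetical stand-in for "web app + N workers"; a real deployment
    would use a task broker rather than an in-process queue.
    """
    tasks = queue.Queue()
    processed = []            # (worker_id, document) pairs
    lock = threading.Lock()   # protects `processed`

    def worker(worker_id):
        while True:
            doc = tasks.get()
            if doc is None:   # sentinel: no more work for this worker
                break
            with lock:
                processed.append((worker_id, doc))

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    for doc in documents:     # the "web app" side: enqueue uploads
        tasks.put(doc)
    for _ in threads:         # one sentinel per worker so all of them exit
        tasks.put(None)
    for t in threads:
        t.join()
    return processed


if __name__ == "__main__":
    docs = [f"scan-{n}.pdf" for n in range(6)]
    result = run_pool(docs, num_workers=3)
    print(sorted(doc for _, doc in result))
```

Which worker picks up which document varies from run to run; only the "exactly once" property is stable. Two workers that each watch the same input independently, with no shared queue between them, do not get this guarantee.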


@maspiter commented on GitHub (Oct 29, 2020):

Ok, thanks for the reply. But why, then, is the worker not started in the app container to handle file uploads? This is very confusing IMHO.

So it is possible to run multiple workers in the same container? Do they work in parallel?


@maspiter commented on GitHub (Nov 2, 2020):

Could you please elaborate on how to make it work properly?

If the worker is not started in the app container there is no OCR for uploads.

If the worker is started there is double OCR.

I've tried using Redis and get the following output with two workers started:

[2020-11-02 23:09:51,890: INFO/MainProcess] mingle: all alone

Is the setup with two containers actually tested or am I basically debugging here? :)


@maspiter commented on GitHub (Nov 3, 2020):

Breaking news: built an all-in-one container and the import folder still does double OCR.

So either there is something wrong with my config or with the app. Probably the config, but I am out of ideas.

EDIT: I am running 1.5.0, which might explain a lot.


@lucasff commented on GitHub (Dec 16, 2020):

You left out one important piece of information:
How are you mounting/binding your folders/volumes?
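
Why the mounts matter can be sketched under one assumption that this thread does not confirm: that the app hands tasks to the worker through files in a directory on disk. If that is the case, a worker in another container only sees those files when both containers mount the same directory. The sketch below simulates this with two temporary directories; `enqueue`, `drain`, and `demo` are hypothetical helpers written for this example, not Papermerge or Celery APIs.

```python
import os
import tempfile


def enqueue(queue_dir, name, payload):
    """App side: drop a task file into queue_dir."""
    path = os.path.join(queue_dir, name)
    with open(path, "w") as fh:
        fh.write(payload)
    return path


def drain(queue_dir):
    """Worker side: consume every task file visible in queue_dir."""
    seen = []
    for name in sorted(os.listdir(queue_dir)):
        path = os.path.join(queue_dir, name)
        with open(path) as fh:
            seen.append((name, fh.read()))
        os.remove(path)  # task is done, remove it from the queue
    return seen


def demo():
    # Two directories stand in for two containers' private filesystems.
    with tempfile.TemporaryDirectory() as app_dir, \
            tempfile.TemporaryDirectory() as worker_dir:
        enqueue(app_dir, "task-1", "ocr upload.pdf")
        unshared = drain(worker_dir)  # worker reads its own private directory
        shared = drain(app_dir)       # worker reads the directory the app wrote to
        return unshared, shared


if __name__ == "__main__":
    unshared, shared = demo()
    print("unshared mount:", unshared)
    print("shared mount:", shared)
```

With the unshared directory the worker finds nothing, which would look like "the worker does not pick up the uploaded files". With the shared directory the task is found. Whether this matches the setup in this issue depends on the broker and volume configuration, which the thread does not show.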
