[GH-ISSUE #394] Where is the db stored #312

Closed
opened 2026-02-25 21:31:40 +03:00 by kerem · 6 comments

Originally created by @Sepphod on GitHub (Jun 30, 2021).
Original GitHub issue: https://github.com/ciur/papermerge/issues/394

I have a docker setup with a postgresql. During my test runs I added several documents. Once I deleted the db- directory and started from scratch, except that I didn't delete the docker images. After restarting the containers I saw the test documents from the previous test run. I was a bit surprised. Are the documents really stored in the db, and was this just a temporary db still stored in the container, as I didn't delete it and its image?
Just wondering, as I would like to rely on the storage in a db for my backups...

kerem closed this issue 2026-02-25 21:31:40 +03:00

@l4rm4nd commented on GitHub (Jul 8, 2021):

The docker-compose file lists the following entry:

```
volumes:
  - media_root:/opt/media
```

I therefore assume that uploaded documents reside in the specified volume. The database only stores the file references, not the files themselves. Am I correct, @ciur?

The Docker image provided by linuxserver.io does this too.


@l4rm4nd commented on GitHub (Jul 8, 2021):

> The docker-compose file lists the following entry:
>
> ```
> volumes:
>   - media_root:/opt/media
> ```
>
> I therefore assume that uploaded documents reside in the specified volume. The database only stores the file references, not the files themselves. Am I correct, @ciur?
>
> The Docker image provided by linuxserver.io does this too.

@Sepphod: See the [docs](https://papermerge.readthedocs.io/en/latest/developers_guide/storage_structure.html).


@Sepphod commented on GitHub (Jul 10, 2021):

@l4rm4nd I read the docs, and I mentioned that I deleted the db files. That means I deleted the contents of the mounted directory. But thanks anyway; it is not my biggest issue. It only occurs during my test runs.


@l4rm4nd commented on GitHub (Jul 10, 2021):

> @l4rm4nd I read the docs, and I mentioned that I deleted the db files. That means I deleted the contents of the mounted directory. But thanks anyway; it is not my biggest issue. It only occurs during my test runs.

Docker does not persist data except in volumes and bind mounts. So if you really deleted all of their contents and recreated your containers, it should be like a fresh install without any data.

Data written inside a container itself, outside a volume or bind mount, is lost when the container is removed and recreated.

So make sure to remove any persistently saved data of both your papermerge container and the postgresql database.


@ciur commented on GitHub (Jul 10, 2021):

@Sepphod, where the db is stored depends on your setup. If you used the default [docker-compose.yml](https://github.com/ciur/papermerge/blob/master/docker/docker-compose.yml), the db is stored in the docker volume `postgres_data`, which is mounted by the `db` service:

```
  db:
    image: postgres:12.3
    container_name: postgres_db
    volumes:
      - postgres_data:/var/lib/postgresql/data/
```

In that case, restarting the app or worker containers will not affect database storage. In other words, with the docker-compose setup above, the db "survives" restarts of the app/worker containers.

This statement

> I have a docker setup with a postgresql [...]

is incomplete, as it is not clear whether you are using the postgresql docker image, whether you are using the db docker volume, etc.

This statement

> Once I deleted the db- directory and I started from scratch except I didn't delete the docker images. After restart of the containers I saw the test documents from the test run before.

is again unclear: do you mean the db- directory of the postgres docker volume?

Note that the database stores only the extracted OCR text, tags, and user info.
The actual documents, as binary files (i.e. uploaded files such as invoice.pdf, zdf.pdf, letter_scan.jpeg), live outside the db and are stored in the media root directory, i.e. here:

```
volumes:
  - media_root:/opt/media
```

Because the media root is mounted as a docker volume, that directory is not affected when the app/worker containers are restarted or deleted.

To summarize: the default docker-compose setup configures two docker volumes (actually three, but the third is irrelevant to this discussion). All docker volumes survive restarting or deleting the app/worker/postgres docker containers.

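For readers following along, here is a minimal sketch of the layout described in this comment. It is not a copy of the repository's docker-compose.yml: the `db` service and the two volume entries come from the snippets quoted in this thread, while the placement of `media_root` under a service called `app`, the image placeholder, the top-level `volumes:` declarations, and the commented-out host path are assumptions made for illustration.

```yaml
# Illustrative sketch only, not the project's actual docker-compose.yml.
services:
  db:
    image: postgres:12.3
    container_name: postgres_db
    volumes:
      # Named volume (the source is a name, not a path): managed by Docker
      # and kept when the container is removed.
      - postgres_data:/var/lib/postgresql/data/

  app:
    image: example/papermerge-app  # hypothetical image name
    volumes:
      # Named volume holding the uploaded document files.
      - media_root:/opt/media
      # Bind-mount alternative (hypothetical host path): a source starting
      # with ./ or / maps a host directory instead of a named volume.
      # - ./papermerge_media:/opt/media

# Named volumes used by services are declared at the top level.
volumes:
  postgres_data:
  media_root:
```

With a layout like this, a backup would need both a database dump and a copy of the media directory, since neither contains the other's data. Named volumes are only removed explicitly, for example with `docker volume rm` or `docker-compose down -v`.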

@Sepphod commented on GitHub (Jul 11, 2021):

OK, it seems I was not clear.
I have several other Docker images in use that rely on PostgreSQL or MariaDB as additional images, so I know a little about how to handle them.
I deleted the mounted directory during my tests because I wanted a clean slate for the next test. Maybe I should have given more details, which I only became aware of afterwards. The files were deleted, but I still had empty references in the papermerge_app: I still saw the references to those 3 files. It seems there is a cache or another place where the references to documents are stored.
It doesn't matter anymore. I faced several more severe issues, like the one with the time. More critical is that almost 40% of all files I import via the importer dir are not imported because the worker crashes. Then I have to delete them manually. I don't think it is tesseract. I already OCR my documents with tesseract and imagemagick in a scripting pipeline, do a short evaluation of them through an "analysis", and then sort them manually in a filesystem. I just wanted a DB and maybe a web server to get a better overview. I didn't care about an additional OCR. But in this state I won't use it. I tried to disable the OCR stuff, but I didn't find how in the documentation.
Nor did I find that I have to edit the Dockerfile for the worker if I want other tesseract languages. Installing them afterwards in the container is not possible.
I respect that there is a startup behind it that earns money from maintenance, and I think with professional support it would work like a charm. But not for me. I will not use it anymore. I don't have the spare time to maintain it if the benefit over my current solution is small.
Good luck anyway, the project seems promising.