mirror of
https://github.com/ciur/papermerge.git
synced 2026-04-25 12:05:58 +03:00
[GH-ISSUE #394] Where is the db stored #312
Originally created by @Sepphod on GitHub (Jun 30, 2021).
Original GitHub issue: https://github.com/ciur/papermerge/issues/394
I have a Docker setup with PostgreSQL. During my test runs I added several documents. At one point I deleted the db- directory and started from scratch, except that I didn't delete the Docker images. After restarting the containers I still saw the test documents from the previous run, which surprised me a bit. Are the documents really stored in the db, and was this just a temporary db still stored in the container because I didn't delete it and its image?
Just wondering, as I would like to rely on the storage in a db for my backups...
@l4rm4nd commented on GitHub (Jul 8, 2021):
The docker-compose lists the following entry
I therefore assume that uploaded documents reside in the specified bind mount. The database only handles the file references and does not store the files. Am I correct, @ciur?
The docker image provided by linuxserver.io does this too.
@l4rm4nd commented on GitHub (Jul 8, 2021):
@Sepphod: See the docs.
@Sepphod commented on GitHub (Jul 10, 2021):
@l4rm4nd I read the docs, and I mentioned that I deleted the db files. That means I deleted the content of the mounted dir. But thanks anyway, it is not my biggest issue. It only occurs during my test runs.
@l4rm4nd commented on GitHub (Jul 10, 2021):
Docker does not persist data anywhere except in volumes and bind mounts. So if you really deleted all contents and restarted your containers, it's like a fresh install without any data.
Any other data inside a docker container is lost once the container is removed and recreated.
So make sure to remove any persistently saved data of both your papermerge container and the postgresql database.
@ciur commented on GitHub (Jul 10, 2021):
@Sepphod, where the db is stored depends on your setup. In case you used the default docker-compose.yml, the db will be stored in a docker volume named "db":
In that case, restarting the app or worker containers will not affect database storage. In other words, with the above-mentioned docker compose setup, the db "survives" restarts of the app/worker containers.
This statement
is incomplete, as it is not clear whether you are using the postgresql docker image, whether you are using the db docker volume, etc.
With this statement
it is again not clear: do you refer to the db- directory of the postgres docker volume?
Notice that the database stores only the extracted OCR text, tags, and user info.
Actual documents as binary files (i.e. uploaded files, e.g. invoice.pdf, zdf.pdf, letter_scan.jpeg) live outside the db and are stored in the media root directory, i.e. here:
Because the media root is mounted as a docker volume, that directory will not be affected by app/worker containers being restarted or deleted.
Let me summarize: the default docker-compose setup configures two docker volumes (actually 3, but the 3rd is irrelevant to our discussion). All docker volumes survive restarting or deleting the app/worker/postgres docker containers.
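Since the question was about backups: given the split described above (metadata in the database, binary files in the media root), a backup has to cover both places. The following is only a minimal sketch of that idea, not papermerge's own tooling. The function names, the database name, user, host, and all paths are hypothetical placeholders; `pg_dump` is the standard PostgreSQL dump utility, and the command is only built here, not executed.

```python
import tarfile
from pathlib import Path


def pg_dump_command(db_name: str, user: str, host: str, out_file: Path) -> list[str]:
    """Build (but do not run) a pg_dump call for the metadata database.

    All argument values are placeholders; use the ones from your own setup.
    """
    return [
        "pg_dump",
        "--host", host,
        "--username", user,
        "--format", "custom",
        "--file", str(out_file),
        db_name,
    ]


def archive_media_root(media_root: Path, archive_path: Path) -> list[str]:
    """Pack every file under media_root into a gzip-compressed tar archive.

    Returns the archived paths, relative to media_root, in sorted order.
    """
    files = sorted(p for p in media_root.rglob("*") if p.is_file())
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in files:
            # Store paths relative to the media root so the archive is relocatable.
            tar.add(path, arcname=path.relative_to(media_root).as_posix())
    return [p.relative_to(media_root).as_posix() for p in files]
```

A possible use would be to pass the result of `pg_dump_command(...)` to `subprocess.run(..., check=True)` and then call `archive_media_root(...)` on the host directory that backs the media volume. Restoring only one of the two would leave either documents without metadata or metadata pointing at missing files.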
@Sepphod commented on GitHub (Jul 11, 2021):
OK, it seems I was not clear.
I use several other docker images that rely on PostgreSQL or MariaDB as additional images, so I know a little bit about how to handle them.
I deleted the mounted directory during my tests. I wanted to have a clean slate for the next test. Maybe I should have given more details, which I only became aware of afterwards. The files were deleted, but I still had empty references in the papermerge_app: I still saw the references to those 3 files. It seems there is a cache or another place where the references to documents are stored.
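One way to narrow down a situation like this, where references remain but the files are gone, is to compare the paths the application still knows about with what actually exists on disk. The sketch below assumes you have already obtained the referenced relative paths by some means (for example, from a database query); how papermerge stores those paths is not covered in this thread, so `find_missing_files` and its inputs are purely hypothetical.

```python
from pathlib import Path


def find_missing_files(referenced: list[str], media_root: Path) -> list[str]:
    """Return the referenced relative paths that have no file under media_root."""
    return [rel for rel in referenced if not (media_root / rel).is_file()]
```

If every referenced path comes back as missing, the metadata store survived while the file storage was wiped, which would match references that point at nothing.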
It doesn't matter anymore. I ran into several more severe issues, like the one with the time. More critical is that almost 40% of all files I import via the importer dir are not imported because the worker crashes. Then I have to delete them manually. I don't think it is Tesseract. I already OCR my documents with Tesseract and ImageMagick in a scripting pipeline. I also already do a short evaluation of them with an "analysis", and then I sort them manually in a filesystem. I just wanted to have a DB and maybe a web server to get a better overview. I didn't care about an additional OCR. But in this state I won't use it. I tried to disable the OCR stuff, but I didn't find how in the documentation.
I also didn't find it documented that I have to edit the Dockerfile for the worker if I want other Tesseract languages. Installing them afterwards in the container is not possible.
I respect that there is a startup behind this that wants to earn money from maintenance. And I think with professional support it would work like a charm. But not for me. I will not use it anymore. I don't have the spare time to maintain it if the benefit over my current solution is small.
Good luck anyway, the project seems promising.