mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 17:16:00 +03:00
[GH-ISSUE #471] Can't start a container using a named volume #311
Originally created by @zblesk on GitHub (Sep 9, 2020).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/471
Describe the bug
Can't get ArchiveBox to run with data in a named volume. If I just map a standard folder, it works though.
Steps to reproduce
I'm trying to use docker-compose. My file:
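[Editor's note: the attached compose file did not survive the mirror. A hypothetical minimal example of the kind of named-volume setup described, not the reporter's actual file (service and volume names are illustrative; `/data` is the container's data directory per ArchiveBox's Docker docs):]

```yaml
version: "3"
services:
  archivebox:
    image: archivebox/archivebox
    ports:
      - "8000:8000"
    volumes:
      - archivebox_data:/data   # named volume managed by Docker, not a host folder

volumes:
  archivebox_data:
```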
First I run `docker-compose up --no-start` to create the container and volume without starting anything. Then, running `docker-compose run archivebox init` keeps failing on permission errors. I tried creating the folders manually within the volume, and setting everything in the volume to mode 777, but nothing helped.

Screenshots or log output
@pirate commented on GitHub (Sep 10, 2020):
After it fails with that error (don't run anything else), what are the permissions on the `./data/ArchiveBox.conf` file? Still 777?

Any particular reason why you're running it on a named volume and not a bind mount? That seems somewhat dangerous.
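[Editor's note: for contrast, the bind-mount layout @pirate recommends keeps the archive in a plain host folder next to the compose file; a minimal sketch, with illustrative paths:]

```yaml
services:
  archivebox:
    image: archivebox/archivebox
    volumes:
      - ./data:/data   # bind mount: data lives in ./data on the host, easy to inspect and back up
```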
@zblesk commented on GitHub (Sep 10, 2020):
Yes, still 777.
Why does it seem dangerous?
On a related note: since I've been eager to try it out, I've enqueued the download of 1000 links. The VM froze three times since then and had to be restarted (not yet sure why); I've noticed no attempt to download the enqueued pages was made. Can this be turned on? Or what is the best way to do it? (I now see that ~100 items are downloaded before the server dies. Weird, never had that happen before, no idea what causes it.)
@cdvv7788 commented on GitHub (Sep 10, 2020):
It can crash on low memory systems because there are some bottlenecks with big indexes. The upcoming `0.5` release should help with this. You can run `archivebox update` and it will retry failed links and extractors.

@zblesk commented on GitHub (Sep 10, 2020):
Thank you. How much memory is enough? The VM has 8GB. I'll try running the next 100 pages and see if there's a spike.
@pirate commented on GitHub (Sep 10, 2020):
It sounds like you are not memory limited, normally 2GB or even 1GB is enough. Do you see any suspicious log messages around the time it crashes?
The thing about Docker volumes is maybe just a personal paranoia, but I don't like trusting docker's internal filesystem for storing important data long-term. I've lost volumes in the past when moving between machines because they weren't attached to any local folder that I could quickly copy over with the compose file. I also like being able to restart a docker setup from scratch without losing application state by doing
`docker system prune --all` (which deletes all ephemeral volumes but not bound folder contents).

@zblesk commented on GitHub (Sep 14, 2020):

Don't know, the problem stopped appearing. Perhaps it was unrelated. 🤷🏻‍♀️
I run `archivebox update` in the docker container, but it's taking a very long time, some 2 minutes per web page. Can I safely run `archivebox update` multiple times in parallel, without risking data corruption/DB inconsistency?

(I.e., the links already are in the DB, in 'pending' state - they just haven't been processed yet. Since I still have ~2,000 links waiting, and ~14,000 more to go, at this rate it'd take weeks...)
@zblesk commented on GitHub (Sep 15, 2020):
Ok, tried it, didn't work. (Crashes because database locked.)
Is there anything I can do to speed it up?
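[Editor's note: the "database locked" crash from running two `archivebox update` processes at once can be reproduced in miniature with Python's stdlib `sqlite3`; the table and URLs below are illustrative, not ArchiveBox's actual schema:]

```python
import os
import sqlite3
import tempfile

db = os.path.join(tempfile.mkdtemp(), "index.sqlite3")

# First "process": holds a write transaction, like a long-running update.
writer = sqlite3.connect(db)
writer.execute("CREATE TABLE snapshot (url TEXT)")
writer.execute("BEGIN IMMEDIATE")  # grabs the database's single write lock
writer.execute("INSERT INTO snapshot VALUES ('https://example.com')")

# Second "process": tries to write while the lock is held. timeout=0 means
# fail immediately instead of waiting, which is what a parallel run hits.
other = sqlite3.connect(db, timeout=0)
try:
    other.execute("INSERT INTO snapshot VALUES ('https://example.org')")
    outcome = "ok"
except sqlite3.OperationalError as exc:
    outcome = str(exc)

writer.commit()
print(outcome)  # → database is locked
```

SQLite allows many readers but only one writer at a time, so a second concurrent `update` process contends for the same write lock rather than corrupting data outright.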
@tonylaw7 commented on GitHub (Dec 7, 2020):
I'm experiencing a similar issue when running the docker container with update command. I'm running ArchiveBox on a virtual machine with 8GB of RAM, and had no issues with previous versions when using update.
Here's my output:
Environment:
@pirate commented on GitHub (Dec 7, 2020):
@tonylaw7 what version are you running? Can you post the output of `archivebox version` and `archivebox status`?

@tonylaw7 commented on GitHub (Dec 7, 2020):
ArchiveBox v0.4.24
@pirate commented on GitHub (Dec 9, 2020):
@tonylaw7
@zblesk v0.5.0 has many speed improvements that should make multi-process archiving better, but it's not finished yet, give us a week or so for the final testing.
@dohlin commented on GitHub (Jan 15, 2021):
I too am getting this "sqlite3.OperationalError: database is locked" error seemingly randomly on v0.5, on Ubuntu 20.04 too... if I reboot and run `archivebox update` again it doesn't fail where it did previously, but it will eventually fail again. For kicks I put 32GB of RAM on this VM and it's still seeing this error. Anything else I can try?
@pirate commented on GitHub (Feb 1, 2021):
This original issue should be fixed now in the latest v0.5.4.
The other error `sqlite3.OperationalError: database is locked` is due to ArchiveBox being slow :(. Unfortunately it's an architectural issue we're still working on, stay tuned for v0.6. For now don't throw extra RAM/CPU at it; rather, try to avoid archiving more than 1 link at once, or using the UI heavily while it's in the middle of archiving.

@dohlin commented on GitHub (Apr 2, 2021):
@pirate Is there anything additional that can be done to minimize instances of the "sqlite3.OperationalError: database is locked" error? Long ago I had a very early build of ArchiveBox running, but I neglected to ever upgrade it. And due to the massive number of changes to ArchiveBox between then and now I never was able to get it to successfully "upgrade". So, I've been trying to "start fresh" with an up-to-date build.
Unfortunately, as of v0.5.6 I'm still seeing this error pretty consistently while trying to complete my "initial" archive of bookmarked links. I have quite a few (probably way too many...>2k) and while I could probably clean some out I keep many of them for reference purposes.
Is there any other timeout setting or anything I can try to increase or adjust to lessen this error at all? Anything to make it so that I don't have to re-run `archivebox update` again and again every hour/few hours? I don't have anything else running on this VM and the only "UI usage" I've done on it is occasional checks to see if it's still running or not. Gonna take a long time to get through this many links if not lol :)

And as always - thank you for your hard work on this!! I and many others really appreciate it!
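[Editor's note: on the timeout question, whatever knobs ArchiveBox itself exposes, the generic SQLite mechanism is the busy timeout: a writer that finds the database locked retries for up to N seconds instead of failing immediately. A sketch with stdlib `sqlite3`, using an illustrative table:]

```python
import os
import sqlite3
import tempfile
import threading
import time

db = os.path.join(tempfile.mkdtemp(), "index.sqlite3")

writer = sqlite3.connect(db, check_same_thread=False)
writer.execute("CREATE TABLE snapshot (url TEXT)")
writer.execute("BEGIN IMMEDIATE")  # hold the write lock for a moment

# Release the lock from a background thread ~0.3s from now.
t = threading.Thread(target=lambda: (time.sleep(0.3), writer.commit()))
t.start()

# With a generous busy timeout, this connection quietly retries until the
# lock is released, instead of raising "database is locked".
patient = sqlite3.connect(db, timeout=5)
patient.execute("INSERT INTO snapshot VALUES ('https://example.com')")
patient.commit()
t.join()
print(patient.execute("SELECT count(*) FROM snapshot").fetchone()[0])  # → 1
```

A longer timeout only papers over contention, though; if the lock holder runs for minutes, waiters still eventually give up, which is why the advice below is to keep it to a single writer process.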
@mAAdhaTTah commented on GitHub (Apr 2, 2021):
@dohlin If you import the links one at a time, via the CLI, and keep the web server off during that time, you should only have the CLI process locking/using the db which should minimize/eliminate the problem.
Alternatively, if you still have the results from the early build, I would update incrementally. Meaning, instead of going straight to the current version, you install each incremental version, upgrade the content, then install the next version. This might be safer than trying to go all the way at once.
@dohlin commented on GitHub (Apr 2, 2021):
@mAAdhaTTah Ok good to know. The old build I was on was ollllddd and honestly I don't even know if I have backups still of that VM since I've rebuilt it since. At this point I'm better off just rolling with this from scratch for now...I've got 10 of 57 pages done so far LOL. Thanks!
@pirate commented on GitHub (Apr 2, 2021):
@dohlin I would also recommend the incremental upgrade, although you don't have to go through every intermediate version. v0.4.x was specifically designed to handle importing really old archives, so if you go from the old version to v0.4.24, then from there to v0.5.6, it should work in only 2 steps. 2k links is well within the realm of what it can handle; it should only start getting sketchy above ~25k links (and v0.6, coming soon, is tested to be stable up to 150k). v0.6 also has many fixes that improve performance overall; though it hasn't totally solved the db locking issue, it should be much better when that comes out.
Also as @mAAdhaTTah mentioned, make sure you have the webserver stopped and only use 1 CLI process to do the upgrade/import, there should be no concurrency / locking issues with only 1 process.