[GH-ISSUE #471] Can't start a container using a named volume #311

Closed
opened 2026-03-01 14:42:20 +03:00 by kerem · 17 comments

Originally created by @zblesk on GitHub (Sep 9, 2020).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/471

#### Describe the bug

Can't get ArchiveBox to run with data in a named volume. If I just map a standard folder, it works though.

#### Steps to reproduce

I'm trying to use docker-compose. My file:

```
version: "3"

services:
  archivebox:
    image: nikisweeting/archivebox
    volumes:
      - archivebox_files:/data
    ports:
      - 8000:8000

volumes:
  archivebox_files:
```

First I run `docker-compose up --no-start` to create the container and volume without starting anything.

Then, running `docker-compose run archivebox init` keeps failing on permission errors. I tried creating the folders manually within the volume and setting everything in the volume to mode 777, but nothing helped.
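(For anyone debugging a similar setup: a quick way to check what's actually inside a named volume is to mount it into a throwaway container. This is a generic Docker sketch, not something from this issue; the volume name below is the unprefixed `archivebox_files` from the compose file, but Compose usually prefixes it with the project name, so check `docker volume ls` first.)

```
# List volumes to find the real (project-prefixed) volume name
docker volume ls | grep archivebox_files

# Mount the volume into a minimal container and show numeric owners/permissions
docker run --rm -v archivebox_files:/data alpine ls -lan /data
```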

#### Screenshots or log output

![image](https://user-images.githubusercontent.com/4245227/92655341-86c02180-f2f1-11ea-8ebb-2a66a0d4276f.png)



#### Software versions

 - OS: Ubuntu Server 18 LTS
 - ArchiveBox version:        latest docker image

@pirate commented on GitHub (Sep 10, 2020):

After it fails with that error (don't run anything else), what are the permissions on the `./data/ArchiveBox.conf` file? Still 777?

Any particular reason why you're running it on a named volume and not a bind mount? That seems somewhat dangerous.


@zblesk commented on GitHub (Sep 10, 2020):

Yes, still 777.

Why does it seem dangerous?

On a related note: since I've been eager to try it out, I've enqueued 1,000 links for download. The VM has frozen three times since then and had to be restarted (not yet sure why); I've noticed no attempt was made to download the enqueued pages. Can this be turned on? Or what is the best way to do it? (I now see ~100 items were downloaded before the server died. Weird, never had that happen before, no idea what causes it.)


@cdvv7788 commented on GitHub (Sep 10, 2020):

It can crash on low-memory systems because there are some bottlenecks with big indexes. The upcoming 0.5 release should help with this. You can run `archivebox update` and it will retry failed links and extractors.


@zblesk commented on GitHub (Sep 10, 2020):

Thank you. How much memory is enough? The VM has 8GB. I'll try running the next 100 pages and see if there's a spike.


@pirate commented on GitHub (Sep 10, 2020):

It sounds like you are not memory limited, normally 2GB or even 1GB is enough. Do you see any suspicious log messages around the time it crashes?

The thing about Docker volumes is maybe just personal paranoia, but I don't like trusting Docker's internal filesystem for storing important data long-term. I've lost volumes in the past when moving between machines because they weren't attached to any local folder that I could quickly copy over with the compose file. I also like being able to restart a Docker setup from scratch without losing application state by doing `docker system prune --all` (which deletes all ephemeral volumes but not bound folder contents).
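(For comparison, the bind-mount variant of the compose file from the original report would look roughly like this; `./data` is an assumed host path relative to the compose file, and survives `docker system prune --all`:)

```
version: "3"

services:
  archivebox:
    image: nikisweeting/archivebox
    volumes:
      - ./data:/data   # host folder bind mount instead of a named volume
    ports:
      - 8000:8000
```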


@zblesk commented on GitHub (Sep 14, 2020):

Don't know, the problem stopped appearing. Perhaps it was unrelated. 🤷🏻‍♀️

I run `archivebox update` in the docker container, but it's taking a very long time, some 2 minutes per web page. Can I safely run `archivebox update` multiple times in parallel without risking data corruption or DB inconsistency?
(I.e., the links are already in the DB, in a 'pending' state; they just haven't been processed yet. Since I still have ~2,000 links waiting, and ~14,000 more to go, at this rate it'd take weeks...)


@zblesk commented on GitHub (Sep 15, 2020):

Ok, tried it, didn't work. (It crashes because the database is locked.)
Is there anything I can do to speed it up?


@tonylaw7 commented on GitHub (Dec 7, 2020):

I'm experiencing a similar issue when running the docker container with the `update` command. I'm running ArchiveBox on a virtual machine with 8GB of RAM and had no issues with previous versions when using `update`.

Here's my output:

```
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 86, in _execute
    return self.cursor.execute(sql, params)
  File "/usr/local/lib/python3.8/site-packages/django/db/backends/sqlite3/base.py", line 396, in execute
    return Database.Cursor.execute(self, query, params)
sqlite3.OperationalError: database is locked

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/archivebox", line 33, in <module>
    sys.exit(load_entry_point('archivebox', 'console_scripts', 'archivebox')())
  File "/app/archivebox/cli/__init__.py", line 123, in main
    run_subcommand(
  File "/app/archivebox/cli/__init__.py", line 63, in run_subcommand
    module.main(args=subcommand_args, stdin=stdin, pwd=pwd)    # type: ignore
  File "/app/archivebox/cli/archivebox_update.py", line 108, in main
    update(
  File "/app/archivebox/util.py", line 113, in typechecked_function
    return func(*args, **kwargs)
  File "/app/archivebox/main.py", line 717, in update
    archive_links(to_archive, overwrite=overwrite, out_dir=out_dir)
  File "/app/archivebox/util.py", line 113, in typechecked_function
    return func(*args, **kwargs)
  File "/app/archivebox/extractors/__init__.py", line 157, in archive_links
    archive_link(to_archive, overwrite=overwrite, methods=methods, out_dir=Path(link.link_dir))
  File "/app/archivebox/util.py", line 113, in typechecked_function
    return func(*args, **kwargs)
  File "/app/archivebox/extractors/__init__.py", line 120, in archive_link
    write_link_details(link, out_dir=out_dir, skip_sql_index=skip_index)
  File "/app/archivebox/util.py", line 113, in typechecked_function
    return func(*args, **kwargs)
  File "/app/archivebox/index/__init__.py", line 341, in write_link_details
    write_sql_link_details(link)
  File "/app/archivebox/util.py", line 113, in typechecked_function
    return func(*args, **kwargs)
  File "/app/archivebox/index/sql.py", line 80, in write_sql_link_details
    snap.save()
  File "/usr/local/lib/python3.8/site-packages/django/db/models/base.py", line 745, in save
    self.save_base(using=using, force_insert=force_insert,
  File "/usr/local/lib/python3.8/site-packages/django/db/models/base.py", line 782, in save_base
    updated = self._save_table(
  File "/usr/local/lib/python3.8/site-packages/django/db/models/base.py", line 864, in _save_table
    updated = self._do_update(base_qs, using, pk_val, values, update_fields,
  File "/usr/local/lib/python3.8/site-packages/django/db/models/base.py", line 917, in _do_update
    return filtered._update(values) > 0
  File "/usr/local/lib/python3.8/site-packages/django/db/models/query.py", line 771, in _update
    return query.get_compiler(self.db).execute_sql(CURSOR)
  File "/usr/local/lib/python3.8/site-packages/django/db/models/sql/compiler.py", line 1500, in execute_sql
    cursor = super().execute_sql(result_type)
  File "/usr/local/lib/python3.8/site-packages/django/db/models/sql/compiler.py", line 1152, in execute_sql
    cursor.execute(sql, params)
  File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 68, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 77, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 86, in _execute
    return self.cursor.execute(sql, params)
  File "/usr/local/lib/python3.8/site-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 86, in _execute
    return self.cursor.execute(sql, params)
  File "/usr/local/lib/python3.8/site-packages/django/db/backends/sqlite3/base.py", line 396, in execute
    return Database.Cursor.execute(self, query, params)
django.db.utils.OperationalError: database is locked
```

Environment:

```
ArchiveBox v0.4.24
Ubuntu 20
Virtual Machine / 8GB RAM
```

@pirate commented on GitHub (Dec 7, 2020):

@tonylaw7 what version are you running? Can you post the output of `archivebox version` and `archivebox status`?


@tonylaw7 commented on GitHub (Dec 7, 2020):

> @tonylaw7 what version are you running? Can you post the output of `archivebox version` and `archivebox status`?

ArchiveBox v0.4.24


@pirate commented on GitHub (Dec 9, 2020):

@tonylaw7

> Can you post the output of `archivebox version` and `archivebox status`

@zblesk v0.5.0 has many speed improvements that should make multi-process archiving better, but it's not finished yet, give us a week or so for the final testing.


@dohlin commented on GitHub (Jan 15, 2021):

I too am getting this `sqlite3.OperationalError: database is locked` error, seemingly at random, on v0.5 and Ubuntu 20.04. If I reboot and run `archivebox update` again, it doesn't fail where it did previously, but it will eventually fail again. For kicks I put 32GB of RAM on this VM and it's still seeing this error. Anything else I can try?


@pirate commented on GitHub (Feb 1, 2021):

This original issue should be fixed now in the latest v0.5.4.

The other error, `sqlite3.OperationalError: database is locked`, is due to ArchiveBox being slow :(. Unfortunately it's an architectural issue we're still working on; stay tuned for v0.6. For now, don't throw extra RAM/CPU at it; rather, try to avoid archiving more than one link at a time or using the UI heavily while it's in the middle of archiving.
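(Background for readers hitting this error: SQLite allows many readers but only one writer at a time, so when a second process tries to write while another holds the write lock, it waits up to its busy timeout and then raises `database is locked`. A minimal standalone reproduction using only the Python standard library, not ArchiveBox code, with assumed table/column names:)

```python
import os
import sqlite3
import tempfile

# Two connections to the same database file, each willing to wait
# only 0.1s for a lock before giving up (the `timeout` parameter).
db = os.path.join(tempfile.mkdtemp(), "index.sqlite3")
a = sqlite3.connect(db, timeout=0.1)
b = sqlite3.connect(db, timeout=0.1)

a.execute("CREATE TABLE snapshot (url TEXT)")
a.commit()

a.execute("BEGIN IMMEDIATE")  # connection A takes the write lock and holds it
a.execute("INSERT INTO snapshot VALUES ('https://example.com')")

try:
    # connection B tries to write while A still holds the lock
    b.execute("INSERT INTO snapshot VALUES ('https://example.org')")
except sqlite3.OperationalError as e:
    print(e)  # "database is locked"

a.commit()  # A releases the lock; now B's write succeeds
b.execute("INSERT INTO snapshot VALUES ('https://example.org')")
b.commit()
```

Raising the timeout (or serializing writers, as suggested above) is what makes the error go away; throwing RAM at it doesn't, because it's a lock contention problem, not a memory one.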


@dohlin commented on GitHub (Apr 2, 2021):

@pirate Is there anything additional that can be done to minimize instances of the `sqlite3.OperationalError: database is locked` error? Long ago I had a very early build of ArchiveBox running, but I neglected to ever upgrade it, and due to the massive number of changes to ArchiveBox between then and now I was never able to get it to "upgrade" successfully. So I've been trying to "start fresh" with an up-to-date build.

Unfortunately, as of v0.5.6 I'm still seeing this error pretty consistently while trying to complete my "initial" archive of bookmarked links. I have quite a few (probably way too many...>2k) and while I could probably clean some out I keep many of them for reference purposes.

Is there any timeout setting or anything else I can try to increase or adjust to lessen this error? Anything to make it so that I don't have to re-run `archivebox update` again and again every hour or few hours? I don't have anything else running on this VM, and the only "UI usage" I've done on it is occasional checks to see if it's still running or not. Going to take a long time to get through this many links if not lol :)

And as always - thank you for your hard work on this!! I and many others really appreciate it!


@mAAdhaTTah commented on GitHub (Apr 2, 2021):

@dohlin If you import the links one at a time, via the CLI, and keep the web server off during that time, you should only have the CLI process locking/using the db which should minimize/eliminate the problem.

Alternatively, if you still have the results from the early build, I would upgrade incrementally. Meaning, instead of going straight to the current version, you install each incremental version, upgrade the content, then install the next version. This might be safer than trying to go all the way at once.
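(The one-at-a-time import suggested above could be scripted along these lines; an illustrative sketch only, assuming a `links.txt` file with one URL per line and a compose setup like the one at the top of this issue:)

```
# Stop the web server so the CLI is the only process touching the DB
docker-compose stop archivebox

# Feed links to the CLI one at a time
while read -r url; do
    docker-compose run --rm archivebox add "$url"
done < links.txt
```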


@dohlin commented on GitHub (Apr 2, 2021):

@mAAdhaTTah Ok good to know. The old build I was on was ollllddd and honestly I don't even know if I have backups still of that VM since I've rebuilt it since. At this point I'm better off just rolling with this from scratch for now...I've got 10 of 57 pages done so far LOL. Thanks!


@pirate commented on GitHub (Apr 2, 2021):

@dohlin I would also recommend the incremental upgrade, although you don't have to go through every intermediate version. v0.4.x was specifically designed to handle importing really old archives, so if you go from the old version to v0.4.24, and from there to v0.5.6, it should work in only 2 steps. 2k links is well within the realm of what it can handle; it should only start getting sketchy above ~25k links (and v0.6, coming soon, is tested to be stable up to 150k). v0.6 also has many fixes that improve performance overall; though it hasn't totally solved the db locking issue, it should be much better when that comes out.

Also, as @mAAdhaTTah mentioned, make sure you have the webserver stopped and only use 1 CLI process to do the upgrade/import; there should be no concurrency/locking issues with only 1 process.
