mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 09:06:02 +03:00
[GH-ISSUE #781] Database is locked and other weird behavior when doing simultaneous adds #2004
Originally created by @jgoerzen on GitHub (Jul 5, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/781
Describe the bug
Running several "archivebox add" commands at the same time makes the later ones crash with a "database is locked" error. If they are retried persistently enough, they eventually run, but then they hit errors alongside the others.
Steps to reproduce
The first invocation of "archivebox add" works normally.
While it continues to run, subsequent invocations crash with a "database is locked" error at the point where they try to insert into the master index. Oddly, they do seem to insert SOME data into the master index before failing. Rerunning the add with the same source reduces the number of items left to add to the master index each time, until eventually the second add starts executing as well (this may take dozens of attempts for large sets).
At that point, however, both running "add" processes start hitting unexplained errors in the processing stages, and I am unsure whether the results are reliable.
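The "database is locked" failure can be reproduced with plain SQLite, outside ArchiveBox. The minimal sketch below uses Python's built-in sqlite3 module. It is not ArchiveBox code, and the snapshot table and the try_second_writer function are hypothetical. One connection opens a write transaction and holds it, the way a running "add" does mid-write. A second connection that tries to write is then refused. timeout=0 makes the second writer fail immediately instead of waiting for the lock.

```python
import os
import sqlite3
import tempfile


def try_second_writer(db_path):
    """Return the error a second writer sees while a first writer holds the lock."""
    # isolation_level=None puts the connection in autocommit mode,
    # so BEGIN/ROLLBACK below are issued explicitly.
    first = sqlite3.connect(db_path, isolation_level=None)
    first.execute("CREATE TABLE IF NOT EXISTS snapshot (url TEXT)")
    # BEGIN IMMEDIATE takes SQLite's write lock and keeps it until the
    # transaction ends, like a long-running "add" in the middle of a write.
    first.execute("BEGIN IMMEDIATE")
    first.execute("INSERT INTO snapshot VALUES ('https://example.com/1')")

    # timeout=0: give up immediately instead of waiting for the lock.
    second = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    try:
        second.execute("INSERT INTO snapshot VALUES ('https://example.com/2')")
        return None  # would mean the second write was accepted
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        first.execute("ROLLBACK")
        first.close()
        second.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(try_second_writer(os.path.join(tmp, "index.sqlite3")))
```

Run directly, the script should print "database is locked", the same message the second "add" reports. A nonzero timeout only makes the second writer wait. If the first writer keeps the lock longer than the timeout, the second writer still fails.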
Screenshots or log output
The failed "archivebox add" error looks like this:
I have observed that if the first downloader is doing something big, like downloading from YouTube, the subsequent invocations can sometimes proceed without an error.
ArchiveBox version
@pirate commented on GitHub (Jul 5, 2021):
This is expected behavior: fully parallel add is not supported yet, see https://github.com/ArchiveBox/ArchiveBox/issues/91.
It's complicated to implement because SQLite does not support multiple writers, so we have to do inter-process coordination and use a job queue system to serialize writes at the application level, which is a major refactor not expected for at least another 6 months.
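The sketch below illustrates serializing writes at the application level. Producers only enqueue work, and a single writer drains the queue on one SQLite connection, so SQLite never sees two concurrent writers. This is a hypothetical in-process sketch built on threads and queue.Queue, and it is not ArchiveBox's design. Separate "archivebox add" processes would need an inter-process queue or lock instead. The names writer, producer and run_adds, and the snapshot table, are illustrative assumptions.

```python
import os
import queue
import sqlite3
import tempfile
import threading

STOP = object()  # sentinel telling the writer thread to exit


def writer(db_path, jobs):
    """The only code that writes: applies queued inserts one at a time."""
    conn = sqlite3.connect(db_path)  # opened and used only in this thread
    conn.execute("CREATE TABLE IF NOT EXISTS snapshot (url TEXT PRIMARY KEY)")
    while True:
        url = jobs.get()
        if url is STOP:
            break
        # INSERT OR IGNORE: the same URL queued by two "adds" is stored once.
        conn.execute("INSERT OR IGNORE INTO snapshot (url) VALUES (?)", (url,))
        conn.commit()
    conn.close()


def producer(urls, jobs):
    """Stands in for one "archivebox add": it enqueues URLs but never writes."""
    for url in urls:
        jobs.put(url)


def run_adds(db_path, batches):
    """Run one producer per batch concurrently; return the final row count."""
    jobs = queue.Queue()
    w = threading.Thread(target=writer, args=(db_path, jobs))
    w.start()
    producers = [threading.Thread(target=producer, args=(b, jobs)) for b in batches]
    for p in producers:
        p.start()
    for p in producers:
        p.join()
    jobs.put(STOP)  # only after every producer has finished enqueuing
    w.join()
    conn = sqlite3.connect(db_path)
    count = conn.execute("SELECT COUNT(*) FROM snapshot").fetchone()[0]
    conn.close()
    return count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "index.sqlite3")
        print(run_adds(path, [["https://a.example", "https://b.example"],
                              ["https://b.example", "https://c.example"]]))
```

The example run prints 3, because the duplicate URL is stored only once. Since producers never touch the database, adding more of them never adds writers. The trade-off is that writes are applied strictly one at a time.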
@jgoerzen commented on GitHub (Jul 5, 2021):
Ah. Well, over in the docs here:
https://github.com/ArchiveBox/ArchiveBox/wiki/Usage#large-archives
running several adds in parallel is explicitly suggested, so maybe that doc page needs fixing?
@pirate commented on GitHub (Apr 12, 2022):
Note I've added a new DB/filesystem troubleshooting area to the wiki that may help people arriving here from Google: https://github.com/ArchiveBox/ArchiveBox/wiki/Upgrading-or-Merging-Archives#database-troubleshooting
Contributions/suggestions welcome there.