mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 09:06:02 +03:00
[GH-ISSUE #234] Architecture: Concurrent runs accidentally delete each other's temp files, leaving the index broken #1671
Originally created by @anarcat on GitHub (May 6, 2019).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/234
Describe the bug
As part of my ridiculously large archiving attempt (partly documented in #233), I have done a first batch of URL imports with the first 100 URLs found. For a reason I can't explain (maybe because I ran two `archivebox add` commands in parallel?), that eventually crashed with:
No problem, I thought - I can resume! So I did that with
But that crashed as well, with:
I suspect this is because `--update-all` actually expects a list of URLs to be passed, but the usage doesn't make that clear and we shouldn't be crashing there.
Steps to reproduce
`archivebox add --update-all` with no other URLs
Screenshots or log output
First, the original crash, not the subject of this bug report:
Readding the list does nothing:
Looking at `-h`, I noticed `--update-all` so I try that:
The correct call is of course to retry with the same URLs:
which works, but it would actually be nice to (a) not crash when `--update-all` is passed without an argument (maybe just error in argument parsing more politely) and (b) eventually just do the right thing, which is probably to retry any failed URL from the database.
Software versions
Thanks for your hard work, and sorry for the flood of bug reports! :)
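Suggestion (a) in the report, failing politely at argument-parsing time instead of crashing later, could look roughly like the sketch below. It uses Python's standard `argparse`; the parser layout, the `parse_args` helper and the error wording are hypothetical illustrations, not ArchiveBox's actual CLI code.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for the real command-line parser.
    parser = argparse.ArgumentParser(prog="archivebox add")
    parser.add_argument("urls", nargs="*", help="URLs to add or retry")
    parser.add_argument("--update-all", action="store_true")
    return parser


def parse_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Reject the combination up front with a usage message,
    # rather than letting a later step crash on an empty URL list.
    if args.update_all and not args.urls:
        parser.error("--update-all requires at least one URL")
    return args
```

`parser.error()` prints the usage line plus the message to stderr and exits with status 2, so the user sees a short explanation instead of a traceback.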
@pirate commented on GitHub (May 6, 2019):
I added something recently called `atomic_write`, and I think the behavior you're seeing is just a bug in my implementation that can be fixed quite easily. This is how `atomic_write` works right now:
What you're encountering is the `finally` clause deleting a temp file that's being created by a different process. It can be fixed by making every temp file have a random, unique suffix such that two processes never attempt to modify the same temp file. After I push the fix I'll comment back and close this. I'll also improve testing and support for multicore runs in general in v0.4.0.
@anarcat commented on GitHub (May 6, 2019):
you might want to reuse existing code for this, e.g.
https://github.com/untitaker/python-atomicwrites
https://github.com/rec/safer
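For readers arriving here, the fix described earlier in the thread (a unique temp file per writer, followed by an atomic rename) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the original `atomic_write` implementation and not the code of either library linked above; the function signature and the `.tmp` suffix are made up for the example.

```python
import os
import tempfile


def atomic_write(path, contents):
    """Write contents to path so that readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates a file with a unique name, so two concurrent
    # writers never share (or delete) each other's temp file.
    fd, tmp_path = tempfile.mkstemp(
        dir=directory,
        prefix=os.path.basename(path) + ".",
        suffix=".tmp",
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(contents)
            f.flush()
            os.fsync(f.fileno())
        # os.replace is atomic when source and destination are on the
        # same filesystem, which is why the temp file lives next to path.
        os.replace(tmp_path, path)
    finally:
        # Clean up only our own temp file; after a successful replace
        # it no longer exists and this is a no-op.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

With concurrent writers the last `os.replace` wins, but the target is always one writer's complete output. Note that `mkstemp` creates the file with owner-only permissions, so a real implementation may want to adjust the mode before replacing.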
@pirate commented on GitHub (Jul 24, 2020):
This should all be fixed in the latest `django` version (we ended up using python-atomicwrites).
If you still see any issues, comment back and I'll reopen the ticket.
I still recommend running it single-threaded only for now; the next version will have much better multicore support since we'll be removing the index.json and index.html main indexes that cause so many locking issues and write race conditions.
@pirate commented on GitHub (Apr 12, 2022):
Note I've added a new DB/filesystem troubleshooting area to the wiki that may help people arriving here from Google: https://github.com/ArchiveBox/ArchiveBox/wiki/Upgrading-or-Merging-Archives#database-troubleshooting
Contributions/suggestions welcome there.