mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 17:16:00 +03:00
[GH-ISSUE #1599] Bug: "Archive Again" with multiple URLs breaks all Chromium based archival methods, and others #956
Originally created by @nguyenmp on GitHub (Nov 18, 2024).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1599
Originally assigned to: @pirate on GitHub.
Provide a screenshot and describe the bug
When I submit multiple URLs for archival, it's very serial which I think is intentional and good.
However, when I select multiple snapshots and click "Archive Again", it's very noticeably done in parallel and breaks the Chromium profile. It'll sometimes leave the Chromium lock files in /data/personas/Default/chrome_profile/Singleton* which prevents future Chromium launches. Pretty much all archival attempts fail on the second run and even single URLs will fail after triggering the Chromium lockfile issue.
I'm not exactly sure what the knock-on effects are but the following fail very consistently once I get into this state:
Workaround is to delete the Chrome profile and only submit one URL at any time:
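For illustration only (these are not the reporter's original commands, which did not survive in this mirror): a lighter variant of the workaround might remove just the leftover Singleton* entries instead of the whole profile. The sketch below assumes no Chromium process is still using the profile, and the function name `clear_chrome_singleton_locks` is hypothetical, not part of ArchiveBox.

```python
from pathlib import Path


def clear_chrome_singleton_locks(profile_dir: Path) -> list[str]:
    """Remove leftover Singleton* entries from a Chromium profile directory.

    Assumption: no Chromium process is currently using this profile.
    Returns the names of the entries that were removed, sorted by name.
    """
    removed = []
    for entry in sorted(profile_dir.glob("Singleton*")):
        # Chromium typically creates these as symlinks (often dangling);
        # unlink() handles symlinks and regular files. Real directories are skipped.
        if entry.is_symlink() or not entry.is_dir():
            entry.unlink()
            removed.append(entry.name)
    return removed


if __name__ == "__main__":
    # Path taken from the report above; adjust it to your own data directory.
    profile = Path("./data/personas/Default/chrome_profile")
    if profile.is_dir():
        print(clear_chrome_singleton_locks(profile))
```

This only clears the stale locks; it does nothing about the parallel launches that create them, so submitting one URL at a time is still needed.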
Steps to reproduce
docker run -v "./data/:/data/" archivebox/archivebox:dev archivebox init
docker run -v "./data/:/data/" -it archivebox/archivebox:dev archivebox manage createsuperuser
docker run -p "8000:8000" -v "./data/:/data/" archivebox/archivebox:dev
Logs or errors
ArchiveBox Version
How did you install the version of ArchiveBox you are using?
Docker (or other container system like podman/LXC/Kubernetes or TrueNAS/Cloudron/YunoHost/etc.)
What operating system are you running on?
macOS (including Docker on macOS)
What type of drive are you using to store your ArchiveBox data?
data/ is on a local SSD or NVMe drive
data/ is on a spinning hard drive or external USB drive
data/ is on a network mount (e.g. NFS/SMB/CIFS/etc.)
data/ is on a FUSE mount (e.g. SSHFS/RClone/S3/B2/OneDrive, etc.)
Docker Compose Configuration
ArchiveBox Configuration
@TobiasHonscha commented on GitHub (Nov 22, 2024):
I have the same problem !
@pirate commented on GitHub (Nov 22, 2024):
Yup this is a known old issue that the new Personas system (WIP) is being built to address.
In the upcoming release it will copy the entire chrome profile directory to a unique tmp dir before starting a new chrome instance, which should allow parallel chrome instances to run at once without stepping on each other's lockfiles.
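The copy-to-a-unique-tmp-dir idea could look roughly like the sketch below. It is not the actual Personas implementation: the helper name `isolated_chrome_profile` is made up for illustration, and skipping Singleton* entries during the copy is an assumption about what would be sensible, not something confirmed in this thread.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def isolated_chrome_profile(base_profile: Path):
    """Yield a private copy of base_profile and delete it afterwards.

    Hypothetical helper: each caller gets its own directory, so lock files
    written by one Chromium instance never collide with another's.
    """
    tmp_root = Path(tempfile.mkdtemp(prefix="chrome_profile_"))
    profile_copy = tmp_root / "profile"
    try:
        shutil.copytree(
            base_profile,
            profile_copy,
            symlinks=True,  # copy symlinks as symlinks instead of following them
            ignore=shutil.ignore_patterns("Singleton*"),  # assumption: skip stale locks
        )
        yield profile_copy
    finally:
        # Best-effort cleanup of the temporary copy.
        shutil.rmtree(tmp_root, ignore_errors=True)
```

A caller would wrap each browser launch in `with isolated_chrome_profile(profile) as copy:` and point Chromium at `copy`, for example via its `--user-data-dir` flag. The trade-off is that anything the browser writes to the copy (new cookies, cache) is thrown away unless it is synced back explicitly.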
I'm going to initially soft-limit it to 4 maximum instances running in parallel per-machine-per-collection to prevent hitting too many ratelimits, but let users configure it higher so people can scale it up if they have more advanced infrastructure (e.g. VPNs/proxies/extra CPUs/etc) that can handle it.
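As a rough sketch of what a soft limit of 4 could mean in practice, the snippet below caps how many jobs run at once using a thread pool. The names `MAX_PARALLEL_CHROME` and `run_limited` are hypothetical, not ArchiveBox settings, and this only limits concurrency inside a single process; a real per-machine-per-collection limit would need some cross-process coordination that is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical default mirroring the soft limit of 4 mentioned above.
MAX_PARALLEL_CHROME = 4


def run_limited(jobs, limit=MAX_PARALLEL_CHROME):
    """Run zero-argument callables with at most `limit` executing at once.

    Results are returned in the same order as `jobs`.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    with ThreadPoolExecutor(max_workers=limit) as pool:
        futures = [pool.submit(job) for job in jobs]
        return [future.result() for future in futures]
```

Raising `limit` is the knob a user with more proxies or CPUs would turn; everything beyond the limit simply waits in the pool's queue.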