mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 17:16:00 +03:00
[GH-ISSUE #742] Archivebox hangs when initializing collection on network drive that doesn't support FSYNC #3483
Originally created by @nguyenhaiac on GitHub (May 9, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/742
My setup:
Issue:
@pirate commented on GitHub (May 10, 2021):
Your network share has to be able to support FSYNC, if it does not then you'll have to put the index.sqlite3 file on a local drive and only put the archive/ sub-folder on the network drive.
See: https://github.com/ArchiveBox/ArchiveBox#storage-requirements
Most network drives support FSYNC if you configure them to, check the NFS/SAMBA docs to see how to set up FSYNC-compatible shares for your OS/NFS version.
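One rough way to check whether a given mount accepts FSYNC before initializing a collection on it is to create a small file there and call fsync on it. The sketch below is only an illustration: the function name `supports_fsync` is hypothetical and is not part of ArchiveBox. It reports failures that surface as an error; on a share where the call hangs instead of failing (as described in this issue), the probe would hang too, so it may be worth running it in a separate process with a timeout.

```python
import os
import tempfile


def supports_fsync(directory: str) -> bool:
    """Return True if a file created in `directory` can be fsync'd without an OSError.

    Hypothetical helper for illustration; not an ArchiveBox function.
    """
    # Create the probe file inside the target directory so the call
    # exercises the filesystem that backs that directory.
    fd, path = tempfile.mkstemp(dir=directory, prefix=".fsync_probe_")
    try:
        os.write(fd, b"probe")
        os.fsync(fd)  # raises OSError if the filesystem rejects the call
        return True
    except OSError:
        return False
    finally:
        # Always clean up the probe file, whether or not fsync worked.
        os.close(fd)
        os.unlink(path)
```

For example, `supports_fsync("/path/to/your/share")` (placeholder path) could be run against the network mount and against a local directory to compare the results.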
This is for data integrity reasons. Too many users in the past accidentally corrupted their archives by running concurrent archivebox threads on network filesystems that ended up clobbering each other's indexes, so now we require the filesystem where the index is stored to support atomic writes. This is for my own support sanity, and to prevent users from accidentally corrupting their indexes. If you want you can hack around it (see archivebox/system.py:atomic_write), but I cannot officially support those use cases / handhold people toward setting it up (because it's often dangerous).
Note: archive/ contains all the archived assets (and is responsible for the bulk of the disk usage), and can still be on a non-fsync compatible network share. e.g. this should work:
See here for more info:
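For readers unfamiliar with the "atomic writes" requirement mentioned in this comment, the general pattern is: write to a temporary file in the same directory, fsync it, then rename it over the target. The sketch below illustrates that generic pattern only. It is not the code in archivebox/system.py, and the name `atomic_write_sketch` is made up for this example.

```python
import os
import tempfile


def atomic_write_sketch(path: str, data: str) -> None:
    """Write `data` to `path` so readers see either the old or the new file.

    Generic illustration of the write-fsync-rename pattern; an assumption
    about how such a helper can look, not ArchiveBox's implementation.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem as the target,
    # otherwise the final rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp_")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(data)
            f.flush()
            # This is the step that needs FSYNC support from the filesystem.
            os.fsync(f.fileno())
        # Swap the fully written file into place in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The fsync step is why the index location matters: if the filesystem cannot honor it, the rename may publish a file whose contents have not actually reached stable storage.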
@nguyenhaiac commented on GitHub (May 19, 2021):
It's not a bug and there is a workaround.
@pirate commented on GitHub (Apr 12, 2022):
Note I've added a new DB/filesystem troubleshooting area to the wiki that may help people arriving here from Google: https://github.com/ArchiveBox/ArchiveBox/wiki/Upgrading-or-Merging-Archives#database-troubleshooting
Contributions/suggestions welcome there.