[GH-ISSUE #742] Archivebox hangs when initializing collection on network drive that doesn't support FSYNC #1974

Closed
opened 2026-03-01 17:55:28 +03:00 by kerem · 3 comments
Originally created by @nguyenhaiac on GitHub (May 9, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/742

My setup:

  • Odroid HC4 running DietPi as a NAS
  • 2 HDDs in a mergerfs pool
  • The pool is shared to the network via NFS

Issue:

  • I mount the NFS share on my main PC and attempt to run ArchiveBox there. The init process hangs while creating the SQL database and running the initial migration. No error or termination, it just hangs.
  • I then installed ArchiveBox on the Odroid using docker-compose and it works fine there, but I want to run it on my main PC since the Odroid is already stretched too thin.

@pirate commented on GitHub (May 10, 2021):

Your network share has to support FSYNC. If it does not, you'll have to put the index.sqlite3 file on a local drive and only put the archive/ sub-folder on the network drive.

See: https://github.com/ArchiveBox/ArchiveBox#storage-requirements

Most network drives support FSYNC if you configure them to. Check the NFS/SAMBA docs to see how to set up FSYNC-compatible shares for your OS/NFS version.
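One rough way to test a mount before pointing ArchiveBox at it is to write a small file there and call fsync on it. The sketch below is only an illustration: `probe_fsync` is a hypothetical helper name, not part of ArchiveBox, and it only detects shares that reject fsync with an error.

```python
import os
import tempfile


def probe_fsync(directory: str) -> bool:
    """Return True if a small file in `directory` can be written and fsync'ed.

    Hypothetical helper for illustration; not an ArchiveBox function.
    """
    try:
        # Create a throwaway file on the filesystem under test.
        # It is deleted automatically when the `with` block exits.
        with tempfile.NamedTemporaryFile(dir=directory, prefix=".fsync-probe-") as handle:
            handle.write(b"probe")
            handle.flush()             # push Python's buffer to the OS
            os.fsync(handle.fileno())  # ask the OS to push the data to storage
        return True
    except OSError:
        # Covers a missing directory, permission errors, and fsync failures.
        return False
```

Be aware that on a share that stalls instead of returning an error, this probe can hang in the same way the init step does, so it is best run by hand where it can be interrupted.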

This is for data integrity reasons. Too many users in the past accidentally corrupted their archives by running concurrent archivebox threads on network filesystems that ended up clobbering each other's indexes, so now we require the filesystem where the index is stored to support atomic writes. This is for my own support sanity, and to prevent users from accidentally corrupting their indexes. If you want you can hack around it (see archivebox/system.py:atomic_write), but I cannot officially support those use cases / handhold people toward setting it up (because it's often dangerous).
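For readers wondering why an atomic write depends on fsync, here is a minimal sketch of the general pattern: write to a temporary file, fsync it, then rename it over the target. This is not the code in archivebox/system.py; `atomic_write_sketch` is a hypothetical name, and the real function may differ.

```python
import os
import tempfile


def atomic_write_sketch(path: str, data: bytes) -> None:
    """Replace `path` with `data` so readers see the old or the new file, never a partial one.

    Illustrative only; this is not ArchiveBox's atomic_write.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem as the target,
    # otherwise the final rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())  # this is the step that needs FSYNC support
        os.replace(tmp_path, path)     # atomic rename over the target
    except BaseException:
        # Do not leave a stray temp file behind if anything failed.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If the fsync step is skipped or silently ignored by the share, the rename can land before the data does, which is one way an index ends up truncated or clobbered.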

Note: archive/ contains all the archived assets (and is responsible for the bulk of the disk usage), and it can still live on a network share that does not support FSYNC. For example, this should work:

./                               # ArchiveBox data folder
    index.sqlite3                # must be on local SSD/HDD (make sure to back it up still)
    ArchiveBox.conf              # must be on local SSD/HDD
    sources/                     # ok to put on network mount
        ...
    archive/                     # ok to put on network mount
        ...
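One possible way to get the layout above is to keep the data folder on a local disk and symlink archive/ and sources/ to the network mount. The sketch below assumes ArchiveBox follows symlinks for these folders, which this thread does not confirm; a bind mount or a separate mount point per folder would be an alternative. `link_network_folders` and both paths in the usage note are hypothetical.

```python
import os


def link_network_folders(local_data_dir: str, network_dir: str,
                         names=("archive", "sources")) -> None:
    """Create `local_data_dir` and symlink the bulky sub-folders to `network_dir`.

    Hypothetical helper; index.sqlite3 and ArchiveBox.conf stay on the local disk.
    """
    os.makedirs(local_data_dir, exist_ok=True)
    for name in names:
        target = os.path.join(network_dir, name)   # real folder on the network mount
        link = os.path.join(local_data_dir, name)  # symlink inside the local data folder
        os.makedirs(target, exist_ok=True)
        if os.path.islink(link):
            continue  # already linked, leave it alone
        if os.path.exists(link):
            # Refuse to shadow a real folder that may already hold data.
            raise FileExistsError(f"{link} exists and is not a symlink")
        os.symlink(target, link, target_is_directory=True)
```

For example, `link_network_folders("/home/me/archivebox", "/mnt/nas/archivebox")` (both paths are placeholders) would leave the index files local and send the archived assets to the share.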

See here for more info:

  • https://github.com/ArchiveBox/ArchiveBox/issues/456
  • https://github.com/ArchiveBox/ArchiveBox/issues/722

@nguyenhaiac commented on GitHub (May 19, 2021):

It's not a bug, and there is a workaround.


@pirate commented on GitHub (Apr 12, 2022):

Note I've added a new DB/filesystem troubleshooting area to the wiki that may help people arriving here from Google: https://github.com/ArchiveBox/ArchiveBox/wiki/Upgrading-or-Merging-Archives#database-troubleshooting

Contributions/suggestions welcome there.
