Mirror of https://github.com/ArchiveBox/ArchiveBox.git, synced 2026-04-25 17:16:00 +03:00
[GH-ISSUE #456] Support for network drives or filesystems that don't implement FSYNC #303
Originally created by @blackberryoctopus on GitHub (Aug 20, 2020).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/456
Describe the bug

The issue occurs when attempting to initialize an ArchiveBox directory on a mounted network drive. It is reproducible in multiple directory locations on the NAS, while initializing an ArchiveBox directory on the machine's local drive succeeds. Additionally, if a successfully initialized directory is copied to the NAS, subsequent ArchiveBox operations on that copied directory fail. Based on the traceback, the write of the config file appears to fail.

I investigated the trace output and added print statements to the atomicwrites function; the failure occurs at line 46 of atomicwrites' __init__.py. I also uncommented the print statement at line 41 of archivebox/system.py for further debug info.
Steps to reproduce

    mkdir archivebox; cd archivebox; archivebox init

Screenshots or log output

OUTPUT:

Here is the output of mount for the two disks:

SUCCESS DRIVE:

    /dev/disk1s5 on /System/Volumes/Data (apfs, local, journaled, nobrowse)

FAIL DRIVE:

    //;AUTH=No%20User%20Authent@Drobo-5N2-RYE._afpovertcp._tcp.local/Public on /Volumes/Public (afpfs, nodev, nosuid, mounted by user123)

Here are the file permissions of the FAIL directory:

    drwxrwxrwx 1 user123 staff 264 Aug 20 15:55 archivebox

Software versions
@pirate commented on GitHub (Aug 20, 2020):

It looks like your AFP server might not support AFP command 78 (FPSyncDir, via fcntl(F_FULLFSYNC)). In the past, ArchiveBox's lack of atomic writes caused painful corruption issues when multiple processes tried to write to the same index files, so we fixed it by enforcing atomic writes everywhere with fsyncs and file renaming. A consequence of that is that if your underlying filesystem ignores/skips fsyncs, ArchiveBox will be completely unable to run.

Is there any chance you're able to mount the network drive over SMB/NFS instead, to test whether it's an issue with the network storage protocol, the filesystem, or something else? Or maybe put your archive index.{json|sqlite3|html} and config file on a local drive and only put the data/archive/ subfolder (which contains the actual page content) on the network drive?

Relevant:
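The write-then-rename pattern described above can be sketched in a few lines of Python. This is an illustrative sketch of the general technique (stage the data in a temp file in the same directory, fsync it, then rename it over the destination), not ArchiveBox's actual `atomic_write` implementation:

```python
import os
import tempfile

def atomic_write(path, data):
    """Write bytes to `path` atomically: stage the data in a temp file
    in the same directory, flush it to disk, then rename it over the
    destination so readers never observe a half-written file."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # the call that fails on non-fsync mounts
        os.replace(tmp, path)     # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp)            # clean up the staging file on failure
        raise
```

Because the rename is atomic, a concurrent reader sees either the old file or the new one, never a partial write; the fsync ensures the new contents are durable before they replace the old file.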
@blackberryoctopus commented on GitHub (Aug 21, 2020):

@pirate
A different error occurs when mounting the NAS over SMB, but the error appears to happen at the same point in the code execution.

Here is the new mount output for the drive:

    //GUEST:@169.254.8.156/Public on /Volumes/Public (smbfs, nodev, nosuid, noowners, mounted by user123)

Here is the traceback
@pirate commented on GitHub (Aug 21, 2020):

Ah yeah, unfortunately if F_FULLFSYNC is not supported then you're sort of screwed. I don't want to allow non-fsync'ed writes to disk in the codebase; it causes too many headaches.

SMBv4 definitely supports FSYNC when configured to do so. Can you check your NAS and see if there are any options you can tweak to allow fsyncs?
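A quick way to test whether a mount accepts the flush call without running ArchiveBox at all is a small probe script. This is a hypothetical helper (not part of ArchiveBox): it creates a temp file on the target mount and attempts the strongest flush available, using macOS's `fcntl(F_FULLFSYNC)` where the Python `fcntl` module exposes it, and plain `os.fsync()` elsewhere:

```python
import fcntl
import os
import tempfile

def supports_fsync(directory):
    """Return True if files in `directory` accept a durable flush to disk.

    On macOS the full write barrier is fcntl(F_FULLFSYNC); on other
    platforms we fall back to os.fsync(). Returns False if the
    filesystem rejects the call, as some AFP/SMB mounts do.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.write(fd, b"probe")
        if hasattr(fcntl, "F_FULLFSYNC"):   # only defined on macOS builds
            fcntl.fcntl(fd, fcntl.F_FULLFSYNC)
        else:
            os.fsync(fd)
        return True
    except OSError:
        return False
    finally:
        os.close(fd)
        os.unlink(path)
```

Running `supports_fsync("/Volumes/Public")` against the failing AFP/SMB mount should return False, while a local APFS path should return True.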
@blackberryoctopus commented on GitHub (Aug 21, 2020):
@pirate
Thanks for your help with the investigation.
I have one suggestion/question: is it possible to improve the initialization routine to output more descriptive debug info when an incompatible drive is used as the destination?
@blackberryoctopus commented on GitHub (Nov 6, 2020):
@pirate I'm wondering if there's any update on this issue? If not, what changes need to be made to better inform users when they attempt archiving on unsupported drive volumes?
@pirate commented on GitHub (Nov 10, 2020):

Added an explicit error: github.com/pirate/ArchiveBox@fbd9a7caa6

This can maybe be removed in the future if we fully move to SQLite for everything (including config). But that's far in the future, so I'm closing this for now with the error message.
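The shape of such an init-time check is straightforward: attempt a small fsync'ed write in the data directory before doing anything else, and fail with an actionable message instead of an opaque traceback. The following is a hedged sketch of that idea (the function name and message wording are illustrative, not taken from the linked commit):

```python
import os
import tempfile

def check_fsync_support(data_dir):
    """Fail fast with a descriptive message if `data_dir` sits on a
    filesystem where fsync, and therefore atomic writes, cannot work."""
    fd, tmp = tempfile.mkstemp(dir=data_dir, prefix=".fsync-test-")
    try:
        os.write(fd, b"test")
        os.fsync(fd)  # raises OSError on mounts that reject fsync
    except OSError as err:
        raise SystemExit(
            f"[X] The filesystem at {data_dir} does not support fsync ({err}).\n"
            "    ArchiveBox needs working fsync for atomic index writes;\n"
            "    network mounts that skip fsync cannot hold the index/config files."
        )
    finally:
        os.close(fd)
        os.unlink(tmp)
```

Called at the top of `archivebox init`, this turns the mid-write crash described earlier in the thread into a clear, early diagnosis.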
@blackberryoctopus commented on GitHub (Nov 10, 2020):
Thank you!
@lylebrown commented on GitHub (Jan 28, 2021):

I'm attempting to set up SMB storage myself, but I don't mind using local storage for the application/config files; just the archive (which will take up more disk space) should be on a network drive.

You mentioned putting only data/archive on the SMB share. I tried that and I get 400 errors every time I attempt to archive a URL. Is that still not supported either? Just trying to understand if that's still an option or not. Here's the relevant section of my docker-compose.yml.

@pirate commented on GitHub (Jan 28, 2021):

It should be supported, not sure why it's failing. Do you mind running the server with archivebox server --debug ... or setting archivebox config --set DEBUG=True and posting the verbatim output / screenshots of those 400 errors? That will help narrow down the root cause.

@lylebrown commented on GitHub (Jan 28, 2021):
So there were some permissions errors at first that I had to fix, but now I believe my issue is with atomic writes, based on the error I'm getting. It's strange, because it did write the index.json file, and the file appears to be complete. Let me know if you need the full trace.
@pirate commented on GitHub (Jan 28, 2021):

I need the full trace to know what part of the code called atomic_write, as the filesystem behavior may differ depending on where it's being called.

@lylebrown commented on GitHub (Jan 28, 2021):
@pirate commented on GitHub (Jan 28, 2021):

Oh, it's just the chmod failing, not the actual write. What are the permissions and ownership on the dir /data/archive/1611858110.058611, and what user are you running archivebox as?

@lylebrown commented on GitHub (Jan 29, 2021):
That was the hint I needed, thanks! I may not be doing things in the ideal way with my SMB share, but I'm using file_mode=0777,dir_mode=0777 as options in my fstab, and I needed to add the noperm option for it to respond to chmod calls with success (even if they aren't actually making changes).

I'm open to suggestions to better manage permissions, but I don't think I have many options over SMB.
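The failure mode diagnosed above (the write succeeds but the follow-up chmod errors out) can also be probed directly. A hypothetical helper, not part of ArchiveBox, that tests whether a mount honors chmod calls:

```python
import os
import tempfile

def chmod_is_honored(directory):
    """Return True if chmod() calls succeed on `directory`'s filesystem.

    CIFS/SMB mounts without the `noperm` mount option can reject chmod
    with EPERM even though plain file writes succeed, which is exactly
    the mixed behavior seen in this thread.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        os.chmod(path, 0o644)  # any mode works; we only care if the call errors
        return True
    except OSError:
        return False
    finally:
        os.unlink(path)
```

If this returns False for the share, mount options like noperm (which makes the server accept permission calls without enforcing them) are the usual workaround.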
@pirate commented on GitHub (Jan 29, 2021):

Yeah haha, I also run my SMB shares with forced file_mode=0777,dir_mode=0777; too many hours of my life have been wasted fighting with permissions on shared network drives. I implement my file access permissions at other layers.

@pirate commented on GitHub (Apr 12, 2022):
Note I've added a new DB/filesystem troubleshooting area to the wiki that may help people arriving here from Google: https://github.com/ArchiveBox/ArchiveBox/wiki/Upgrading-or-Merging-Archives#database-troubleshooting
Contributions/suggestions welcome there.