mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 17:16:00 +03:00
[GH-ISSUE #902] Refactor ArchiveResult filesystem calls to go through Django file storage backend #561
Originally created by @pirate on GitHub (Dec 16, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/902
Instead of this:
We should be doing this:
settings.py:
https://django-storages.readthedocs.io/en/latest/
https://docs.djangoproject.com/en/4.0/topics/files/
related: https://github.com/ArchiveBox/ArchiveBox/issues/788
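For illustration only, here is a minimal sketch of what the settings.py side of a django-storages setup could look like. It assumes Django 4.2+ (which introduced the STORAGES setting) and the django-storages S3 backend. The bucket name, region, and environment variable names are placeholders, not actual ArchiveBox configuration.

```python
# settings.py (sketch; assumes Django 4.2+ and the django-storages package)
import os

STORAGES = {
    # Default storage, used by FileField and django's default_storage.
    "default": {
        "BACKEND": "storages.backends.s3boto3.S3Boto3Storage",
        "OPTIONS": {
            # Hypothetical env var names and fallback values, for illustration.
            "bucket_name": os.environ.get("ARCHIVE_S3_BUCKET", "example-archive-bucket"),
            "region_name": os.environ.get("ARCHIVE_S3_REGION", "us-east-1"),
        },
    },
    # Static assets stay on the local filesystem in this sketch.
    "staticfiles": {
        "BACKEND": "django.contrib.staticfiles.storage.StaticFilesStorage",
    },
}
```

With something like this in place, code that reads and writes through the storage API would not need to know whether files live on local disk or in a bucket.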
@mAAdhaTTah commented on GitHub (Dec 16, 2021):
How do you imagine this interacting with the extractors? Will they write via Django storage?
@pirate commented on GitHub (Dec 16, 2021):
Yeah, I have a big re-architecting plan to move to a plugin-style hook system that allows components to modify any ArchiveBox behavior, including filesystem calls, extractors, parsing, etc.
@pirate commented on GitHub (Dec 16, 2021):
Actually, I'm curious what you think of the new architecture. Do you find this intuitive, @mAAdhaTTah? https://gist.github.com/pirate/7193ab54557b051aa1e3a83191b69793
@caj-larsson commented on GitHub (Jul 15, 2022):
Interesting, I've been thinking about building a pluggable extractor, but if you are already working on this at the scale of the whole application, I could help out with this instead.
I don't quite understand how available hooks in the internals will be defined.
Is it correct that, because of the deep merge behavior, the last plugin is called first?
@opentyler commented on GitHub (Aug 4, 2022):
This seems like a great solution to #940. Being able to use an external storage would be incredible.
What do you think are the pain points for integrating this solution? Would love to be able to help.
@brendanberg commented on GitHub (Dec 2, 2023):
I'm very interested in the ability to configure storage backends—specifically S3. Is there a way I can help out with implementing this?
@pirate commented on GitHub (Dec 2, 2023):
The main blocker is figuring out how to migrate from the existing paths to the new storage backend style. I'd like to do a proper data migration using the Django migrations system, but it takes some work to make sure it's atomic or resumable in the case of interruption (if any files need to move on disk).
Ideally the first pass implementation wouldn't move files around at all, it would just migrate the column to a file field, but I'd like to move all the extractor outputs into dedicated subfolders later so I want to make sure safe file moving is possible.
If you want to investigate the best practices around file moving during migrations and report back, that would be helpful!
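As a starting point for that investigation, here is a rough sketch of one way to make a single file move safe to re-run after an interruption. It uses only the Python standard library and is not tied to Django migrations or to ArchiveBox's actual layout. The function name and return values are made up for illustration, and it assumes the source and destination are on the same filesystem so that `os.replace` is atomic.

```python
import os
import shutil
from pathlib import Path


def move_file_resumable(src: Path, dst: Path) -> str:
    """Move src to dst so that re-running after an interruption is safe.

    Returns "moved", "already_done", or "missing" (hypothetical status names).
    """
    partial = dst.with_name(dst.name + ".partial")

    if not src.exists():
        # Either a previous run finished this move, or the file never existed.
        return "already_done" if dst.exists() else "missing"

    dst.parent.mkdir(parents=True, exist_ok=True)

    # Copy to a temporary name first so dst never holds a half-written file.
    shutil.copy2(src, partial)
    # Atomic rename when both paths are on the same filesystem.
    os.replace(partial, dst)
    # Delete the source only once the destination is fully in place.
    src.unlink()
    return "moved"
```

If the process dies at any step, the source file is still intact, and calling the function again for the same pair either redoes the copy or reports that the move already happened. A data migration could loop over (old path, new path) pairs and call this for each one, updating the database row only after the move returns.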
@pirate commented on GitHub (Apr 10, 2024):
FYI all, I just created a new Wiki page covering how to set up ArchiveBox with a remote filesystem: https://github.com/ArchiveBox/ArchiveBox/wiki/Setting-Up-Storage
I confirmed it works myself with Amazon S3, Backblaze B2, SFTP, SMB, and NFS. It should also work with Google Drive, OneDrive, Dropbox, and all the other platforms that Rclone supports.
This allows us to use remote filesystems for now without having to change the codebase / implement the django-storages changes discussed earlier.