mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 17:16:00 +03:00
[GH-ISSUE #1082] Call for public comments: Considering deprecating the archivebox oneshot command as of the 0.7 release #3695
Originally created by @pirate on GitHub (Jan 11, 2023).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1082
Long long ago, before archivebox was a Django app, it used to be a one-shot bash script called archive-pocket-stream.sh. When we moved to the Django system, archivebox oneshot was provided as an escape hatch for users that did not like being forced to create collections and manage data directories all of a sudden. It allows the new fancy Django archivebox to run in "oneshot" mode without creating a main index file, data dir, etc., only outputting the results of one snapshot into PWD.
As you might imagine, it required tremendous haxx to run the new Django archivebox without a db file in this way, including instantiating a fake sqlite3 db in memory, filesystem write filtering, etc., and it's imposing a large maintenance burden by making it hard to refactor other subsystems.
Now that we have solidly been on Django for several major versions, I think we can safely retire archivebox oneshot? If anyone is using it, speak up now and make a case for keeping it 😅 🤠👋
@ianrobertsFF commented on GitHub (Apr 7, 2023):
This is my entire use case, I need to be able to do single page snapshots on a daily basis, and have no need for any of the other functionality of ArchiveBox, I'll be running it CLI only, and using another tool to trigger the daily snapshots of the pages, and piping them into specific directories.
Without the ability to create multiple snapshots of the same URL in your normal use cases, this is the only way I can achieve it.
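As an illustration of the workflow described in this comment, here is a minimal Python sketch that an external scheduler (cron, for example) could call once a day. It relies only on what the thread states: archivebox oneshot writes its output into the current working directory. The directory layout, the label argument, and the function names are hypothetical and not part of ArchiveBox.

```python
import subprocess
from datetime import date
from pathlib import Path
from typing import List, Optional


def snapshot_dir(base: Path, label: str, day: date) -> Path:
    """Return a per-day output directory such as base/label/2023-04-07 (hypothetical layout)."""
    return base / label / day.isoformat()


def oneshot_command(url: str) -> List[str]:
    # Only the bare `archivebox oneshot <url>` form discussed in the thread is used;
    # no additional flags are assumed.
    return ["archivebox", "oneshot", url]


def take_snapshot(url: str, base: Path, label: str, day: Optional[date] = None) -> Path:
    """Run a single oneshot snapshot of `url` into a dated directory and return that directory."""
    out_dir = snapshot_dir(base, label, day or date.today())
    out_dir.mkdir(parents=True, exist_ok=True)
    # oneshot outputs into the working directory, so run it with cwd set to out_dir
    subprocess.run(oneshot_command(url), cwd=out_dir, check=True)
    return out_dir
```

Calling take_snapshot("https://example.com", Path("snapshots"), "example") from a daily job would then keep each day's snapshot of the same URL in its own directory.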
@pirate commented on GitHub (Apr 7, 2023):
Good to know! Would your needs be satisfied if we add better native support for multiple snapshots in archivebox instead of keeping this older feature? @ianrobertsFF
@jvican commented on GitHub (Apr 9, 2023):
I'm also using this. I think it makes a lot of sense to keep a command like oneshot around because it's fairly self-contained, and it aligns well with the UNIX philosophy. It does one thing and it does it well, without the need for archive init and the use of the rest of the software. Please don't take it away.
@ianrobertsFF commented on GitHub (Apr 9, 2023):
My needs would be satisfied by multiple snapshots, although I still wouldn't be using any of the functionality that oneshot doesn't currently use, so it wouldn't be a better workflow, as oneshot does exactly what I need.
However assuming I can continue to take on-demand snapshots with the native support for multiple snapshots, this would be acceptable to me.
@jwmh commented on GitHub (Jul 31, 2023):
Q:
Would it be possible to fork this off into its own separate project/repo?
Would it even be desirable?
I appreciate @jvican's comments on this, and agree.
@pirate commented on GitHub (Dec 18, 2023):
Ok, I've decided to keep oneshot because it ties in nicely with the ongoing refactor to move ArchiveBox towards an event-driven job queue model. The old oneshot will be renamed and joined by a new command to run a single extractor method:

archivebox snapshot
Can be run to snapshot an individual URL into the current directory (runs all extractors by default).
This works the same way as oneshot does now, and I'll alias oneshot to the new command so we don't break backwards compatibility.

archivebox extract
This runs an individual extractor method and outputs into the current directory.

After the refactor, archivebox add will work by internally enqueuing a job that runs archivebox snapshot ... for each imported URL.
The snapshot job then in turn enqueues a job for each extractor needed on that URL.
Each extractor job then runs archivebox extract --method=... internally to write the output into the final archive directory.

Please subscribe to this issue for updates: https://github.com/ArchiveBox/ArchiveBox/issues/1289
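The planned job flow (add enqueues one snapshot job per URL, and each snapshot job enqueues one extract job per extractor) can be sketched roughly as below. This is a toy in-memory model, not ArchiveBox code: the Job class, the function names, the extractor names used in the usage note, and the exact command strings (including the URL as a positional argument) are assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Iterable, List


@dataclass(frozen=True)
class Job:
    command: str      # "snapshot" or "extract"
    url: str
    method: str = ""  # extractor method, only set for "extract" jobs


def add(urls: Iterable[str], queue: Deque[Job]) -> None:
    """Model of `archivebox add`: enqueue one snapshot job per imported URL."""
    for url in urls:
        queue.append(Job("snapshot", url))


def run_queue(queue: Deque[Job], extractors: List[str]) -> List[str]:
    """Drain the queue in FIFO order and return the commands each job would run."""
    log = []
    while queue:
        job = queue.popleft()
        if job.command == "snapshot":
            # A snapshot job fans out into one extract job per extractor
            for method in extractors:
                queue.append(Job("extract", job.url, method))
            log.append(f"archivebox snapshot {job.url}")
        elif job.command == "extract":
            # Hypothetical command line; only the --method=... part appears in the thread
            log.append(f"archivebox extract --method={job.method} {job.url}")
    return log
```

For example, after add(["https://example.com"], queue) on an empty deque, run_queue(queue, ["title", "wget"]) returns one snapshot command followed by one extract command per listed extractor. A real implementation would persist jobs and run them in worker processes rather than in a single loop.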