[GH-ISSUE #553] queuing for add #349

Closed

opened 2026-03-01 14:42:46 +03:00 by kerem · 2 comments

kerem commented

2026-03-01 14:42:46 +03:00

Owner

Originally created by @shepner on GitHub (Nov 28, 2020).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/553

Type

- [ ] General question or discussion
- [x] Propose a brand new feature
- [ ] Request modification of existing behavior or design

What is the problem that your feature request solves

`archivebox add` can be slow, and I typically just want a quick "fire and forget" way to submit new URLs. I'd also like this to be a multi-threaded process.

Describe the ideal specific solution you'd want, and whether it fits into any broader scope of changes

Implement a command (i.e. `archivebox queue`) that parses the input (similar to `archivebox add`) and places each URL into a message queue. On the other side of the queue, have a process that starts one background `archivebox add` per available CPU.
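
The producer side of this proposal can be sketched in a few lines of shell. This is a minimal sketch, assuming the queue is a plain text file with one URL per line; `QUEUE_FILE` and `queue_urls` are hypothetical names, not part of ArchiveBox, and the regex is only a rough stand-in for the much richer input parsing that `archivebox add` performs.

```bash
#!/usr/bin/env bash
# Hypothetical "archivebox queue" stand-in: the queue is a plain text
# file, one URL per line. QUEUE_FILE and queue_urls are illustrative
# names, not ArchiveBox features.
QUEUE_FILE="${QUEUE_FILE:-./url_queue.txt}"

# Read arbitrary text on stdin, extract anything that looks like an
# http(s) URL, and append each match to the queue file.
queue_urls() {
    # grep exits 1 when nothing matches; that is not an error here.
    grep -oE 'https?://[^[:space:]"<>]+' >> "$QUEUE_FILE" || true
}
```

For example, `cat bookmarks.txt | queue_urls` returns immediately, which gives the "fire and forget" behavior; the slow archiving happens elsewhere.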

What hacks or alternative solutions have you tried to solve the problem?

While "doing it right" is a rather involved process, this could also be done outside of `archivebox` itself, as scripting within the Docker container. I've written "quick and dirty" variants of this a few times over the years in Python (and Perl).

In the simplest form, the message queue could just be a list or even a file. Running multiple threads can be as simple as making sure no more than N instances are running at any given time and pulling more entries from the queue whenever a slot opens up.
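
The slot-watching loop described above might look like this in bash. It is a minimal sketch, assuming a queue file with one URL per line and bash 4.3 or newer (for `wait -n`); `drain_queue` is a hypothetical name, and the default of one worker per CPU follows the proposal. It does not remove processed URLs from the file, and several `archivebox add` processes writing to the same collection at once may contend for the index, which this sketch does not handle.

```bash
#!/usr/bin/env bash
# Hypothetical worker loop: keep at most max_jobs "archivebox add"
# processes running, starting a new one whenever a slot frees up.
# Requires bash >= 4.3 for "wait -n".
drain_queue() {
    local queue_file="$1"
    # Default to one worker per CPU, as proposed in the issue.
    local max_jobs="${2:-$(nproc 2>/dev/null || echo 1)}"
    local running=0
    local url

    # "|| [ -n ... ]" also handles a last line without a trailing newline.
    while IFS= read -r url || [ -n "$url" ]; do
        # Skip blank lines in the queue file.
        [ -z "$url" ] && continue
        # All slots busy: block until any one background job exits.
        if [ "$running" -ge "$max_jobs" ]; then
            wait -n
            running=$((running - 1))
        fi
        # </dev/null keeps the job from reading the queue file on stdin.
        archivebox add "$url" </dev/null &
        running=$((running + 1))
    done < "$queue_file"

    # Wait for the jobs still in flight.
    wait
}
```

Usage: `drain_queue ./url_queue.txt 4` runs at most four `archivebox add` processes at a time; omitting the second argument uses the CPU count.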

How badly do you want this new feature?

- [ ] It's an urgent deal-breaker, I can't live without it
- [x] It's important to add it in the near-mid term future
- [ ] It would be nice to have eventually

- [ ] I'm willing to contribute dev time / money to fix this issue
- [x] I like ArchiveBox so far / would recommend it to a friend
- [ ] I've had a lot of difficulty getting ArchiveBox set up

kerem

2026-03-01 14:42:46 +03:00

closed this issue
added the labels why: functionality and status: idea-phase

kerem commented

2026-03-01 14:42:46 +03:00

Author

Owner

@cdvv7788 commented on GitHub (Nov 28, 2020):

@pirate this is related to the huey implementation, right?

kerem commented

2026-03-01 14:42:47 +03:00

Author

Owner

@pirate commented on GitHub (Nov 28, 2020):

The message queue-style implementation is coming soon with Huey, but the behavior you want can already be achieved with:

```bash
# add the URL to the index only, without running any of the archiving methods yet (effectively queuing it)
archivebox add --index-only https://example.com/some/url/here

# ...
# then run this later on / in a separate process to actually archive everything
archivebox update
```

Going to close this for now because the Huey implementation is already a long-running dev task we're tracking in other issues.
Feel free to reply if you still have questions / want help though and I'll continue answering here.
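
The two-step workaround in this comment could be wrapped in a pair of shell functions. A minimal sketch: `enqueue` and `process_queue` are hypothetical helper names, while `archivebox add --index-only` and `archivebox update` are the commands quoted in the comment.

```bash
#!/usr/bin/env bash
# Hypothetical wrappers around the workaround above.

# Step 1 ("fire and forget"): add each URL to the index only. This is
# quick because none of the archiving methods run yet.
enqueue() {
    local url
    for url in "$@"; do
        archivebox add --index-only "$url"
    done
}

# Step 2 (later, or in a separate process): actually archive
# everything in the index.
process_queue() {
    archivebox update
}
```

For example, call `enqueue https://example.com/a https://example.com/b` whenever new links come in, and run `process_queue` on a schedule (for instance from cron) to do the slow archiving in the background.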

kerem referenced this issue

2026-03-01 14:48:29 +03:00

[PR #349] [MERGED] Ui enhancements for snapshot addition #1117
