[GH-ISSUE #333] Question: How far does archivebox traverse? #240

Closed
opened 2026-03-01 14:41:45 +03:00 by kerem · 3 comments

Originally created by @vext01 on GitHub (Mar 26, 2020).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/333

Hi,

I've recently discovered archivebox -- what a neat tool!

To try it out, I ran it on my personal website and was surprised to find that it followed links outside of my website too!

So my question is: How many links does it follow before stopping? Can this be controlled in any way?

Thanks!

P.S. I'm an OpenBSD developer. If you can get this up on PyPI, I'll happily make a port so that archivebox can be in the package manager.

kerem closed this issue 2026-03-01 14:41:45 +03:00

@pirate commented on GitHub (Mar 27, 2020):

If you pipe a link in via stdin, it archives just that link; if you pass a URL as an arg, it interprets it as a source to import other links from.

See:

https://github.com/pirate/ArchiveBox/wiki/Usage#import-a-single-url-or-list-of-urls-via-stdin

vs

https://github.com/pirate/ArchiveBox/wiki/Usage#import-list-of-links-exported-from-browser-or-another-service

This difference in behavior is intentional but not intuitive, so it's been changed in the upcoming v0.4 [`archivebox add`](https://github.com/pirate/ArchiveBox/wiki/Roadmap#-archivebox-add) CLI design.

Thanks for the offer re: OpenBSD! If you want to subscribe to PR #207 you'll get an update when v0.4 ships on PyPI.


@vext01 commented on GitHub (Mar 27, 2020):

> If you pass a URL as an arg it interprets it as a source to import other links from.

I see. That indeed isn't intuitive. The new CLI makes much more sense. Looking forward to that!

So with the current design, if I pass a URL as an arg, it follows links 1 deep. Is that correct?

> If you want to subscribe to PR #207 you'll get an update when v0.4 ships on PyPI.

Many thanks. Subscribed.


@pirate commented on GitHub (Mar 31, 2020):

In a sense it follows one link deep, but that's not really what you want if you're looking for recursive archiving, since it doesn't archive the original URL. What it's really doing is treating the path/link argument as a *feed* to import a list of links from, e.g. a browser history or Pinboard export.
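
The two input modes described in this thread can be modelled with a small sketch. This is not ArchiveBox's code: the function `urls_to_archive`, the `mode` values, and the example URLs are all hypothetical, and the standard-library HTML parser stands in for whatever link extraction the tool actually performs. It only illustrates the stated behavior: a URL piped via stdin is archived on its own, while a URL passed as an argument is read as a feed whose links are archived and whose own address is not.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def urls_to_archive(url, page_html, mode):
    """Return the URLs that would be archived (hypothetical model).

    mode="stdin": the URL itself is archived and nothing else.
    mode="arg":   the URL is treated as a feed; the links found in its
                  content are archived, but the URL itself is not.
    """
    if mode == "stdin":
        return [url]
    if mode == "arg":
        collector = LinkCollector()
        collector.feed(page_html)
        return collector.links
    raise ValueError(f"unknown mode: {mode!r}")


# Made-up page content standing in for a personal website's front page.
page = '<a href="https://a.example/1">one</a> <a href="https://b.example/2">two</a>'

print(urls_to_archive("https://my.example/", page, "stdin"))
print(urls_to_archive("https://my.example/", page, "arg"))
```

In the `"arg"` case the result contains the outbound links but not `https://my.example/` itself, which matches the observation that the tool "followed links outside of my website" and the point that this mode is an import of a link list rather than recursive archiving.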
