mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 09:06:02 +03:00
[GH-ISSUE #333] Question: How far does archivebox traverse? #240
Originally created by @vext01 on GitHub (Mar 26, 2020).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/333
Hi,
I've recently discovered archivebox -- what a neat tool!
To try it out, I ran it on my personal website and was surprised to find that it followed links outside of my website too!
So my question is: How many links does it follow before stopping? Can this be controlled in any way?
Thanks!
P.S. I'm an OpenBSD developer. If you can get this up on PyPI, I'll happily make a port so that archivebox can be in the package manager.
@pirate commented on GitHub (Mar 27, 2020):
If you pipe a link in via stdin, it archives just that link; if you pass a URL as an arg, it interprets it as a source to import other links from.
See:
https://github.com/pirate/ArchiveBox/wiki/Usage#import-a-single-url-or-list-of-urls-via-stdin
vs
https://github.com/pirate/ArchiveBox/wiki/Usage#import-list-of-links-exported-from-browser-or-another-service
This difference in behavior is intentional but not intuitive, so it's been changed in the upcoming v0.4 `archivebox add` CLI design.
Thanks for the offer re: OpenBSD! If you want to subscribe to PR #207 you'll get an update when v0.4 ships on PyPI.
@vext01 commented on GitHub (Mar 27, 2020):
I see. That indeed isn't intuitive. The new CLI makes much more sense. Looking forward to that!
So with the current design, if I pass a URL as an arg, it follows links 1 deep. Is that correct?
Many thanks. Subscribed.
@pirate commented on GitHub (Mar 31, 2020):
In a sense it follows one link deep, but that's not really what you want if you're looking for recursive archiving since it doesn't archive the original URL. What it's really doing is treating the path/link argument as a feed to import a list of links from, e.g. a browser history or pinboard export.
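To make the stdin-vs-argument distinction concrete, here is a rough Python sketch of the behavior described in this thread. It is only an illustrative model, not ArchiveBox's actual code: the function name `links_to_archive`, the `read_source` callback, and the URL regex are all assumptions made up for this example.

```python
import re
from typing import Callable, List, Optional

# Simplistic URL matcher, used only for this illustration (an assumption,
# not the parser ArchiveBox uses).
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def links_to_archive(
    stdin_text: Optional[str],
    source_arg: Optional[str],
    read_source: Callable[[str], str],
) -> List[str]:
    """Hypothetical model of the pre-v0.4 behavior discussed above.

    read_source is a stand-in callback that returns the text found at a
    path or URL (e.g. a bookmarks export or a feed).
    """
    if source_arg is not None:
        # Argument form: the argument is treated as a feed to import links
        # from. The argument itself is never added to the result here.
        text = read_source(source_arg)
        return list(dict.fromkeys(URL_PATTERN.findall(text)))
    if stdin_text:
        # Stdin form: only the links that were piped in are archived.
        return list(dict.fromkeys(URL_PATTERN.findall(stdin_text)))
    return []
```

Under these assumptions, piping `https://example.com` via stdin yields just that one URL, while passing it as an argument yields whatever links appear in the content at that URL, which is why links outside the site showed up in the archive.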