Originally created by @knowncolor on GitHub (Apr 30, 2019).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/226
Similar to #191, I would like ArchiveBox to automatically follow and archive links up to a certain depth across domains.
This is a fantastic project!
@pirate commented on GitHub (Apr 30, 2019):
Thanks! I'm going to close this and tweak #191 slightly to make it clearer that it'll cover this use case as well 😁
The idea is that we'll expose the same flags on ArchiveBox as are available on wget itself: --mirror, --level=5, --span-hosts, --recursive, --no-parent (see https://www.gnu.org/software/wget/manual/wget.html#Recursive-Retrieval-Options-1).
These flags together should cover all the use cases: archiving an entire domain, archiving an entire domain but only below the current directory level, and archiving recursively from a single page across multiple domains to a given depth.
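For concreteness, here is a minimal sketch of how those wget flags map onto the three use cases. The URLs and the depth value are illustrative placeholders, and ArchiveBox's eventual interface may differ; these are plain wget invocations as documented in the GNU wget manual linked above:

```sh
# Archive an entire domain (--mirror implies recursion with unlimited depth)
wget --mirror https://example.com/

# Archive a domain, but only below the starting directory level
wget --recursive --no-parent https://example.com/some/section/

# Crawl outward from a single page, across other domains, to depth 5
wget --recursive --level=5 --span-hosts https://example.com/start-page.html
```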
I anticipate it will take a while to get to this point though (3-6 months likely), as we first have to build or integrate a crawler of some sort, and web crawling is an extremely complex process with lots of subtle nuance around configuration and environment.