[GH-ISSUE #348] Question: How can I download just one webpage and not the webpages that are linked to the first one (disable recursion)? #1762

Closed

opened 2026-03-01 17:53:26 +03:00 by kerem · 1 comment

kerem commented

2026-03-01 17:53:26 +03:00

Owner

Originally created by @YousufSSyed on GitHub (Jun 19, 2020).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/348

Let's say I want to download just google.com. I want all of the assets needed to show that site as loaded, but I don't want to archive the webpages that google.com links to, like gmail.com or google.com/images. How can I do that?

kerem closed this issue

2026-03-01 17:53:27 +03:00

kerem commented

2026-03-01 17:53:27 +03:00

Author

Owner

@pirate commented on GitHub (Jun 19, 2020):

From the documentation on the wiki (https://github.com/pirate/ArchiveBox/wiki/Usage#import-a-single-url-or-list-of-urls-via-stdin):

Archive just one page:

echo 'https://example.com/some/url/here' | ./archive

Archive all the links from an import source (e.g. an RSS feed, page, text file, etc.):

./archive some/file/example.txt
# or
./archive https://example.com/some/rss/feed.xml
# or
./archive https://example.com/some/page/full/of/urls.html
# etc...

This design changes in >v0.4 (see https://github.com/pirate/ArchiveBox/wiki/Roadmap#-archivebox-add), where the commands become:

# archive just the one page
archivebox add 'https://example.com/just/one/page.html'
# or archive one page and all of its outlinks
archivebox add --depth=1 'https://example.com/one/page/and/all/its/outlinks.html'
# or archive a page plus 2 hops of links
archivebox add --depth=2 'https://example.com/page/plus/2/hops/of/links.html'
# etc.
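
For the single-page case asked about in this issue, a minimal sketch of a wrapper might look like the following. It assumes that `archivebox add --depth=0` archives only the given URL without following its links; that flag value is not shown above, so confirm it with `archivebox add --help` on your version. The helper names `add_cmd` and `archive_single_page` are hypothetical and not part of ArchiveBox.

```bash
#!/bin/sh
# Sketch: archive a URL as a single page, without following its links.
# ASSUMPTION: `--depth=0` means "no recursion" in your ArchiveBox version.
# `add_cmd` and `archive_single_page` are hypothetical helper names.

# Print the command that would archive one URL at the given depth (default 0).
add_cmd() {
    printf "archivebox add --depth=%s '%s'\n" "${2:-0}" "$1"
}

# Run the command only when archivebox is installed; otherwise just print it.
archive_single_page() {
    if command -v archivebox >/dev/null 2>&1; then
        archivebox add --depth=0 "$1"
    else
        add_cmd "$1" 0
    fi
}
```

Calling `archive_single_page 'https://google.com'` would then archive that one page, or only print the command on a machine where `archivebox` is not installed.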
