Mirror of https://github.com/ArchiveBox/ArchiveBox.git, synced 2026-04-25 17:16:00 +03:00
[GH-ISSUE #1717] Feature Request: Better Forum archiving #4042
Originally created by @observeroftime01 on GitHub (Dec 13, 2025).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1717
Originally assigned to: @pirate on GitHub.
What type of suggestion are you making?
Proposing a new feature
What is the problem that your feature request solves?
Let us assume I'm talking about a forum thread with 500 pages, located at https://www.SOMEFORUM.COM/threads/SOME_THREAD_NAME/. The thread's pages are accessible via https://www.SOMEFORUM.COM/threads/SOME_THREAD_NAME/page-2, https://www.SOMEFORUM.COM/threads/SOME_THREAD_NAME/page-3, and so on.

As it stands, I have to generate a list of 500 individual URLs and add them to ArchiveBox, which creates 500 separate entries, one per page. This quickly clutters the dashboard and makes finding anything a chore. A few many-page threads from the same forum are enough to make things unmanageable.
What is your proposed solution?
I would like to feed ArchiveBox a URL like https://www.SOMEFORUM.COM/threads/SOME_THREAD_NAME, be asked how many pages it should download (500 in this example), and at the very least have everything saved under one expandable entry/heading.

The logic for the download URLs does not have to be complicated. The user could provide a template URL like https://www.SOMEFORUM.COM/threads/SOME_THREAD_NAME/page-$NUMBER, which would download everything from https://www.SOMEFORUM.COM/threads/SOME_THREAD_NAME to https://www.SOMEFORUM.COM/threads/SOME_THREAD_NAME/page-500. The resulting 500 pages could then be stored under a single heading (the thread title itself would probably make sense) that expands to show all 500 pages when clicked.

I can generate the download URLs myself with a simple Python script, so the more urgent part of this request is having pages from the same thread saved under one expandable entry on the dashboard.
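The URL list itself is easy to produce. Below is a minimal Python sketch, assuming page 1 is the bare thread URL and every later page follows the page-$NUMBER template exactly (true for the example forum, but not necessarily for every forum engine). The function name thread_page_urls is hypothetical.

```python
from string import Template


def thread_page_urls(base_url: str, template: str, last_page: int) -> list[str]:
    """Return URLs for pages 1..last_page of a paginated thread (hypothetical helper).

    Assumption: page 1 is the bare thread URL, and pages 2..last_page are
    produced by substituting $NUMBER in the template.
    """
    if last_page < 1:
        raise ValueError("last_page must be >= 1")
    urls = [base_url]
    tmpl = Template(template)
    for n in range(2, last_page + 1):
        # Template.substitute converts the integer page number to a string.
        urls.append(tmpl.substitute(NUMBER=n))
    return urls


if __name__ == "__main__":
    base = "https://www.SOMEFORUM.COM/threads/SOME_THREAD_NAME/"
    tmpl = "https://www.SOMEFORUM.COM/threads/SOME_THREAD_NAME/page-$NUMBER"
    # One URL per line, ready to be saved to a file.
    print("\n".join(thread_page_urls(base, tmpl, 500)))
```

The output can be saved to a file and fed to archivebox add on stdin (for example archivebox add < urls.txt), but that still produces one entry per page, which is exactly the clutter described above.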
I don't know how feasible it is to have navigation within the saved pages themselves (say, to get from the saved page https://www.SOMEFORUM.COM/threads/SOME_THREAD_NAME/page-2 to https://www.SOMEFORUM.COM/threads/SOME_THREAD_NAME/page-3). As long as all 500 pages of a thread end up in the same "collection", the forum's own page buttons don't need to work. Maybe a simple "previous / next" button could be shown above the navigation when viewing a collection created this way, taking you to the next page in the list?

In any case, if anybody has better suggestions on how to back up forum content (and browse it properly) in a straightforward way, I'm all ears. Maybe this is all wildly out of scope, and there are better tools for this particular purpose that I'm not aware of.
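The "previous / next" idea needs little more than an ordered list of page URLs per collection. Here is a minimal sketch of such a structure, assuming pages are stored in thread order. ThreadCollection, previous_page and next_page are hypothetical names for illustration, not part of ArchiveBox.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ThreadCollection:
    """Hypothetical grouping of archived pages that belong to one thread."""

    title: str
    # Assumption: page URLs are kept in thread order (page 1 first).
    page_urls: list[str] = field(default_factory=list)

    def _index(self, url: str) -> int:
        # Raises ValueError if the page is not part of this collection.
        return self.page_urls.index(url)

    def previous_page(self, url: str) -> Optional[str]:
        """URL of the page before `url`, or None on the first page."""
        i = self._index(url)
        return self.page_urls[i - 1] if i > 0 else None

    def next_page(self, url: str) -> Optional[str]:
        """URL of the page after `url`, or None on the last page."""
        i = self._index(url)
        return self.page_urls[i + 1] if i + 1 < len(self.page_urls) else None
```

Returning None at either end lets a viewer simply hide the corresponding button on the first and last pages.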
What hacks or alternative solutions have you tried to solve the problem?
Tried assigning tags to forum threads to help with navigation and with finding pages that belong to the same thread.
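Tagging is less tedious if the thread and page number can be derived from each URL automatically. A minimal sketch follows, assuming the .../threads/<slug>/page-<n> scheme from the example above (other forum engines paginate differently). group_by_thread and THREAD_PAGE_RE are hypothetical names.

```python
import re
from collections import defaultdict

# Assumed URL shape: <scheme>://<host>/threads/<slug>, optionally followed by /page-<n>.
THREAD_PAGE_RE = re.compile(
    r"^(?P<thread>https?://[^/]+/threads/[^/]+)/?(?:page-(?P<page>\d+))?/?$"
)


def group_by_thread(urls: list[str]) -> dict[str, list[str]]:
    """Group page URLs by thread, each group sorted by page number (hypothetical helper)."""
    groups: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for url in urls:
        m = THREAD_PAGE_RE.match(url)
        if m is None:
            continue  # not a thread page under the assumed URL scheme
        # The bare thread URL has no page-<n> suffix and counts as page 1.
        page = int(m.group("page") or 1)
        groups[m.group("thread")].append((page, url))
    # Sort numerically so page-10 comes after page-3.
    return {thread: [u for _, u in sorted(pages)] for thread, pages in groups.items()}
```

Each resulting key could serve as a tag or collection heading, with its pages listed in thread order.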
Share the entire output of the archivebox version command for the current version you are using.
How badly do you want this new feature?
Mini Survey
@pirate commented on GitHub (Dec 29, 2025):
forum-dl support and --depth=N recursive crawl support are now implemented in dev. Let me know if that helps! dev is still WIP, but it should be out in the next release.