mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 17:16:00 +03:00
[GH-ISSUE #1080] Is there a way to navigate to an archived URL directly without knowing its timestamp, e.g. https://archivebox.example.com/archive/en.wikipedia.org/wiki/Philosophy? #675
Originally created by @happening-primal on GitHub (Jan 4, 2023).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1080
I've played around with ArchiveBox a few times in the past and I always run into a limitation related to the archive URL.
As per the documentation, and in reality, the archived URL will look something like this:
https://archive.example.com/archive/1493350273/en.wikipedia.org/wiki/Dining_philosophers_problem.html
That's great, and you can scroll through your archived web pages and find this, but it is not very convenient for actually browsing through archived pages. One way of doing this would be to use the Firefox add-on Redirector to take all attempts to go to https://en.wikipedia.org/wiki/Dining_philosophers_problem.html and redirect them to https://archive.example.com/archive/1493350273/en.wikipedia.org/wiki/Dining_philosophers_problem.html, except that you must know the archive number (in the above example, 1493350273).
So, to my question, as I've done a lot of searching on this topic with no luck: is there a 'permalink' to the most recent copy of an archived page, such that you can automate the browsing of your archive without needing to know the archive number?
It might look something like these examples below:
or any other simple algorithmic way to navigate to a desired web page snapshot located in your archive? Any help much appreciated.
@pirate commented on GitHub (Jan 4, 2023):
ArchiveBox actually already supports going to
https://archivebox.example.com/archive/https://example.com/some/original/url
and it'll auto-redirect without needing to know the snapshot number, you can find the redirect logic here: https://github.com/ArchiveBox/ArchiveBox/blob/dev/archivebox/core/views.py#L143
However, ArchiveBox doesn't offer proxy-replaying (aka seamless browsing with automatic redirecting to archived versions for every URL) directly from a browser.
pywb's proxy-archiving feature would be a better fit for that than ArchiveBox: https://pywb.readthedocs.io/en/develop/manual/configuring.html#http-s-proxy-mode
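For anyone who wants to script the lookup URL pattern mentioned above, here is a minimal sketch in Python. It only builds the `/archive/<original URL>` address and leaves the redirect to ArchiveBox. The function name `archive_lookup_url` and the base URL are hypothetical, and the sketch assumes the original URL can be appended as-is, as in the example above; URLs with query strings or unusual characters may need extra handling that is not covered here.

```python
def archive_lookup_url(archivebox_base: str, original_url: str) -> str:
    """Return the lookup URL for an original URL on an ArchiveBox instance.

    Hypothetical helper: it mirrors the pattern
    <archivebox base>/archive/<original URL> described in the comment above.
    The original URL is appended unmodified (an assumption of this sketch).
    """
    # Drop any trailing slash on the base so the result has exactly one
    # slash before "archive".
    base = archivebox_base.rstrip("/")
    return f"{base}/archive/{original_url}"


if __name__ == "__main__":
    # Example instance address and page; both are placeholders.
    print(archive_lookup_url(
        "https://archivebox.example.com/",
        "https://example.com/some/original/url",
    ))
```

Opening the resulting address in a browser (or pointing a redirect rule at it) should then land on the matching snapshot without the timestamp being known in advance.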