mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 09:06:02 +03:00
[GH-ISSUE #1085] Enhancement: Use the same URL layout as Archive.org for viewing ArchiveBox Snapshots https://archive.org/web/<URL> #678
Originally created by @pirate on GitHub (Jan 17, 2023).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1085
To visit an archived version of a website (or archive it automatically) on Archive.org, one can just visit `http://web.archive.org/web/https://example.com/` and it will redirect to `http://web.archive.org/web/20230116145642/https://example.com/` (or whatever the most recent snapshot timestamp is).

To really embody the tagline "ArchiveBox is a self-hosted version of archive.org" we should properly support their URL scheme too.

e.g. `https://demo.archivebox.io/web/https://example.com` should redirect to the most recent snapshot `https://demo.archivebox.io/web/20230116145642/https://example.com`

- `1673919713` or the Archive.org-style `20230116145642` format and truncated forms `2023`, `202301`, `20230116`
- `https://demo.archivebox.io/01ARZ3NDEKTSV4RRFFQ69G5FAV/...`
- `2023` matches `202301`, `20230116`, `20230116145642` automatically, and `01AN4Z07BY` matches `01AN4Z07BY79KA1307SR9X4MV3` automatically

Full spec:
`https://demo.archivebox.io/web/<SLUG>` where `SLUG` can be:

- an original URL, with or without scheme, e.g. `https://example.com/index.html`, `example.com/index.html` ➡️ redirect to the most recent snapshot for that URL: `https://demo.archivebox.io/web/20230116145642/https://example.com/index.html`
- an ArchiveBox snapshot UUID in `ulid/spec` format `01AN4Z07BY79KA1307SR9X4MV3/index.html` or timestamp prefix `01AN4Z07BY/index.html` ➡️ redirect to that exact snapshot: `https://demo.archivebox.io/web/20230116145642/https://example.com/index.html`
- an ArchiveBox snapshot timestamp in `YYYYMMDDHHMMSS`, shortened forms like `YYYYMM`, or unix timestamp format, e.g. `20230116145642/index.html` or `202301161456/index.html`, `202301/index.html`, `1673919713/index.html` ➡️ redirect to the most recent snapshot matching that prefix: `https://demo.archivebox.io/web/20230116145642/https://example.com/index.html`

Subtasks:
- `ulid` field + migration to coalesce old uuid and timestamp fields into new ulid format (+ asserts all snapshot timestamps are valid and are between 1900 and 2100 AD) (done in v0.8.5)
- `xxxx-xxxx-xxxxxxx` format, add ULID diagram in docs breaking it down into timestamp and randomness
- `0`, `1`, `2`, `htt` to make prefix-matching faster and less error prone (avoids clashing with `199x*`/`20**` year, `1*` unix timestamp, `01*` ULIDs, or `http(s?)` URL slug prefixes)

@ArrayBolt3 commented on GitHub (Nov 20, 2024):
At least one project interested in using ArchiveBox (Kicksecure) would also be interested in this functionality, or any functionality that allows turning a URL into an archived URL via a simple transformation (i.e., prepend `https://archivebox.example.org/whatever/goes/here/` to a URL to get an archived URL). The use case for this is:

- If the archived URL for `https://example.com/my-page` is `https://archivebox.example.com/BN833Z` or something like that, there's no easy way to convert a link to an ArchiveBox link. Thus adding the archive links requires running a large "archive job" that archives all unarchived links, then gets the corresponding URLs and mass-edits them into the Wiki. This is a pain.
- If the archived URL for `https://example.com/my-page` is `https://archivebox.example.com/web/https://example.com/my-page`, no mass-editing is required. A MediaWiki plugin can be used to put a button after each link that offers an archived version of the webpage to the user. (This is what we already do with archive.org.)

Worth noting: the format doesn't have to be exactly like archive.org for this to work. If ArchiveBox supported the MementoWeb API similar to how archive.today does, we would end up turning `https://example.com/my-page` into `https://archivebox.example.com/timegate/https://example.com/my-page`, which works just as well.

Is help wanted here? Depending on how suitable ArchiveBox is for Kicksecure's use case, this might be a feature we'd be willing to implement and work on upstreaming.
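
The "simple transformation" described in this comment can be sketched in a few lines of Python. This is only an illustration of the idea, not code from ArchiveBox or Kicksecure: the function name `to_archive_link` and its validation rule are assumptions, and the prefixes are the placeholder hosts used in the comment.

```python
from urllib.parse import urlsplit


def to_archive_link(original_url: str, archive_prefix: str) -> str:
    """Build an archive link by prepending archive_prefix to original_url.

    Hypothetical helper for illustration; not part of any ArchiveBox API.
    """
    parts = urlsplit(original_url)
    # Only accept absolute http(s) URLs so the resulting link is unambiguous.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {original_url!r}")
    # Normalise the prefix to exactly one trailing slash, then append the URL unchanged.
    return archive_prefix.rstrip("/") + "/" + original_url


page = "https://example.com/my-page"
# Archive.org-style layout discussed in this issue.
print(to_archive_link(page, "https://archivebox.example.com/web/"))
# Memento-style TimeGate layout mentioned as an alternative.
print(to_archive_link(page, "https://archivebox.example.com/timegate/"))
```

Because the link is derived purely from the original URL, a wiki plugin could compute it at render time without looking anything up on the archive server.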
@pirate commented on GitHub (Nov 20, 2024):
This is actually already supported 😃 It's just not well documented yet. You can visit:

`https://archivebox.example.com/archive/https://example.com/archived/url`

e.g.: https://demo.archivebox.io/archive/https://arstechnica.com/tech-policy/2024/10/the-internet-archive-and-its-916-billion-saved-webpages-are-back-online/

Note you can also put any identifier for a snapshot after `/archive/` and it will redirect correctly, e.g.:

- `https://demo.archivebox.io/archive/<snapshot_timestamp>`
- `https://demo.archivebox.io/archive/<snapshot_URL>`
- `https://demo.archivebox.io/archive/<snapshot_UUID>`
- `https://demo.archivebox.io/archive/<snapshot_ABID>` (a new publicly sharable ID format added in >=v0.8.5 designed to make sharing snapshots between federated/distributed servers easier in future releases)

The REST API and admin pages for editing snapshots also allow fetching by any identifier (in >=`v0.8.5`):

- `https://archivebox.phantasm.group/admin/core/snapshot/<snapshot UUID> or <timestamp> or <ABID>`
- `https://archivebox.phantasm.group/api/v1/core/snapshot/<snapshot UUID> or <timestamp> or <ABID>`

(using the URL is not supported for these yet because I don't think it's needed as much for admins/API users)

In all cases you can also provide just the first few characters of the identifier to do a prefix search for all matching snapshots, e.g. to see all snapshots for `https://arstechnica.com/*` you can visit: https://demo.archivebox.io/archive/https://arstechnica.com/

We use this feature extensively with several of our paying clients who have needs similar to what you describe.

It's not fully compatible with archive.org / memento, but I have plans to make it cross-compatible with both in the future, which is what this ticket is meant to track.
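
To make the prefix search described above concrete, here is a minimal in-memory sketch in Python. It is not ArchiveBox's implementation: the `Snapshot` record, its field names, the `find_by_prefix` helper, and the sample data are all assumptions made for illustration, and the sample timestamps use the fixed-width Archive.org-style format so that string ordering matches chronological ordering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical record; field names are illustrative, not ArchiveBox's schema.
    timestamp: str  # fixed-width YYYYMMDDHHMMSS string
    url: str
    uid: str        # ULID-style identifier


def find_by_prefix(snapshots: list[Snapshot], prefix: str) -> list[Snapshot]:
    """Return snapshots where any identifier starts with prefix, newest first."""
    if not prefix:
        return []
    matches = [
        snap for snap in snapshots
        if any(value.startswith(prefix) for value in (snap.timestamp, snap.url, snap.uid))
    ]
    # Fixed-width timestamps sort chronologically as plain strings.
    return sorted(matches, key=lambda snap: snap.timestamp, reverse=True)


# Made-up sample data for the demonstration.
SNAPSHOTS = [
    Snapshot("20230116145642", "https://example.com/index.html", "01AN4Z07BY79KA1307SR9X4MV3"),
    Snapshot("20230301120000", "https://example.com/about.html", "01BX5ZZKBKACTAV9WEVGEMMVRZ"),
    Snapshot("20240105080910", "https://arstechnica.com/", "01ARZ3NDEKTSV4RRFFQ69G5FAV"),
]

for query in ("2023", "01AN4Z07BY", "https://arstechnica.com/"):
    print(query, "->", [snap.timestamp for snap in find_by_prefix(SNAPSHOTS, query)])
```

A redirect to "the most recent snapshot matching that prefix" would then simply take the first element of the returned list, if there is one.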
@ArrayBolt3 commented on GitHub (Nov 20, 2024):
Oh nice! Thanks for the info!