mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 09:06:02 +03:00
[GH-ISSUE #162] Feature Request: Serve Memento RFC7089-compliant headers during replay / viewing of Snapshot content #1622
Originally created by @machawk1 on GitHub (Mar 7, 2019).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/162
Type:
What is the problem that your feature request solves
Allows captures to be interoperable and enables negotiation of captures in time.
Describe the ideal specific solution you'd want, and whether it fits into any broader scope of changes
Captures of a web page in my archive give no indication that they are archival captures.
The Memento Framework (RFC7089) provides the semantics and syntax to express this. For example, from https://archive.sweeting.me/ I click the HTML icon for "smacke/subsync" and am directed to https://archive.sweeting.me/archive/1551268823/output.html. What is the URI for which this is a capture? This information is not expressed by the capture.
Viewing a similar capture for the same URI (which I deduced was https://github.com/smacke/subsync based on being able to identify the page as a GitHub page) from Internet Archive at http://web.archive.org/web/20190227193621/https://github.com/smacke/subsync, one can see (via curl or viewing the headers in DevTools) that the page gives context to what it represents.
Memento-Datetime is the temporal signifier of the capture (when it was archived).
Link with rel="original" shows that the original URI on the live Web before it was archived was https://github.com/smacke/subsync.
You may notice that a few different archives also support Memento in this way. There is more to it with respect to temporal content negotiation, but a start would be for a capture to indicate that it is a capture of URI R at time t.
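As a rough illustration of "a capture of URI R at time t", here is a minimal Python sketch that builds the two response headers. The function name `memento_headers` is hypothetical and not part of ArchiveBox; the timestamp is the one from the snapshot path mentioned above.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def memento_headers(original_url: str, captured_at: datetime) -> dict[str, str]:
    """Return Memento-style response headers for a capture of original_url.

    Hypothetical helper: captured_at should be timezone-aware; a naive
    datetime would be interpreted as local time by astimezone().
    """
    # Memento-Datetime is an HTTP-date, which is always expressed in GMT.
    captured_utc = captured_at.astimezone(timezone.utc)
    return {
        "Memento-Datetime": format_datetime(captured_utc, usegmt=True),
        "Link": f'<{original_url}>; rel="original"',
    }


# The snapshot path /archive/1551268823/ carries a Unix timestamp.
headers = memento_headers(
    "https://github.com/smacke/subsync",
    datetime.fromtimestamp(1551268823, tz=timezone.utc),
)
for name, value in headers.items():
    print(f"{name}: {value}")
```

With these two headers on the response, a client can answer "what is this a capture of, and from when?" without guessing from the page content.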
How badly do you want this new feature?
/cc @phonedude
@pirate commented on GitHub (Mar 8, 2019):
To clarify, this metadata is all provided via HTTP headers right?
If so, this is certainly trivial to serve with the Django backend we're adding, but it won't be doable with the static HTML archive export, as we don't have control over which headers are served when the raw HTML files are accessed.
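To make the server-side idea concrete, here is a framework-agnostic sketch written as plain WSGI middleware rather than actual ArchiveBox or Django code. The `MementoHeaders` class, the `SNAPSHOT_URLS` lookup table, and the `/archive/<timestamp>/` path pattern are all assumptions for illustration; a real backend would look the original URL up in its own index.

```python
import re
from datetime import datetime, timezone
from email.utils import format_datetime

# Hypothetical lookup from snapshot timestamp to the original URL.
SNAPSHOT_URLS = {"1551268823": "https://github.com/smacke/subsync"}

# Assumed URL layout: /archive/<unix timestamp>/<file>
SNAPSHOT_PATH = re.compile(r"^/archive/(\d+)/")


class MementoHeaders:
    """WSGI middleware that adds Memento headers to snapshot responses."""

    def __init__(self, app, snapshot_urls):
        self.app = app
        self.snapshot_urls = snapshot_urls

    def __call__(self, environ, start_response):
        match = SNAPSHOT_PATH.match(environ.get("PATH_INFO", ""))
        original = self.snapshot_urls.get(match.group(1)) if match else None

        def patched_start_response(status, headers, exc_info=None):
            # Only known snapshots get the extra headers.
            if original is not None:
                captured = datetime.fromtimestamp(
                    int(match.group(1)), tz=timezone.utc
                )
                headers = list(headers) + [
                    ("Memento-Datetime", format_datetime(captured, usegmt=True)),
                    ("Link", f'<{original}>; rel="original"'),
                ]
            return start_response(status, headers, exc_info)

        return self.app(environ, patched_start_response)
```

Wrapping the application as `MementoHeaders(app, SNAPSHOT_URLS)` leaves every non-snapshot response untouched, which is why this only works when something dynamic is serving the files.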
@machawk1 commented on GitHub (Mar 8, 2019):
@pirate Yes, the temporal context of Memento-Datetime and Link: ...rel="original" is provided in the HTTP headers. Once downloaded and accessed beyond HTTP, this context is lost and out-of-the-scope of Memento if accessed from the local file system.
Great! Serving these headers with the respective captures would make them more usable.
@pirate commented on GitHub (Mar 8, 2019):
ETA on that is going to be pretty far out, 6+ months if not more, since it depends on the new Django backend, which has its own backlog of features and refactors in the queue. I'll leave this ticket open to track progress though.
@ibnesayeed commented on GitHub (Mar 10, 2019):
While we are talking about Memento support, adding proper Memento TimeGate-style negotiation and providing Memento TimeMaps would be a plus, if it were to be aggregated.
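For a sense of what those two pieces involve, here is a simplified Python sketch. A TimeGate picks the capture closest to the datetime the client asks for in its Accept-Datetime header, and a TimeMap lists all captures of a URI in link-format. The function names and the second capture below are made up for illustration, and a fuller TimeMap would carry additional links (such as "self" and "timegate") that are omitted here.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def closest_memento(mementos, accept_datetime):
    """Pick the (captured_at, url) pair closest to an Accept-Datetime value."""
    target = parsedate_to_datetime(accept_datetime)
    return min(mementos, key=lambda m: abs((m[0] - target).total_seconds()))


def timemap(original_url, mementos):
    """Serialize captures as a simplified application/link-format body."""
    lines = [f'<{original_url}>; rel="original"']
    for captured_at, memento_url in sorted(mementos):
        stamp = format_datetime(captured_at, usegmt=True)
        lines.append(f'<{memento_url}>; rel="memento"; datetime="{stamp}"')
    return ",\n".join(lines)


# First capture is the one from this thread; the second is hypothetical.
captures = [
    (
        datetime.fromtimestamp(1551268823, tz=timezone.utc),
        "https://archive.sweeting.me/archive/1551268823/output.html",
    ),
    (
        datetime(2018, 2, 27, 12, 0, 0, tzinfo=timezone.utc),
        "https://archive.example/archive/1519732800/output.html",
    ),
]

best = closest_memento(captures, "Wed, 27 Feb 2019 19:36:21 GMT")
print(best[1])
print(timemap("https://github.com/smacke/subsync", captures))
```

An aggregator would then only need the TimeMap and TimeGate endpoints to include an archive's captures alongside those of other archives.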
@LilaRest commented on GitHub (Dec 30, 2022):
Any news on this feature? RFC 7089 compliance is IMO one of the most important features for web archiving software. It would be awesome to see it implemented in the next versions of ArchiveBox 😊
@ntevenhere commented on GitHub (May 5, 2023):
Here is a list of services that are compatible with at least the Memento TimeGate and TimeMap APIs (most of them are pywb instances): https://github.com/oduwsdl/MemGator/blob/master/docs/archives.json
And here are some possible pitfalls before correctly implementing Memento Support: https://groups.google.com/g/memento-dev/c/kHoxmO7R2NQ
@pirate commented on GitHub (Jun 13, 2023):
Status update on this: the Django backend I mentioned previously was implemented long ago, that's no longer a blocker, but I still haven't gotten around to investigating Memento further / building this myself.
I'm coming back from a hiatus and working on a large issue backlog right now, so tbh RFC compliance is still low on my list compared to some architectural refactors and the REST API (which a bunch of paying users need ASAP for integration with their in-house tooling).
If anyone wants to submit a PR for memento header support in the meantime I'm happy to review it!