[GH-ISSUE #162] Feature Request: Serve Memento RFC7089-compliant headers during replay / viewing of Snapshot content #3133

Open
opened 2026-03-14 21:11:27 +03:00 by kerem · 7 comments

Originally created by @machawk1 on GitHub (Mar 7, 2019).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/162

Type:

  • [ ] General Question or Discussion
  • [X] Propose a brand new feature
  • [X] Request modification of existing behavior or design

What is the problem that your feature request solves
Allows captures to be interoperable and to be negotiated in time.

Describe the ideal specific solution you'd want, and whether it fits into any broader scope of changes
Captures of a web page in my archive give no indication that they are archival captures.

The Memento Framework (RFC7089, https://tools.ietf.org/html/rfc7089) provides the semantics and syntax to express this. For example, from https://archive.sweeting.me/ I click the HTML icon for "smacke/subsync" and am directed to https://archive.sweeting.me/archive/1551268823/output.html. What is the URI of which this is a capture? This information is not expressed by the capture.

Viewing a similar capture of the same URI (which I deduced was https://github.com/smacke/subsync because I could identify the page as a GitHub page) from the Internet Archive at http://web.archive.org/web/20190227193621/https://github.com/smacke/subsync, one can see (via curl or by viewing the headers in DevTools) that the response gives context about what it represents.

Memento-Datetime: Wed, 27 Feb 2019 19:36:21 GMT

This is the temporal signifier of the capture (when it was archived).

Link: <https://github.com/smacke/subsync>; rel="original",...

This shows that the original URI on the live Web, before it was archived, was https://github.com/smacke/subsync.

You may notice that a few other archives also support Memento in this way. There is more to it with respect to temporal content negotiation, but a start would be for each capture to indicate that it is a capture of URI R at time t.
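For illustration, here is a minimal Python sketch of the two headers above for a single capture. It assumes (hypothetically) that the numeric segment in a snapshot path such as `/archive/1551268823/output.html` is the capture's Unix timestamp and that the original URL is already known; `memento_headers` is an illustrative name, not an existing ArchiveBox function.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def memento_headers(original_url: str, snapshot_timestamp: float) -> dict:
    """Build the two minimal Memento headers for a single capture.

    Assumption: snapshot_timestamp is the Unix time (seconds, UTC) at which
    the capture was made, e.g. the 1551268823 in /archive/1551268823/.
    """
    captured_at = datetime.fromtimestamp(snapshot_timestamp, tz=timezone.utc)
    return {
        # When the capture was made, as an HTTP-date in GMT
        "Memento-Datetime": format_datetime(captured_at, usegmt=True),
        # URI-R: the live-web resource this memento is a capture of
        "Link": f'<{original_url}>; rel="original"',
    }


headers = memento_headers("https://github.com/smacke/subsync", 1551268823)
print(headers["Memento-Datetime"])  # Wed, 27 Feb 2019 12:00:23 GMT
print(headers["Link"])              # <https://github.com/smacke/subsync>; rel="original"
```

`Memento-Datetime` uses the same HTTP-date format as the Internet Archive header quoted above, which `format_datetime(..., usegmt=True)` produces from a UTC-aware datetime.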

How badly do you want this new feature?

  • [ ] It's an urgent deal-breaker, I can't live without it
  • [X] It's important to add it in the near-mid term future
  • [ ] It would be nice to have eventually
  • [ ] I'm willing to contribute to development

/cc @phonedude


@pirate commented on GitHub (Mar 8, 2019):

To clarify, this metadata is all provided via HTTP headers right?

If so, this is certainly trivial to serve with the Django backend we're adding, but it won't be doable with the static HTML archive export, as we don't have control over which headers are served when the raw HTML files are accessed.
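A rough sketch of the backend approach, assuming snapshots are served under `/archive/<timestamp>/` by a WSGI application (Django projects expose one). `MementoHeadersMiddleware` and the `lookup_original` callable are hypothetical names; a real implementation would look the original URL up in the snapshot index rather than in a dict.

```python
import re
from datetime import datetime, timezone
from email.utils import format_datetime

# Assumed snapshot URL layout: /archive/<unix timestamp, optionally fractional>/<file>
SNAPSHOT_PATH = re.compile(r"^/archive/(\d+(?:\.\d+)?)/")


class MementoHeadersMiddleware:
    """WSGI middleware that adds Memento headers to snapshot responses.

    lookup_original is a hypothetical callable: it takes the timestamp
    string from the URL and returns the original URL, or None if unknown.
    """

    def __init__(self, app, lookup_original):
        self.app = app
        self.lookup_original = lookup_original

    def __call__(self, environ, start_response):
        match = SNAPSHOT_PATH.match(environ.get("PATH_INFO", ""))
        original = self.lookup_original(match.group(1)) if match else None

        def start_response_with_memento(status, headers, exc_info=None):
            # Only snapshot paths with a known original URL get the headers
            if original is not None:
                captured = datetime.fromtimestamp(float(match.group(1)), tz=timezone.utc)
                headers = list(headers) + [
                    ("Memento-Datetime", format_datetime(captured, usegmt=True)),
                    ("Link", f'<{original}>; rel="original"'),
                ]
            return start_response(status, headers, exc_info)

        return self.app(environ, start_response_with_memento)


# Usage with a stand-in for the real snapshot-serving app (hypothetical).
def snapshot_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<html>...</html>"]


app = MementoHeadersMiddleware(
    snapshot_app, {"1551268823": "https://github.com/smacke/subsync"}.get
)
seen = {}
app({"PATH_INFO": "/archive/1551268823/output.html"},
    lambda status, headers, exc_info=None: seen.update(headers))
```

Wrapping the WSGI callable keeps the sketch stdlib-only; in Django itself the same two headers could instead be set on the `HttpResponse` (e.g. `response["Memento-Datetime"] = ...`) in a view or middleware. As noted above, none of this helps the static HTML export, where the web server, not ArchiveBox, decides the headers.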


@machawk1 commented on GitHub (Mar 8, 2019):

@pirate Yes, the temporal context of Memento-Datetime and Link: ...rel="original" is provided in the HTTP headers. Once a capture is downloaded and accessed outside of HTTP, this context is lost, and access from the local file system is out of the scope of Memento.

> If so this is certainly trivial...

Great! Serving these headers with the respective captures would make them more usable.


@pirate commented on GitHub (Mar 8, 2019):

ETA on that is going to be pretty far out, 6+ months if not more, since it depends on the new Django backend, which has its own backlog of features and refactors in the queue. I'll leave this ticket open to track progress though.


@ibnesayeed commented on GitHub (Mar 10, 2019):

While we are talking about Memento support, adding proper Memento TimeGate-style negotiation and providing Memento TimeMaps would be a plus, if it were to be aggregated.
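A minimal sketch of the TimeGate side, assuming the archive already knows every capture of a given URI-R as `(datetime, memento_url)` pairs. Given an `Accept-Datetime` request header, it redirects (302) to the capture closest in time, which is one common selection policy assumed here rather than something taken from ArchiveBox. The function names and the `archive.example` URLs are placeholders.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def select_memento(mementos, accept_datetime=None):
    """Pick the capture closest to the requested datetime.

    mementos: list of (capture_datetime, memento_url) with aware UTC datetimes.
    accept_datetime: raw Accept-Datetime header value; None means "most recent".
    """
    if not mementos:
        return None
    if accept_datetime is None:
        return max(mementos, key=lambda m: m[0])
    target = parsedate_to_datetime(accept_datetime)
    if target.tzinfo is None:  # "-0000" parses as naive; treat it as UTC
        target = target.replace(tzinfo=timezone.utc)
    return min(mementos, key=lambda m: abs(m[0] - target))


def timegate_response(original_url, mementos, accept_datetime=None):
    """Return (status, headers) for a 302-style TimeGate redirect."""
    chosen = select_memento(mementos, accept_datetime)
    if chosen is None:
        return "404 Not Found", [("Vary", "accept-datetime")]
    return "302 Found", [
        ("Location", chosen[1]),
        ("Vary", "accept-datetime"),  # response depends on Accept-Datetime
        ("Link", f'<{original_url}>; rel="original"'),
    ]


# Hypothetical captures of one URI-R on a placeholder host.
captures = [
    (datetime(2019, 2, 27, 12, 0, 23, tzinfo=timezone.utc),
     "https://archive.example/archive/1551268823/output.html"),
    (datetime(2019, 6, 1, tzinfo=timezone.utc),
     "https://archive.example/archive/1559347200/output.html"),
]
status, headers = timegate_response(
    "https://github.com/smacke/subsync", captures, "Wed, 27 Feb 2019 19:36:21 GMT"
)
```

Without an `Accept-Datetime` header the sketch falls back to the most recent capture. A production TimeGate would also need to handle an unparseable `Accept-Datetime` (for example by answering 400 Bad Request) and would typically advertise the TimeMap in its `Link` header as well.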


@LilaRest commented on GitHub (Dec 30, 2022):

Any news on this feature? RFC 7089 compliance is, IMO, one of the most important features for web archiving software. It would be awesome to see it implemented in the next versions of ArchiveBox 😊


@ntevenhere commented on GitHub (May 5, 2023):

Here is a list of services that are compatible with at least the Memento TimeGate and TimeMap APIs (most of them are pywb instances): https://github.com/oduwsdl/MemGator/blob/master/docs/archives.json

And here are some possible pitfalls to watch out for when implementing Memento support: https://groups.google.com/g/memento-dev/c/kHoxmO7R2NQ
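For the TimeMap API mentioned above: a TimeMap lists all captures of one URI-R and is commonly served as `application/link-format`. The sketch below, under the same hypothetical `(datetime, memento_url)` assumption, serializes one with `original`, `self`, and `memento` links, marking the first and last captures. `timemap_link_format` and the `archive.example` URLs (including the TimeMap route) are placeholders, and a fuller TimeMap would usually also link to the TimeGate.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def timemap_link_format(original_url, timemap_url, mementos):
    """Serialize a TimeMap for one URI-R as application/link-format.

    mementos: list of (capture_datetime, memento_url) with aware UTC datetimes.
    """
    ordered = sorted(mementos, key=lambda m: m[0])  # oldest capture first
    entries = [
        f'<{original_url}>; rel="original"',
        f'<{timemap_url}>; rel="self"; type="application/link-format"',
    ]
    for i, (captured, url) in enumerate(ordered):
        labels = []
        if i == 0:
            labels.append("first")
        if i == len(ordered) - 1:
            labels.append("last")
        labels.append("memento")
        rel = " ".join(labels)
        when = format_datetime(captured, usegmt=True)
        entries.append(f'<{url}>; rel="{rel}"; datetime="{when}"')
    return ",\n".join(entries) + "\n"


# Hypothetical captures, deliberately passed out of order.
captures = [
    (datetime(2019, 6, 1, tzinfo=timezone.utc),
     "https://archive.example/archive/1559347200/output.html"),
    (datetime(2019, 2, 27, 12, 0, 23, tzinfo=timezone.utc),
     "https://archive.example/archive/1551268823/output.html"),
]
timemap = timemap_link_format(
    "https://github.com/smacke/subsync",
    "https://archive.example/timemap/link/https://github.com/smacke/subsync",
    captures,
)
print(timemap)
```

A resource with a single capture gets `rel="first last memento"`, so clients can still find both ends of the list.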


@pirate commented on GitHub (Jun 13, 2023):

Status update on this: the Django backend I mentioned previously was implemented long ago, so that's no longer a blocker, but I still haven't gotten around to investigating Memento further or building this myself.

I'm coming back from a hiatus and working on a large issue backlog right now, so tbh RFC compliance is still low on my list compared to some architectural refactors and the REST API (which a bunch of paying users need ASAP for integration with their in-house tooling).

If anyone wants to submit a PR for Memento header support in the meantime, I'm happy to review it!
