[GH-ISSUE #1306] Plugin to use archive.is automatically #834

Open
opened 2026-03-02 11:53:07 +03:00 by kerem · 5 comments
Owner

Originally created by @maelp on GitHub (Apr 24, 2025).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/1306

Describe the feature you'd like

Many websites have paywalls that can be circumvented by visiting "https://archive.is/<paste original URL here>".

This brings you to a page that either shows the most recent snapshot or lets you trigger a new one.

This could probably be turned into an "add archive.is archive" option in Karakeep: when a URL is fetched, a background job would also ask archive.is to make a copy and then retrieve that copy.
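
A minimal sketch of that background job, in TypeScript (the language Karakeep is written in). `ArchiveHooks`, `archiveInBackground` and the polling options are hypothetical names, and how a copy is actually requested from archive.is is left open: the two hooks stand in for whatever backend call ends up being used. The job reuses an existing snapshot if one is found; otherwise it requests a copy and polls with exponential backoff.

```typescript
// Hypothetical hooks: neither is an existing Karakeep or archive.is API.
interface ArchiveHooks {
  requestCopy(url: string): Promise<void>;
  // Returns the archived copy's URL, or null if it is not ready yet.
  findCopy(url: string): Promise<string | null>;
}

interface PollOptions {
  initialDelayMs: number;
  maxDelayMs: number;
  maxAttempts: number;
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

const defaultSleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Ask the backend for a copy, then poll with exponential backoff until it appears.
async function archiveInBackground(
  url: string,
  hooks: ArchiveHooks,
  opts: PollOptions,
): Promise<string | null> {
  const sleep = opts.sleep ?? defaultSleep;

  // Reuse an existing snapshot instead of triggering a new one.
  const existing = await hooks.findCopy(url);
  if (existing !== null) return existing;

  await hooks.requestCopy(url);
  let delay = opts.initialDelayMs;
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    await sleep(delay);
    const copy = await hooks.findCopy(url);
    if (copy !== null) return copy;
    delay = Math.min(delay * 2, opts.maxDelayMs); // double, capped
  }
  return null; // give up for now; a job queue could reschedule this later
}
```

In practice the polling loop would run inside a job queue or workflow engine so that a slow snapshot does not tie up a worker; the injectable `sleep` exists only to keep the sketch testable.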

For long-running background jobs like this, it might be useful to use Inngest, DBOS, or Temporal.io.

Describe the benefits this would bring to existing Karakeep users

It would add a full archive of the page for websites where archive.is can get around the paywall.

Can the goal of this request already be achieved via other means?

Doing it manually

Have you searched for an existing open/closed issue?

  • I have searched for existing issues and none cover my fundamental request

Additional context

No response

@maelp commented on GitHub (Apr 24, 2025):

See this outline generated by ChatGPT: https://chatgpt.com/share/6809d209-be58-800b-a764-7885ed79ba2d

@huchene commented on GitHub (Apr 25, 2025):

+1

@Byrnesdigital commented on GitHub (May 13, 2025):

This would probably be best achieved by integrating with something like Ladder (https://github.com/everywall/ladder) or Marreta (https://github.com/manualdousuario/marreta), which are essentially self-hosted, open-source alternatives to archive.is. I haven't spent much time with either, but I wonder whether some automation could pass URLs and data back and forth: a URL sent to Karakeep is automatically forwarded to Marreta, the rendered page is sent back to Karakeep, and the bookmark is updated with the full-text version.
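
A minimal sketch of that round trip, assuming the proxy is addressed through a URL template with a `{url}` placeholder. A Ladder-style path prefix such as `http://localhost:8080/{url}` is an assumption here; Marreta's actual URL scheme may differ, so check each project's docs. `buildProxyUrl`, `refreshBookmarkViaProxy` and the `updateBookmark` callback are hypothetical; Karakeep's real bookmark-update code is not shown. It relies on the global `fetch` (Node 18+ or a browser).

```typescript
// Hypothetical config: proxy URL pattern containing a {url} placeholder.
function buildProxyUrl(template: string, target: string): string {
  if (!template.includes("{url}")) {
    throw new Error("proxy template must contain {url}");
  }
  // Parse first so a malformed bookmark URL fails early.
  const parsed = new URL(target);
  // A function replacer avoids String.replace's special "$" patterns,
  // which could otherwise mangle URLs containing "$".
  return template.replace("{url}", () => parsed.href);
}

// Fetch the rendered page through the proxy and hand the HTML to a callback
// that would update the bookmark (hypothetical hook).
async function refreshBookmarkViaProxy(
  template: string,
  target: string,
  updateBookmark: (html: string) => Promise<void>,
): Promise<void> {
  const res = await fetch(buildProxyUrl(template, target));
  if (!res.ok) throw new Error(`proxy returned HTTP ${res.status}`);
  await updateBookmark(await res.text());
}
```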

@maelp commented on GitHub (May 14, 2025):

I guess ideally we would have a common API for all of those, and the user would choose his backend (archive.is or something self-hosted).
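
A minimal sketch of such a common API, in TypeScript. `ArchiveBackend`, `selectBackend` and the config shape are hypothetical names. The archive.is backend only reproduces the manual `https://archive.is/<original URL>` pattern from the issue, and the self-hosted backend assumes a proxy that takes the original URL as a path suffix.

```typescript
// Hypothetical common interface: every backend maps an original URL to the
// URL where an archived or paywall-free copy can be read.
interface ArchiveBackend {
  readonly name: string;
  archiveUrlFor(original: string): string;
}

// archive.is, using the manual "https://archive.is/<original URL>" pattern.
const archiveIsBackend: ArchiveBackend = {
  name: "archive.is",
  archiveUrlFor: (original) => `https://archive.is/${original}`,
};

// Self-hosted proxy; assumes the original URL is appended as a path suffix.
function selfHostedBackend(baseUrl: string): ArchiveBackend {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    name: "self-hosted",
    archiveUrlFor: (original) => `${base}/${original}`,
  };
}

// User-facing setting choosing the backend (hypothetical shape).
type BackendConfig =
  | { kind: "archive.is" }
  | { kind: "self-hosted"; baseUrl: string };

function selectBackend(config: BackendConfig): ArchiveBackend {
  switch (config.kind) {
    case "archive.is":
      return archiveIsBackend;
    case "self-hosted":
      return selfHostedBackend(config.baseUrl);
    default: {
      // Compile-time exhaustiveness check if new backend kinds are added.
      const unreachable: never = config;
      throw new Error(`unknown backend: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

Keeping every backend behind one interface means the background job does not need to know whether the copy comes from archive.is or a self-hosted proxy.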

@Byrnesdigital commented on GitHub (May 14, 2025):

> I guess ideally we would have a common API for all of those, and the user would choose his backend (archive.is or something self-hosted)

Upon further reading, it looks like archive.is supports Memento (http://mementoweb.org/depot/native/archiveis/) as an API. The docs look pretty outdated, but it may be worth checking out rather than starting from scratch.
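
Memento (RFC 7089) lists an archive's snapshots of a URL in a TimeMap served as `application/link-format`. A minimal sketch that parses such a TimeMap and returns the newest snapshot, assuming archive.is's TimeMap follows the RFC's format. Fetching the TimeMap is left out, since the endpoint should be taken from the Memento depot page; `parseTimeMap` and `latestMemento` are hypothetical helper names.

```typescript
interface Memento {
  url: string;
  datetime: Date;
}

// Parse a link-format TimeMap and keep every link whose rel contains the
// "memento" token (e.g. "memento", "first memento", "last memento").
function parseTimeMap(body: string): Memento[] {
  const mementos: Memento[] = [];
  // Each entry is "<uri>" followed by its ;-separated params up to the next "<".
  // Splitting on "," would break: datetime values themselves contain commas.
  const linkRe = /<([^>]+)>([^<]*)/g;
  let match: RegExpExecArray | null;
  while ((match = linkRe.exec(body)) !== null) {
    const url = match[1];
    const params = match[2];
    const rel = /\brel=(?:"([^"]*)"|([^;,\s]+))/.exec(params);
    const relValue = rel ? (rel[1] ?? rel[2]) : "";
    if (!relValue.split(/\s+/).includes("memento")) continue;
    const dt = /\bdatetime="([^"]*)"/.exec(params);
    if (!dt) continue;
    const datetime = new Date(dt[1]); // RFC 1123 dates, e.g. "Tue, 20 Jun 2023 18:02:59 GMT"
    if (Number.isNaN(datetime.getTime())) continue;
    mementos.push({ url, datetime });
  }
  return mementos;
}

// Pick the most recent snapshot, or null if the archive has none.
function latestMemento(body: string): Memento | null {
  const all = parseTimeMap(body);
  if (all.length === 0) return null;
  return all.reduce((a, b) => (b.datetime.getTime() > a.datetime.getTime() ? b : a));
}
```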