[PR #1178] [PoC] Pluggable Archiver Proof-of-Concept #1030

Open
opened 2026-02-25 23:36:15 +03:00 by kerem · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/go-shiori/shiori/pull/1178
Author: @honsiorovskyi
Created: 1/4/2026
Status: 🔄 Open

Base: `master` ← Head: `pluggable-archiver`


📝 Commits (8)

- `9341def` initial refactoring
- `961a62e` fix test
- `ae85422` cleanup
- `66a9f0e` move HasArchive to the ArchiverDomain
- `9f2216a` external archiver PoC
- `f4f2254` fix external archiver
- `a8c4588` adapt delete command
- `839e236` fixes

📊 Changes

20 files changed (+529 additions, -125 deletions)

View changed files

📝 Makefile (+10 -0)
📝 internal/cmd/delete.go (+7 -3)
📝 internal/cmd/root.go (+6 -1)
➖ internal/core/core.go (+0 -5)
📝 internal/core/download.go (+3 -1)
📝 internal/core/processing.go (+1 -22)
➖ internal/domains/archiver.go (+0 -55)
➕ internal/domains/archiver_builtin.go (+81 -0)
➕ internal/domains/archiver_external.go (+276 -0)
📝 internal/domains/bookmarks.go (+2 -7)
📝 internal/domains/bookmarks_test.go (+16 -14)
📝 internal/http/handlers/bookmark.go (+2 -2)
📝 internal/http/handlers/bookmark_test.go (+27 -1)
➕ internal/model/archiver.go (+75 -0)
📝 internal/model/bookmark.go (+1 -0)
📝 internal/model/domains.go (+4 -4)
📝 internal/model/main.go (+6 -0)
📝 internal/testutil/shiori.go (+1 -1)
📝 internal/webserver/handler-api-ext.go (+5 -4)
📝 internal/webserver/handler-api.go (+6 -5)

📄 Description

Hello Shiori team! 👋

First of all, thank you for this project, I absolutely love the idea and how lightweight it is.

However, in the modern world, some things can't be done purely in Go while staying lightweight.

One example is archiving heavy pages whose content is rendered by JS fetching data from some API (e.g. GitHub itself 🙈). It seems there's no way around using some kind of web engine: rendering the page, running the JS, and then capturing the properly rendered DOM.

Another is archiving something like YouTube videos, where you'd need to run the whole yt-dlp stack to get the actual content.

At the same time, relying on such third-party suites has a set of drawbacks:

  1. Size and complexity: if bundled with Shiori, they would massively bloat the project, defeating its purpose and main advantage.
  2. Security: when a web engine is used for archival, we literally run unknown third-party JS code in it. Even given the strong sandboxing of modern web engines, it may still be wise to isolate it further by running such processing in more isolated environments like separate VMs or even separate physical machines.
  3. Performance: this kind of processing creates substantial challenges on low-power hardware (like a Raspberry Pi or Intel Atom). Here too it may make sense to offload the work to more powerful hardware and process it opportunistically in batches, while keeping Shiori itself always up and running on low-power, low-consumption hardware.
  4. Storage: similarly, archiving heavy content requires a lot of storage, which may not be feasible to attach to the hardware where Shiori is running.

Of course, one could live without such archival (or use another project altogether), but I became curious what it would take to implement a reasonably elegant solution: keep Shiori small, easy, and lightweight, yet make it optionally extensible with external plugins that enable integrations with external archivers.

Something like this:

[architecture diagram]

The idea is to have an external archiver alongside the existing built-in one, either completely replacing the built-in archiver if the user opts in, or working in parallel with it (enabling both local and external archival). Both would implement a very simple internal interface, and the external archiver would simply expose that interface over some kind of transport (a regular REST API would probably be the go-to approach, but there are options here).

The external archiver / multiplexer service (a completely separate thing, not coupled to Shiori in any way) would then run on a separate machine (or the same one, wherever the user chooses to deploy it) and route archival requests to the backends it supports (and that the user has configured).


This PR is a very simple proof of concept of the external archiver. It doesn't implement any specific transport yet; it just calls an external binary (that was simply the easiest way to implement and test it, though I still think some kind of network transport makes more sense).

Please keep in mind that this is a PoC and I have no intention of merging it as is (hence publishing it as a draft).

However, if you like the idea and can provide some feedback, I'd be glad to continue working in this direction and eventually come up with a PR that implements the external archiver part (the one inside the Shiori box): fitting the project structure, aligned with the roadmap, nicely coded, properly documented, and covered with tests 🤞 😃

Hope it all makes sense. I'll also try to explain some code decisions in the inline comments in the PR itself.

Cheers!


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
