[GH-ISSUE #353] Support Obelisk archiving #240

Open
opened 2026-02-25 23:33:46 +03:00 by kerem · 12 comments

Originally created by @fmartingr on GitHub (Feb 10, 2022).
Original GitHub issue: https://github.com/go-shiori/shiori/issues/353

It seems that shiori depends on warc, which is currently archived. We need to find a replacement for warc. Maybe obelisk?

Acceptance criteria

  • Add a migration that records which archiver type each bookmark's content is stored in (set warc for already existing rows, with obelisk as the default)
  • Add logic to allow multiple archivers to be used; do not remove the warc logic, just refactor it.
  • Allow the /bookmark/:id/archive handler to load multiple archive types (so that both old and new archives can be loaded)
  • Allow the POST /api/v1/bookmarks, POST /api/v1/bookmarks/cache and POST /api/v1/bookmarks/:id/cache endpoints to select which archiver to use (but hardcode/default it to obelisk).
  • Add a documentation page describing the archivers, their available options, and their pros and cons.
  • Determine whether different file extensions should be used from now on (leave current filename expectations intact)
  • All code logic should be properly tested
  • Swagger documentation should be updated
@efrecon commented on GitHub (Feb 18, 2022):

obelisk is great. I have just tested the latest release on a few examples, and it does a good job of preserving the original layout and content.

@fmartingr commented on GitHub (Feb 18, 2022):

I still haven't tested or checked it, but the other day I randomly stumbled upon https://github.com/gildas-lormeau/SingleFile, and it also seemed quite good (having a single HTML file as output is quite useful as well).

@grawlinson commented on GitHub (Feb 19, 2022):

I'm in the process of packaging shiori for the AUR, and I strongly recommend staying within the Go ecosystem (obelisk can be imported as a go module!) as relying on external tools (e.g. SingleFile) defeats one of shiori's major selling points.

EDIT: Additionally, SingleFile requires a browser binary to be present, which is a Pandora's box in itself.

@fmartingr commented on GitHub (Feb 19, 2022):

> I'm in the process of packaging shiori for the AUR, and I strongly recommend staying within the Go ecosystem (obelisk can be imported as a go module!) as relying on external tools (e.g. SingleFile) defeats one of shiori's major selling points.
>
> EDIT: Additionally, SingleFile requires a browser binary to be present, which is a Pandora's box in itself.

Just to clarify (because I didn't express myself very well): I like how SingleFile works (the single HTML file output), but I do not plan to replace warc with it. The plan is still to go with Obelisk. :)

Edit: Yeah, when I made my first comment I didn't realise that Obelisk's output is also a single HTML file 😅

@grawlinson commented on GitHub (Feb 19, 2022):

Thanks for clarifying that!

A package is now available on the AUR (https://aur.archlinux.org/packages/shiori), so if there are any bug reports relating to Arch Linux, tag me and I'll attempt to help out.

@gildas-lormeau commented on GitHub (Oct 13, 2022):

> EDIT: Additionally, SingleFile requires a browser binary to be present, which is a Pandora's box in itself.

For the record, this statement is false. SingleFile can work with JSDOM. Anyway, good luck!

@fmartingr commented on GitHub (Oct 14, 2022):

> > EDIT: Additionally, SingleFile requires a browser binary to be present, which is a Pandora's box in itself.
>
> For the record, this statement is false. SingleFile can work with JSDOM. Anyway, good luck!

Thanks for the clarification. Even though I love SingleFile (it helped me a ton while moving to a new flat!), it would add unnecessary complexity for us. So far, obelisk seems to provide the expected results, and we could use this migration to move that project further into the Go world :)

@gildas-lormeau commented on GitHub (Oct 14, 2022):

Thanks for the feedback! Personally, I think that in 2022 you have to use a web browser for this kind of task. It is also becoming essential when it comes to determining what to really save. This is where SingleFile generally stands out: a very large part of its code is dedicated to optimizing the size of the saved page, and to do this, a browser is unfortunately required.

@ivanrg99 commented on GitHub (Jan 18, 2024):

What's the status on this? One of the reasons why we choose to run software like Shiori is for archiving purposes, to prevent link-rot and preserve information/knowledge. Having our bookmarks stored in a binary data format as opposed to plain text hurts data preservation. Do you need any help with the transition to Obelisk? Is anyone working on this at the moment?

@Monirzadeh commented on GitHub (Jan 18, 2024):

Personally, I'm trying to get it ready for later use. Currently I'm working on https://github.com/go-shiori/obelisk/pull/96 and https://github.com/go-shiori/obelisk/pull/98.
We have some open issues (https://github.com/go-shiori/obelisk/issues) there too; you can work on any aspect that you like.

@fmartingr commented on GitHub (Feb 4, 2024):

I need to sit down and pave the way for people to start implementing these features. I started a draft under #481 some time ago but haven't returned to it since, because other things, like the API, had priority. I expect the API migration to get faster over time as we refactor the logic into different components, but that's still the main priority now.

For this to work, we will need to isolate the archiving logic in its own domain and provide backwards compatibility, which will require a migration adding a new column specifying which archive format a bookmark is currently in.

What I'm trying to say is that it can be done and is on my radar, but it is not trivial. Once 1.6 is released, I need to sit down and work on the roadmap again, defining issues for the several things we need to work on and probably making some PRs to prepare for that.

@dehlen commented on GitHub (Feb 26, 2024):

Hey,

I am eagerly awaiting the work on this issue. I would like to migrate my catalog of bookmarks saved in Instapaper to shiori and self-host it on my local network. However, what is holding me back is that the current implementation stores archived bookmarks in a bolt database. I am now wondering whether I should wait for obelisk support in shiori or migrate right away. I do not want to import all my bookmarks again once obelisk is added, and I am wondering how likely it is that there will be a migration path to convert previously archived bookmarks from the bolt DB to HTML output created by obelisk.

As I understand it, this is definitely on your radar, but it's something you haven't found time to look at yet. My comment isn't meant to pressure you in any way; it's more of a +1 for this feature and a way to subscribe to the ongoing discussion. Whenever you have new information regarding this issue, I am very keen to hear it :)

Reference
starred/shiori#240