[GH-ISSUE #752] New Extractor Idea: Generate e-book files (epub/mobi) from archived article text #1983

Open
opened 2026-03-01 17:55:36 +03:00 by kerem · 6 comments
Owner

Originally created by @Valporaena on GitHub (May 21, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/752

Type

  • [ ] General question or discussion
  • [x] Propose a brand new feature
  • [ ] Request modification of existing behavior or design

Describe the ideal specific solution you'd want, and whether it fits into any broader scope of changes

I think it would be cool to have an integrated ebook-convert tool (https://manual.calibre-ebook.com/generated/en/ebook-convert.html) from Calibre (or something like it - this one is just the best one I know for that purpose) that you can access from the ArchiveBox web UI/command line. For the web UI I'm thinking of something like this: you tick the needed boxes, and where you now have "pull", "reset" and other options you would also have a "convert to e-book" option. Wanting to browse your articles comfortably on a dedicated e-ink/reader device seems like a common thing - it would be neat to have an option to convert articles in bulk straight from your archive. And since ArchiveBox already archives PDFs, this option shouldn't seem out of place.

How badly do you want this new feature?

  • [ ] It's an urgent deal-breaker, I can't live without it
  • [ ] It's important to add it in the near-mid term future
  • [x] It would be nice to have eventually

  • [x] I like ArchiveBox so far / would recommend it to a friend

@pirate commented on GitHub (May 21, 2021):

Related: https://github.com/ArchiveBox/ArchiveBox/issues/82

Have you tried reading the Readability or Mercury HTML/TXT output on an e-reader? Both are designed for that kind of environment, I think.


@Valporaena commented on GitHub (May 21, 2021):

Yeah, I've tried that: you get an HTML file, which most e-book reader apps can't format properly. You would need to run that file through Calibre anyway, which is kind of a pain. Or you could write an ebook-convert script, I guess - but I'm completely illiterate when it comes to scripting and programming, unfortunately. :)

I've just recently looked at the ArchiveBox web UI and thought: wouldn't it be convenient if you could tick all these boxes and just get the files you need from here? Seems like some people thought it would be cool a few years ago too. I get that the project is about archiving first and foremost, but with my particular workflow this kind of thing would be ideal.
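
For anyone who wants to try the script route in the meantime, here is a rough sketch of what a bulk ebook-convert wrapper might look like. It assumes Calibre is installed so that `ebook-convert` is on the PATH, and it assumes each snapshot keeps its Readability output at `archive/<snapshot>/readability/content.html`; that path and the helper names are illustrative guesses, not something confirmed in this thread, so adjust them to your own data folder.

```python
import shutil
import subprocess
from pathlib import Path


def find_readability_html(archive_dir: Path) -> list[Path]:
    """List Readability HTML files, one per snapshot folder (assumed layout)."""
    return sorted(archive_dir.glob("*/readability/content.html"))


def build_command(html_file: Path, out_dir: Path, fmt: str = "epub") -> list[str]:
    """Build the basic `ebook-convert INPUT OUTPUT` command for one article."""
    # The snapshot folder (the parent of readability/) names the output file.
    snapshot_id = html_file.parent.parent.name
    return ["ebook-convert", str(html_file), str(out_dir / f"{snapshot_id}.{fmt}")]


def convert_all(archive_dir: Path, out_dir: Path, fmt: str = "epub") -> int:
    """Convert every article found and return how many conversions succeeded."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found on PATH; install Calibre first")
    out_dir.mkdir(parents=True, exist_ok=True)
    converted = 0
    for html_file in find_readability_html(archive_dir):
        result = subprocess.run(build_command(html_file, out_dir, fmt))
        if result.returncode == 0:
            converted += 1
    return converted
```

Calling `convert_all(Path("archive"), Path("ebooks"))` from the data folder would then write one `.epub` per snapshot into `ebooks/`. ebook-convert picks the output format from the file extension, so passing a different `fmt` only changes the extension used.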


@Valporaena commented on GitHub (Jun 6, 2021):

Just want to give a quick update re: the importance of this particular feature.

Turns out at least some modern e-ink readers (Onyx Boox models in particular) have pretty good tools for reading raw HTML files in e-book form. The files need some minor tweaking, but after that no conversion to epub/fb2 is needed, really. I don't know if this is true for Kindle, Kobo or other popular devices though.


@ralienpp commented on GitHub (Jun 3, 2024):

I also need this functionality, though I think that one could consider a slightly different approach. Instead of having ArchiveBox do the conversion, provide an option to define a callback function or some REST endpoint. Once the web page is saved, it will be fed into this external system.

This way the logic is decoupled from that of ArchiveBox and others can come up with their own conversion mechanisms for various purposes. I'd like to turn this into an equivalent of "Send to Kindle"; but I wouldn't want to burden ArchiveBox with the need to understand how the Amazon ecosystem works.
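
To make the decoupled approach concrete, here is a minimal sketch of an external receiver that such a callback could POST to. Everything about the payload is an assumption made for illustration: the `url` key, the port, and the `handle_snapshot` helper are hypothetical and do not describe an actual ArchiveBox webhook format.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def extract_url(body: bytes):
    """Return the page URL from a JSON body, or None (the 'url' key is assumed)."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return None
    if not isinstance(payload, dict):
        return None
    url = payload.get("url")
    return url if isinstance(url, str) else None


def handle_snapshot(url: str) -> None:
    """Hypothetical hook: conversion and delivery to a device would go here."""
    print(f"would convert and deliver: {url}")


class SnapshotHook(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the sender announced.
        length = int(self.headers.get("Content-Length", 0))
        url = extract_url(self.rfile.read(length))
        if url is None:
            self.send_response(400)  # payload did not match the assumed shape
        else:
            handle_snapshot(url)
            self.send_response(204)  # accepted, nothing to return
        self.end_headers()


def serve(port: int = 8099) -> None:
    """Start the receiver on localhost; the port is an arbitrary choice."""
    HTTPServer(("127.0.0.1", port), SnapshotHook).serve_forever()
```

Running `serve()` and pointing the callback at `http://127.0.0.1:8099/` keeps all conversion and "Send to Kindle"-style logic outside ArchiveBox, which is the point of the proposal.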


@pirate commented on GitHub (Jun 4, 2024):

Callback webhooks (https://github.com/ArchiveBox/ArchiveBox/pull/1418) are implemented in v0.8.0 (https://github.com/ArchiveBox/ArchiveBox/releases/tag/v0.8.0-rc), see #1418, along with the new REST API :)

That should be enough to build this out externally with n8n/Node-RED/IFTTT/Zapier/etc.

@pirate commented on GitHub (Aug 13, 2024):

I also recommend people check out https://readeck.org/en/ if you're looking for a solution that offers EPUB export.
