[GH-ISSUE #846] Feature Request: Support saving local webpages or PDFs #524

Closed
opened 2026-03-01 14:44:18 +03:00 by kerem · 1 comment
Owner

Originally created by @Victor239 on GitHub (Sep 12, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/846

Type

  • [ ] General question or discussion
  • [x] Propose a brand new feature
  • [ ] Request modification of existing behavior or design

What is the problem that your feature request solves

Sometimes ArchiveBox fails to archive a URL, but in those instances I'm still able to use the SingleFile browser extension or save the webpage as a PDF. If I can open the SingleFile output or the PDF in my browser, I'd like ArchiveBox to be able to archive the page from that local file instead; I could then manually edit the URL associated with it later if desired.

Also, in some cases there are PDFs I've obtained from emails, or webpages which are already offline, but I'm unable to import them into ArchiveBox. This leaves me having to maintain a separate archive for these files in Zotero, which is a headache and makes me want to just pare things down to one archive program.

Describe the ideal specific solution you'd want, and whether it fits into any broader scope of changes

Instead of just HTTP URLs, ArchiveBox should also accept filenames such as file:///home/user/Downloads/Rupert.html and be able to archive these pages.

kerem 2026-03-01 14:44:18 +03:00
Author
Owner

@pirate commented on GitHub (Sep 16, 2021):

Not possible given the security model. ArchiveBox is essentially running in a separate virtual machine and does not have access to the local filesystem or localhost:... URLs. This is not something that's likely to change in the near/medium-term future. If it were to be implemented, it would be via a 3rd-party / community-contributed extension like https://github.com/ArchiveBox/ArchiveBox/issues/577 sending the files to ArchiveBox.

As a workaround, keep in mind you can always drop files directly into snapshot folders in ArchiveBox's data dir. If you make a new snapshot, let it fail during archiving, and then drag some files into the folder manually, ArchiveBox won't delete them, and they'll be considered part of that snapshot's outputs, e.g. some_attachment.pdf -> ~/archivebox/archive/152342345234/some_attachment.pdf.
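
For anyone who wants to script this workaround rather than drag files by hand, here is a minimal sketch. It is not part of ArchiveBox: the helper name attach_file_to_snapshot is hypothetical, and it only assumes the <data dir>/archive/<snapshot id>/ layout shown in the example path above. It copies a local file into a snapshot folder that already exists and refuses to overwrite anything already there.

```python
import shutil
from pathlib import Path


def attach_file_to_snapshot(data_dir, snapshot_id, source_file):
    """Copy source_file into <data_dir>/archive/<snapshot_id>/.

    Hypothetical helper, not an ArchiveBox API. The snapshot folder must
    already exist (create the snapshot in ArchiveBox first, even if the
    archiving itself fails).
    """
    source_file = Path(source_file)
    snapshot_dir = Path(data_dir) / "archive" / snapshot_id
    if not snapshot_dir.is_dir():
        raise FileNotFoundError(f"Snapshot folder not found: {snapshot_dir}")

    destination = snapshot_dir / source_file.name
    if destination.exists():
        # Never clobber outputs that ArchiveBox or a previous copy put there.
        raise FileExistsError(f"Refusing to overwrite: {destination}")

    # copy2 keeps the file's modification time alongside its contents.
    shutil.copy2(source_file, destination)
    return destination


# Example call (paths are illustrative, taken from the comment above):
# attach_file_to_snapshot(
#     Path.home() / "archivebox", "152342345234", "some_attachment.pdf"
# )
```

The existence checks are a design choice for safety: the script fails loudly instead of creating a folder ArchiveBox doesn't know about or replacing an existing output.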
