[GH-ISSUE #785] Add Local Webpage Upload Functionality for Authenticated and Restricted Content #516

Open
opened 2026-03-02 11:50:30 +03:00 by kerem · 5 comments

Originally created by @longzanxi on GitHub (Dec 29, 2024).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/785

Describe the feature you'd like

I would like to request the addition of a feature that allows users to upload locally saved webpages to Hoarder, including content that is not publicly accessible on the internet. This functionality would enable users to archive webpages that require authentication (e.g., Twitter, Rabbit, or forums) or have restricted access, ensuring that the saved content is an exact replica of what the user sees in their browser. This feature would be particularly useful for archiving posts, threads, or pages that are only visible after logging in or have specific permissions.

Describe the benefits this would bring to existing Hoarder users

  1. Access to Restricted Content: Users can archive content that is not publicly accessible, such as private tweets, forum posts, or subscription-based articles.
  2. Bypass Authentication Barriers: By saving webpages locally after logging in, users can avoid the need to configure tokens or cookies for server-side scraping.
  3. Preserve User-Specific Views: The saved webpages will reflect the exact content visible to the user, including personalized or dynamically loaded elements.
  4. Enhanced Archiving Capabilities: This feature expands Hoarder's utility for archiving content from platforms like Twitter, Rabbit, and other forums where authentication is required.
  5. Improved Compatibility with Browser Tools: Users can leverage browser-based tools (e.g., AI translators, ad blockers) during the archiving process, ensuring a seamless experience.

Can the goal of this request already be achieved via other means?

No. While users can manually save webpages and upload them to a server, this process is not integrated into Hoarder's workflow and lacks the convenience and automation that this feature would provide.

Have you searched for an existing open/closed issue?

  • I have searched for existing issues and none cover my fundamental request

Additional context

This feature would complement Hoarder's existing capabilities, making it a more versatile and user-friendly archiving tool. It aligns with the project's goal of simplifying web archiving while leveraging modern browser technologies. For inspiration, you can refer to the functionality provided by Ray-D-Song/web-archive (https://github.com/Ray-D-Song/web-archive), which allows users to save and upload webpages directly from their browsers. This feature would be particularly beneficial for archiving content from platforms like Twitter, Rabbit, and forums, where authentication or permissions are required to access specific content.


@MohamedBassem commented on GitHub (Dec 29, 2024):

Thanks for the detailed feature request. We're planning to support this natively in the extension itself in (https://github.com/hoarder-app/hoarder/issues/172). If it's implemented in the extension, would you still want to manually upload archives?


@gulikoza commented on GitHub (Jan 4, 2025):

I have a bunch of stuff saved as MHTML files, collected over the years with the browser's Save As function.
MHTML has usable metadata at the top, contains all the content, and could probably be rendered easily.
Importing these, maybe even through a script on the server, would let me consolidate everything into a single archive.
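
To illustrate the point about MHTML metadata, here is a minimal sketch of what a server-side import script could read from such a file. It relies on MHTML being a MIME multipart message, so Python's standard `email` package can parse it. The `Snapshot-Content-Location` header is an assumption based on what Chromium-based browsers write; other browsers may use different headers. The function name `read_mhtml` and the returned dictionary shape are hypothetical, and nothing here reflects an actual Hoarder import API.

```python
from email import message_from_bytes
from email.header import decode_header, make_header


def _decoded(value):
    """Decode an RFC 2047 encoded header value; pass None through."""
    if value is None:
        return None
    return str(make_header(decode_header(value)))


def read_mhtml(raw: bytes) -> dict:
    """Extract basic metadata and the main HTML part from an MHTML file.

    Assumption: the file uses Chromium-style top-level headers
    (Snapshot-Content-Location, Subject, Date). Missing headers yield None.
    """
    msg = message_from_bytes(raw)

    html = None
    for part in msg.walk():
        # The first text/html part is treated as the main document.
        if part.get_content_type() == "text/html":
            payload = part.get_payload(decode=True)  # undoes quoted-printable/base64
            charset = part.get_content_charset() or "utf-8"
            html = payload.decode(charset, errors="replace")
            break

    return {
        "url": msg.get("Snapshot-Content-Location"),
        "title": _decoded(msg.get("Subject")),
        "saved_at": msg.get("Date"),
        "html": html,
    }
```

A script could call `read_mhtml(open(path, "rb").read())` for each file and pass the URL, title, and HTML to whatever import mechanism becomes available. Embedded images and stylesheets are other parts of the same message and are ignored in this sketch.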


@maelp commented on GitHub (Mar 23, 2025):

The SingleFile extension with the Hoarder backend is a first step, but another possibility would be to share Chrome cookies through the Hoarder extension (or in some other way, e.g. a way to "log in" to some websites with the hosted Puppeteer browser).


@AlejandroAkbal commented on GitHub (May 25, 2025):

Yeah, ideally it should work like the Wallabag browser extension: it uploads a copy of the page you're browsing, so auth is already there


@maelp commented on GitHub (May 25, 2025):

This works, except for large files like videos, where you don’t want to keep your browser open the whole time it downloads.
