[GH-ISSUE #752] Human friendly storage location #487

Open
opened 2026-03-02 11:50:18 +03:00 by kerem · 12 comments

Originally created by @dionorgua on GitHub (Dec 23, 2024).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/752

Describe the feature you'd like

Hi,

I understand that the storage itself is supposed to be accessed through the web UI or app, but it would be cool to also have it human-friendly, just to be able to browse/archive it manually if needed.

Something like /data/<username>/<listname>/<domain>/<url>/filename.{pdf|html|etc}

It would also be cool to populate metadata.json with more data like URL, tags, etc., even if it's redundant.

Describe the benefits this would bring to existing Hoarder users

  • Scripting
  • Ability to recover most of the data in case of DB failure

Can the goal of this request already be achieved via other means?

no

Have you searched for an existing open/closed issue?

  • I have searched for existing issues and none cover my fundamental request

Additional context

No response


@kamtschatka commented on GitHub (Dec 23, 2024):

Your suggestion with the folder names is simply not possible since you can add a bookmark to multiple lists (unless you start doing everything with symlinks). Additionally not all characters in a URL are valid characters for folder names.
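A minimal sketch of the character problem, assuming a hypothetical helper `url_to_slug` (not part of Hoarder): it replaces characters that common filesystems reject in folder names. Note that the mapping is lossy, so two different URLs can end up with the same slug; a name like this could serve as a human-readable label at best, not as a unique key.

```python
import re
from urllib.parse import urlsplit

# Characters that are unsafe in folder names on common filesystems
# ("/" is a path separator everywhere; Windows rejects the rest).
_UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f]+')


def url_to_slug(url: str, max_len: int = 80) -> str:
    """Turn a URL into a folder-name-safe slug (hypothetical helper)."""
    parts = urlsplit(url)
    raw = parts.netloc + parts.path
    if parts.query:
        raw += "?" + parts.query
    # Collapse each run of unsafe characters into a single dash.
    slug = _UNSAFE.sub("-", raw).strip("-. ")
    # Truncate long URLs; fall back to a fixed name if nothing is left.
    return slug[:max_len] or "untitled"
```

For example, `https://example.com/a/b` and `https://example.com/a-b` both map to `example.com-a-b`, which is why a slug alone could not identify a bookmark.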

What are your scripting requirements in the first place? I feel like it would be better if you told us your actual use case rather than how you want Hoarder to be changed to make it possible.


@MohamedBassem commented on GitHub (Dec 23, 2024):

Was writing the same comment as @kamtschatka. Also, not all assets are for links.

I get the point of more structure in the storage location (for example, having the bookmark ID as one layer). But honestly, it's unlikely we'll be able to change this at this point.


@dionorgua commented on GitHub (Dec 23, 2024):

Hi,

Yes. Sorry. I'm just a newbie here; I installed it for the first time. I agree that in the 'multiple lists' case a list id in the path is not possible.

But even without this, having something more 'structured' would be excellent. Anything would be better than f59b4fbf-48de-48b1-bc26-d125dce19032/asset.bin, even if it is something like a "URL slug" that is completely ignored by the app.

I found this and decided to create a ticket after importing a few lists and checking the ncdu output to find out why a few asset.bin files are larger than 1 GB.
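A minimal sketch of that kind of check as a script, assuming only the `<id>/asset.bin` layout mentioned above; the root directory is whatever your instance uses for assets, and `largest_assets` is a hypothetical helper name.

```python
from pathlib import Path


def largest_assets(assets_root: str, top: int = 10) -> list[tuple[int, Path]]:
    """Return (size_in_bytes, path) for the biggest asset.bin files."""
    sizes = [
        (p.stat().st_size, p)
        for p in Path(assets_root).rglob("asset.bin")
        if p.is_file()
    ]
    # Largest first; keep only the requested number of entries.
    return sorted(sizes, key=lambda item: item[0], reverse=True)[:top]
```

Calling `largest_assets("/path/to/assets")` and printing each pair gives a rough equivalent of the ncdu view, limited to asset files. The parent directory name of each result is the asset id, which still has to be matched to a bookmark through the app or the database.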


@MohamedBassem commented on GitHub (Dec 23, 2024):

@dionorgua I think to address this particular use case, we can probably have a UI that lists all the assets and their sizes to debug large assets. I think that's a reasonable thing to have.


@bverkron commented on GitHub (Dec 28, 2024):

Probably different from OP's use cases, but one use case I can think of is being able to read the movie files in something like Jellyfin. Pinchflat has been great for saving videos from YouTube and feeding them to Jellyfin for easy viewing (especially in a row), but it lacks features like notes and tagging that Hoarder has. I’d like to use Hoarder as a single tool for archiving web content (instead of using Pinchflat for video and Linkwarden for pages, for example), but that kind of removes the ability to view the videos in Jellyfin.


@Geoff-eg commented on GitHub (Feb 28, 2025):

Pretty new to all of this, but would having the assets saved as .md files work? The .md could be formatted to include sections for lists, tags, and content (whether URL or text). For images, an .md could be made with a reference to the actual location in a secondary folder where the .jpeg/.png etc. is stored.

Thinking of how I understand Obsidian handles this
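A minimal sketch of what such a note could look like, assuming a hypothetical `render_note` helper and made-up front matter field names (`title`, `url`, `tags`, `lists`); nothing here reflects an existing Hoarder export format. Values are JSON-encoded so that quotes and special characters in titles do not break the front matter.

```python
import json


def render_note(title, url, tags, lists, asset_path=None):
    """Render one bookmark as a Markdown note with front matter (sketch)."""
    lines = [
        "---",
        f"title: {json.dumps(title)}",
        f"url: {json.dumps(url)}",
        f"tags: {json.dumps(tags)}",
        f"lists: {json.dumps(lists)}",
        "---",
        "",
        f"# {title}",
        "",
        f"Source: <{url}>",
    ]
    if asset_path:
        # Reference the image stored in a separate folder instead of embedding it.
        lines += ["", f"![asset]({asset_path})"]
    return "\n".join(lines) + "\n"
```

Because a bookmark can belong to several lists, `lists` is a plain array in the note rather than a folder level, which sidesteps the multiple-lists problem raised earlier in the thread.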


@kurkmeister commented on GitHub (Jun 22, 2025):

Perhaps an option to save all of the files in the assets folder as separate files, instead of as a .bin file?

That way, the files are easier to access if the KaraKeep instance is offline, and it also makes KaraKeep easier to use as an archive tool, as all of the original data can be accessed without going through KaraKeep.

So, a video would have the video.mp4, a website the screenshot.png and the full_page.html as separate files inside the assets folder.


@bverkron commented on GitHub (Jun 22, 2025):

Alternatively a CLI command to extract the contents of the .bin files so we can at least migrate the data if needed one day.


@MohamedBassem commented on GitHub (Jun 22, 2025):

Folks, the .bin files are actually just the data itself. If you just change the extension from .bin to .mp4, for example, for videos, it'll work just fine.


@bverkron commented on GitHub (Jun 22, 2025):

Is this the case for all bin files and is there an indication somewhere of what the original extension was? In the DB I imagine but that’s not super scripting friendly.

This also begs the question of … what is the purpose of just changing the file extension? Genuinely curious about the benefits.


@kurkmeister commented on GitHub (Jun 22, 2025):

The metadata contains the file type
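If so, a small script could recover usable file names without touching the database. The sketch below assumes each asset directory holds an `asset.bin` next to a `metadata.json`, and that the type is stored under a key named `contentType`; both the exact layout and the key name are assumptions not confirmed in this thread, and `export_asset` is a hypothetical helper.

```python
import json
import mimetypes
import shutil
from pathlib import Path


def export_asset(asset_dir: str, out_dir: str, type_key: str = "contentType") -> Path:
    """Copy <asset_dir>/asset.bin to out_dir with a guessed extension (sketch)."""
    src = Path(asset_dir)
    meta = json.loads((src / "metadata.json").read_text(encoding="utf-8"))
    # Drop parameters such as "; charset=utf-8" before guessing.
    content_type = str(meta.get(type_key, "")).split(";")[0].strip()
    # Fall back to .bin when the type is missing or unknown.
    ext = mimetypes.guess_extension(content_type) or ".bin"
    dest = Path(out_dir) / f"{src.name}{ext}"
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Copy rather than rename, so the original storage stays untouched.
    shutil.copyfile(src / "asset.bin", dest)
    return dest
```

Looping this over every asset directory would give a flat export folder with one file per asset id, which could cover the migration case even if the on-disk layout itself never changes.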


@MohamedBassem commented on GitHub (Jun 22, 2025):

> what is the purpose of just changing the file extension

It allows me to serve the asset directly given its id, without having to figure out what its filename is. Whether this is worth it or not is debatable, but that's how I designed it back then. If the file names were not all the same, I'd first need to read the metadata file to get the file name and then serve the asset.

In hindsight, I usually have to read the metadata file anyway to get the content type during serving. Is it possible to revisit that design today? Maybe. Is it worth the effort? Honestly, I doubt it.

Reference
starred/karakeep#487