[GH-ISSUE #1180] Add ZSTD compression #772

Open
opened 2026-03-02 11:52:38 +03:00 by kerem · 6 comments

Originally created by @haappi on GitHub (Mar 31, 2025).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/1180

Describe the feature you'd like

Zstandard (or any other kind of compression) for assets and page archives would help reduce the size of full-website archives.

Describe the benefits this would bring to existing Hoarder users

  • Reduced storage size for long-term hoarding.

Can the goal of this request already be achieved via other means?

Using a file system that supports transparent compression (btrfs?).

Have you searched for an existing open/closed issue?

  • I have searched for existing issues and none cover my fundamental request

Additional context

This may use more CPU time and memory. The .bin files can simply be decompressed on the fly when viewing bookmarks. However, compression wouldn't be very useful for assets, nor for full-page screenshots.

In my brief testing, I saw about 20-30% savings on HTML assets using the default ZSTD compression settings.


@MohamedBassem commented on GitHub (Mar 31, 2025):

That's actually a great idea. I'll try to include it in the next release.


@haappi commented on GitHub (Apr 1, 2025):

I might give this a shot myself, provided I can figure out how to support gzip, lz4, zstd, and none without having to modify existing data.
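
One way to support several codecs without touching existing data is to sniff the leading magic bytes of each stored blob and treat anything unrecognized as uncompressed. The following is a minimal sketch of that idea, not code from Karakeep: the `Codec` type and `detectCodec` function are hypothetical names, and only the magic numbers themselves come from the gzip, LZ4 frame, and Zstandard frame formats.

```typescript
// Hypothetical codec tags; "none" means the blob is stored uncompressed.
type Codec = "gzip" | "lz4" | "zstd" | "none";

// Leading magic bytes of each container format, in on-disk byte order.
const MAGIC: ReadonlyArray<[Codec, number[]]> = [
  ["zstd", [0x28, 0xb5, 0x2f, 0xfd]], // Zstandard frame
  ["lz4", [0x04, 0x22, 0x4d, 0x18]], // LZ4 frame format
  ["gzip", [0x1f, 0x8b]], // gzip member header
];

// Returns the codec whose magic bytes prefix the blob, or "none" if no
// known prefix matches (for example, a pre-existing uncompressed file).
function detectCodec(blob: Uint8Array): Codec {
  for (let m = 0; m < MAGIC.length; m++) {
    const codec = MAGIC[m][0];
    const magic = MAGIC[m][1];
    if (
      blob.length >= magic.length &&
      magic.every((byte, i) => blob[i] === byte)
    ) {
      return codec;
    }
  }
  return "none";
}
```

A caveat on this design: an uncompressed file that happens to start with one of these byte sequences would be misidentified, so recording the codec explicitly in metadata for newly written files may be the more robust choice, with sniffing kept as a fallback for old data.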


@haappi commented on GitHub (Apr 1, 2025):

I'm having trouble getting my code to compile due to issues with the bindings or libraries for lz4 and zstd. Would you mind taking a look if you have time? I'd also like to see your implementation if you end up making your own!


@mratsim commented on GitHub (May 12, 2025):

I see that you are adding lz4, but is there a use case for it? And I don't think there is any for gzip, as it is outclassed for all intents and purposes by zstd.

Zstd is now very widely adopted: it is the default compression for many Linux package managers, and it is used in kernels like Linux and FreeBSD and even in browsers.

One of the big benefits of lz4 was that it could detect non-compressible files and bail out to avoid burning CPU cycles for nothing. However, zstd added that a couple of years ago (using lz4 for the detection; lz4 and zstd were initially designed by the same person).

The other big benefit of lz4 is decompression speed, but Karakeep's purpose is long-term archival, so I think zstd is good enough. It is constantly improving, and it has `fast` presets if we want to reclaim some performance.
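
To illustrate the general idea of bailing out on incompressible input, here is a rough sketch that estimates the byte-level Shannon entropy of a sample and skips compression when the data already looks random. This is a substitute heuristic for illustration only, not how zstd or lz4 actually perform their detection; `byteEntropy`, `worthCompressing`, the 64 KiB sample size, and the 7.5 bits-per-byte threshold are all assumptions.

```typescript
// Shannon entropy of the byte histogram, in bits per byte (0 to 8).
function byteEntropy(sample: Uint8Array): number {
  if (sample.length === 0) return 0;
  const counts = new Uint32Array(256);
  for (let i = 0; i < sample.length; i++) {
    counts[sample[i]]++;
  }
  let bits = 0;
  for (let v = 0; v < 256; v++) {
    if (counts[v] === 0) continue;
    const p = counts[v] / sample.length;
    bits -= p * (Math.log(p) / Math.LN2);
  }
  return bits;
}

// Hypothetical policy: only compress when the first 64 KiB of the blob
// have entropy below the threshold. Already-compressed media (JPEG, PNG,
// video) typically sit close to 8 bits per byte.
function worthCompressing(blob: Uint8Array, threshold = 7.5): boolean {
  return byteEntropy(blob.subarray(0, 64 * 1024)) < threshold;
}
```

A byte histogram ignores repeated sequences, so this can only rule out obviously random-looking data; a stricter alternative would be to compress and keep the result only when it is actually smaller.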


@haappi commented on GitHub (May 13, 2025):

Thanks for the insight. I've paused work on this since, as discussed on Discord (I think?), we decided to wait for the next Node LTS, which will have native ZSTD support baked into the API.

The idea behind supporting multiple algorithms was mostly for flexibility, mainly for those on lower-end hardware who might not want to spend the extra CPU cycles on ZSTD.

Happy to revisit this once the LTS drops or if Mohamed wants to pick it up earlier.


@thiswillbeyourgithub commented on GitHub (May 28, 2025):

Btw, Meilisearch supports compression: https://www.meilisearch.com/docs/reference/api/overview#content-encoding
Reference
starred/karakeep#772