mirror of
https://github.com/eduardolat/pgbackweb.git
synced 2026-04-25 05:35:57 +03:00
[GH-ISSUE #164] Use modern compression algorithm #131
Originally created by @jinnatar on GitHub (Dec 26, 2025).
Original GitHub issue: https://github.com/eduardolat/pgbackweb/issues/164
In short: Please consider switching to `xz` archives, which use the `lzma` compression algorithm. By my tests this can reduce total backup size by 30%.

Currently, dumps are stored as zip archives. The old and venerable zip primarily uses the `deflate` compression algorithm from 1990. Zips in theory may optionally support `lzma`, but most tooling does not, including Info-ZIP (which most Linux distros use), which was last updated in 2008. Info-ZIP does support `bzip2`, and while that's better than `deflate`, it's only slightly better.

The solution I'm proposing is to switch to a better container with better algorithm support. By my quick tests `xz` is the winner, as it uses the `lzma` algorithm with a robust and modern container format. For comparison I've also included the legacy lzma container format below[^1]. There are potentially further gains to be had by increasing the compression level from the default 6 up to 7..9, but that increases the memory requirements. A 7 might be a good compromise.

Sample files: the first is the dump as stored by pgbackweb, the second is the same dump deflated. The rest are different ways of compressing the same dump:
The file types of the same tests:
[^1]: While technically legacy lzma is the smallest, that's purely because its header is a couple of bytes smaller than modern xz's. xz is superior in all other ways.
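The gap between `deflate` (what zip uses) and `xz` is easy to reproduce. A minimal Python sketch using the stdlib `zlib` and `lzma` modules, with a synthetic payload standing in for an actual pg_dump (the real dumps aren't attached here):

```python
import lzma
import zlib

def compare_compression(data: bytes, preset: int = 6) -> dict:
    """Compress the same payload with deflate (zip's default algorithm)
    and with lzma in the modern xz container, and report the sizes."""
    deflated = zlib.compress(data, 6)  # level 6, same as zip's default
    xz = lzma.compress(data, format=lzma.FORMAT_XZ, preset=preset)
    return {"raw": len(data), "deflate": len(deflated), "xz": len(xz)}

# Synthetic stand-in for a SQL dump: repetitive structure, varying values.
sample = b"".join(
    f"INSERT INTO users VALUES ({i}, 'user{i}');\n".encode()
    for i in range(20_000)
)
sizes = compare_compression(sample)
# xz's much larger dictionary lets it exploit long-range redundancy that
# deflate's 32 KiB window cannot reach, so it wins on dump-like data.
```

The exact ratio depends on the dump, but on text-heavy SQL the ordering `xz < deflate < raw` holds consistently.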
@jinnatar commented on GitHub (Dec 29, 2025):
Tests with a much larger database reveal some of the considerations. Performing a single-threaded level 6 compression on a 2.8G dump took almost 19 minutes. Doing the same with 42 threads on hyperthreaded Xeon cores instead takes only 53 seconds. I'd imagine a setting for thread count would be required, since how many cores an admin can spare, and how that aligns with the cron schedule, will differ for every deployment.
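For context on why threading scales so well here: `xz -T` splits the input into blocks and compresses them independently, and concatenated xz streams decompress as a single file. The same idea can be sketched in Python (a hypothetical `parallel_xz` helper, not pgbackweb code):

```python
import lzma
from concurrent.futures import ThreadPoolExecutor

def parallel_xz(data: bytes, threads: int = 4, chunk_size: int = 1 << 20) -> bytes:
    """Compress `data` as independent xz streams, one per chunk, in parallel.

    Concatenated .xz streams decompress back into a single file, which is
    essentially the trick `xz -T` uses. Per-chunk compression costs a little
    ratio, since matches can't cross chunk boundaries.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    # The lzma module releases the GIL while compressing, so a thread pool
    # gives real parallelism here.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        streams = pool.map(lzma.compress, chunks)
    return b"".join(streams)

dump = b"INSERT INTO t VALUES (1, 'row');\n" * 50_000
packed = parallel_xz(dump, threads=4, chunk_size=1 << 16)
```

A thread-count setting would map directly onto the chunk-parallelism above (or onto `xz -T N` if the CLI is shelled out to).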
Size comparison: