mirror of
https://github.com/go-shiori/shiori.git
synced 2026-04-25 06:25:54 +03:00
[GH-ISSUE #1108] Memory leak when fetching specific page #462
Originally created by @rnkn on GitHub (May 24, 2025).
Original GitHub issue: https://github.com/go-shiori/shiori/issues/1108
Data
Describe the bug / actual behavior
I can reliably reproduce a memory leak when fetching this page: https://leahroseman.substack.com/p/lawrence-english-interview
Shiori's memory and CPU usage climb until the server is overloaded and needs a restart.
Expected behavior
Shiori fetches page with consistent memory usage.
To Reproduce
Steps to reproduce the behavior:
Notes
I'm running via Docker on PikaPods.
@sakaru commented on GitHub (Jun 5, 2025):
I thought to look into this as a first issue.
First, I can replicate this with a memory limit of 100Mi. Once I increase the limit, I see it use roughly 500Mi and then succeed. I can also replicate the issue with any other post on the same Substack.
However, using shiori add --no-archival ... avoids the problem. Naturally, in the web UI, unticking the "Create Archive" checkbox also avoids the problem.
In trying to nail down where this memory usage comes from, I found that the warc.NewArchive call in processing.go is what starts the memory usage. Also notably, the boltdb for the archive is roughly 104MB, which seems really quite large.
Inspecting the boltdb shows that warc also downloads the feeds XML and the embedded audio file:
So warc is downloading several large files to be part of the archive.
I also see #353 exists which aims to remove the warc dependency, which should solve this issue.
Finally, it's not a leak in the sense that the memory usage stays high. I saw the memory usage return to normal after the archival process.
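The distinction between a leak and a temporary peak can be checked with a small sketch like the one below. It is not Shiori code: simulateArchive and heapAlloc are hypothetical names, and the 64 MiB buffer only stands in for the large files that the archival step holds in memory. It compares the live heap before, during, and after the work.

```go
package main

import (
	"fmt"
	"runtime"
)

// heapAlloc returns the bytes of allocated heap objects.
// When collect is true it forces a garbage collection first,
// so unreachable memory is not counted.
func heapAlloc(collect bool) uint64 {
	if collect {
		runtime.GC()
	}
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc
}

// simulateArchive is a hypothetical stand-in for a memory-hungry
// archival step. It allocates size bytes, touches them, and reports
// the heap size while the buffer is still alive.
func simulateArchive(size int) uint64 {
	buf := make([]byte, size)
	for i := range buf {
		buf[i] = byte(i)
	}
	peak := heapAlloc(false)
	runtime.KeepAlive(buf) // keep buf reachable until the peak is read
	return peak
}

func main() {
	before := heapAlloc(true)
	peak := simulateArchive(64 << 20)
	after := heapAlloc(true)

	fmt.Printf("before: %d KiB\n", before/1024)
	fmt.Printf("peak:   %d KiB\n", peak/1024)
	fmt.Printf("after:  %d KiB\n", after/1024)
}
```

If the "after" figure falls back close to "before", the memory was a transient peak rather than a leak, which matches what was observed here. A container memory limit below the peak would still kill the process before it gets that far.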