mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 17:16:00 +03:00
[GH-ISSUE #746] singlefile and other Chrome extractors leave behind zombie orphan chromium processes that never exit #1977
Originally created by @ghost on GitHub (May 13, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/746
Originally assigned to: @pirate on GitHub.
I notice a huge resource allocation hole on Ubuntu Server 18.04 w/ docker-compose... the system never seems to go back to idle unless I reboot... up to 4.00+ system load after archiving has stopped.
@pirate commented on GitHub (May 13, 2021):
Please post the full output of `archivebox --version`. I suspect it's the Chrome orphan-child problem we've seen before: https://github.com/ArchiveBox/ArchiveBox/issues/550 (Chrome is naughty, forking some child processes that it doesn't clean up on exit).

The fix is usually just to upgrade your ArchiveBox + Chrome versions, but on some setups the issue persists and needs to be fixed by adding the `--no-zygote` and `--single-process` flags.

@ghost commented on GitHub (May 13, 2021):
@pirate commented on GitHub (May 13, 2021):
Hmm, it's the latest version, which should have the fix. Can you check whether the high resource use is by chromium or a different process, using `htop`? (Sort by CPU / try viewing in tree mode to see which procs in the container are using the most CPU/mem.)

@ghost commented on GitHub (May 13, 2021):
It does appear to be chromium. Here's a screencap of htop (this is after restarting the system and running a couple of archives; the system load seems OK-ish, but there are a lot of chromium processes listed?)

Here it is sorted with tree view: (edit: actually uploaded the right cap)
@pirate commented on GitHub (May 13, 2021):
Yeah ok, this is an annoying known bug we've seen a few times with Chromium.
Can you try the fix I just put on `dev` by building a new docker container and running from that? Then update your `docker-compose.yml` to use `archivebox:dev`. Let me know if that fixes it or not.
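For reference, the `docker-compose.yml` change being described might look something like this (a hypothetical sketch: the service name, ports, and volumes here are assumptions based on a typical ArchiveBox compose file; the relevant part is pointing `image:` at the `:dev` tag):

```yaml
# Hypothetical sketch -- layout is an assumption; the point is the image tag.
services:
  archivebox:
    image: archivebox/archivebox:dev   # switched from the stable tag to :dev
    ports:
      - "8000:8000"
    volumes:
      - ./data:/data
```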
@ghost commented on GitHub (May 13, 2021):
Will update -- thank you for your help, much appreciated
@ghost commented on GitHub (May 13, 2021):
Update: chromium no longer causing problems. All is well. Thanks again!
edit: lengthy tasks still cause this bug
@ghost commented on GitHub (May 13, 2021):
It seems to happen on archive tasks that take a while to complete. performance has improved noticeably though since the last change
@pirate commented on GitHub (May 13, 2021):
Can you screenshot htop again, but scroll over to the right a bit more to see the full chrome args? I think that'll tell us what extractor is running chrome.
I suspect SingleFile is the odd one out not using our `--no-zygote` fix, because it handles launching its own copy of Chrome. As a test, you can try temporarily disabling the SingleFile extractor with `archivebox config --set SAVE_SINGLEFILE=False`. If that stops it, then we know it's SingleFile's Chrome, and I can look into changing the SingleFile Chrome args by passing them via `singlefile --browser-executable-path=...`.

@ghost commented on GitHub (May 13, 2021):
The full args list is huge, I don't have enough screens -- but I copied the line for the chromium PID:

13484 caddy 20 0 37.6G 226M 157M S 0.0 11.4 0:00.00 /usr/lib/chromium/chromium --show-component-extension-options --enable-gpu-rasterization --no-default-browser-check --disable-pings --media-router=0 --enable-remote-extensions --load-extension= --disable-background-networking --enable-features=NetworkService,NetworkServiceInProcess --disable-background-timer-throttling --disable-backgrounding-occluded-windows --disable-breakpad --disable-client-side-phishing-detection --disable-component-extensions-with-background-pages --disable-default-apps --disable-dev-shm-usage --disable-extensions --disable-features=TranslateUI --disable-hang-monitor --disable-ipc-flooding-protection --disable-popup-blocking --disable-prompt-on-repost --disable-renderer-backgrounding --disable-sync --force-color-profile=srgb --metrics-recording-only --no-first-run --enable-automation --password-store=basic --use-mock-keychain --enable-blink-features=IdleDetection --headless --hide-scrollbars --mute-audio about:blank --headless --no-sandbox --disable-gpu --disable-dev-shm-usage --disable-software-rasterizer --run-all-compositor-stages-before-draw --hide-scrollbars --single-process --no-zygote --user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.61 Safari/537.36 ArchiveBox/{VERSION} (+https://github.com/ArchiveBox/ArchiveBox/) --window-size=1440,2000 --disable-web-security --no-pings --window-size=1280,720 --remote-debugging-port=0 --user-data-dir=/tmp/puppeteer_dev_chrome_profile-T7r5l7

@ghost commented on GitHub (May 13, 2021):
I threw 6 URLs at it and so far it seems to be working without leaving chromium hanging (about 10 mins now OK)
edit: yeah now it's chugging along nicely, performance jump as well.
@pirate commented on GitHub (May 13, 2021):
Looks like you have `--single-process --no-zygote` in the args anyway, so my idea was wrong. That Chrome proc shouldn't leave behind any orphan processes with `--single-process --no-zygote` (in theory).

@axb21 commented on GitHub (May 14, 2021):
Just wanted to throw in here that I've observed the same thing. The load on the machine running archivebox jumped to very high levels, mostly caused by hundreds of chromium processes, after I submitted ~150 URLs to be archived.
@agg23 commented on GitHub (Jun 9, 2021):
I am also experiencing the same. I am automatically archiving pages as I visit them, and finally got around to checking on the slowdown of my Docker host.
`ps aux | grep -c chromium` reports that there are 443 instances of Chromium open right now...

@pirate commented on GitHub (Jun 11, 2021):
This problem will go away when we switch to using long-running playwright workers instead of spawning chrome headless 3x for each URL https://github.com/ArchiveBox/ArchiveBox/issues/51
In the meantime, to work around this Chrome bug we may have to add machinery to keep track of all forked subprocesses and kill them after every extractor, but that's a lot of additional time, code complexity, and hassle. Hopefully Chromium fixes it first, but if not I may just focus on switching to Playwright sooner rather than write subprocess-killing code that will have to be torn out anyway.
@cdzombak commented on GitHub (Nov 16, 2021):
Ran into this last night trying to import my Pinboard archive of ~8000 links. When I woke up this morning, the machine was almost entirely out of RAM (8GB) & swap (16GB), and load averages were around 350-400.
A representative line from `ps`:

Running the `archivebox/dev` tag I pulled yesterday via docker-compose:

@pirate commented on GitHub (Nov 16, 2021):
Ok, I can't promise a fix immediately, but I will take a look soon at why `--no-zygote` isn't working as it should.

@cdzombak commented on GitHub (Nov 16, 2021):
Understood, this isn't a life-or-death issue for me 😄
@salykin commented on GitHub (Nov 17, 2022):
Hello! Did anybody find a solution?
@salykin commented on GitHub (Nov 17, 2022):
If anyone wants to work around the problem somehow, take a look at this bash script:

The script loops over chromium processes every 5 minutes, looks for long-lived ones (older than 10 minutes), and kills them all.
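The bash script itself is not reproduced in this mirror, but the described behavior could be sketched roughly like this (a hypothetical Python approximation, not the original script; the `ps` parsing and constants are assumptions based on the description above):

```python
# Hypothetical Python approximation of the bash watchdog script described
# above. It SIGKILLs chromium processes older than 10 minutes, re-scanning
# every 5 minutes (the thresholds come from the comment; the rest is assumed).
import os
import signal
import subprocess
import time

MAX_AGE_SECONDS = 10 * 60   # "older than 10 minutes"
SCAN_INTERVAL = 5 * 60      # "loops ... every 5 minutes"

def kill_old_chromium(max_age=MAX_AGE_SECONDS):
    # pid=,etimes=,comm= suppresses headers: pid, elapsed seconds, command name
    out = subprocess.run(
        ["ps", "-eo", "pid=,etimes=,comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in out.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        pid, etimes, comm = parts
        if "chromium" in comm and int(etimes) > max_age:
            try:
                os.kill(int(pid), signal.SIGKILL)
            except ProcessLookupError:
                pass  # process already exited between listing and killing

if __name__ == "__main__":
    while True:
        kill_old_chromium()
        time.sleep(SCAN_INTERVAL)
```

Note that `SIGKILL` on a mid-archive Chrome can corrupt an in-progress snapshot, which is why the age threshold is generous.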
@pirate commented on GitHub (Jun 13, 2023):
Are y'all still seeing this issue? There have been many updates to Chromium and `archivebox/archivebox:dev` since 2022/11, and I'm curious whether this is still a concern. I personally haven't noticed major issues with zombie Chrome processes on our demo server or my personal ArchiveBox instance in the last few months, so let me know if you're still seeing it on your machines.
@msalmasi commented on GitHub (Jul 19, 2023):
Hi @pirate
I seem to be having this issue on Chromium v114 and ArchiveBox 0.6.3 (dev branch) (running in docker). This leads to the creation of a SingletonLock file in the user data profile that prevents private archiving from working, and ultimately profile corruption.
@mAAdhaTTah commented on GitHub (Jul 19, 2023):
I can also confirm I'm still experiencing this and periodically restart the archivebox docker container to compensate.
@gmsotavio commented on GitHub (Aug 25, 2023):
I can also confirm I'm still experiencing this and periodically restart the archivebox FreeBSD jail to compensate.
I have been using the dev branch.
> [85067:55623680:0825/101611.186656:ERROR:process_singleton_posix.cc(334)] Failed to create /mnt/archivebox/data/chromium/.config/chromium/SingletonLock: File exists (17)

@pirate commented on GitHub (Aug 25, 2023):
I'm currently working on a refactor to use a long-running scrapy-playwright based worker inside of a huey job queue system. It should solve this issue for good, as even if Chrome misbehaves, the worker can periodically restart on its own to clear out zombie processes and release leaked memory. It's complex, but it's looking like it might be a big upgrade for the project, maybe finally warranting an ArchiveBox 1.0 version.
@sclu1034 commented on GitHub (Jun 14, 2024):
I seem to be running into this issue as well, but I haven't been able to check the process list for Chrome yet.

The spikes coincide with importing large sets of URLs; the drops at "06/12 00:00" and at the very end of the chart are both manual restarts. The imports had long since finished by that time.
@clb92 commented on GitHub (Jun 19, 2024):
I'm having this problem now: I noticed all 40 threads on my server pegged at 100%. Running `top` shows that it's a lot of Chrome processes belonging to ArchiveBox. This happens even though there aren't very many pending items in ArchiveBox. I basically have to restart the ArchiveBox container daily to clear the Chrome processes.

@krosseyed commented on GitHub (Jul 1, 2024):
I think I am also running into this issue, as my search results landed me here. Please let me know if this should be a different ticket.

I recently turned on SingleFile support after ensuring that Chromium is working using `docker-compose run version`, and now I want to run `docker-compose run archivebox update` to get my existing (183 total) snapshots updated. However, I can only get through the first 10 or so of my existing links before it starts to error out after 60 seconds whenever creating a singlefile or a PDF is attempted. I can, however, go one at a time by doing something like `archivebox update -t timestamp 1717948555.850022`, and the little Raspberry Pi I am using doesn't fall over after doing about 20 or so.

I hope this sheds some light on this issue, whether or not what I am running into is the same bug.
@srd424 commented on GitHub (Nov 2, 2024):
Still seeing this - or something very like it - with 0.7.2 (docker image). Would `systemd-run` help here? It's very good at dumping everything in its own cgroup, which makes ~~mercilessly killing~~ cleaning up orphans a bit easier..

@jmeggitt commented on GitHub (Nov 9, 2024):
After some debugging, I found the source of the issue.
The orphaned Chrome processes are getting spawned by `/app/node_modules/single-file-cli/single-file`. Normally it properly cleans up its child processes when it finishes; however, that does not happen if the node process is killed. Since ArchiveBox runs `single-file` via Python's `subprocess.run` API with a timeout, Python will send `SIGKILL` to the process when the timeout is reached.

Reproduction Steps
1. Find a server with a long response time
First, you will need to find a page that will take long enough to render that you can kill it before it finishes. If you have trouble finding one, you can use this Python script to run a local HTTP server with an artificially long delay.
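The script referenced above isn't included in this mirror; a hypothetical stand-in serving the same purpose might look like this (the port and delay are arbitrary choices, not values from the original):

```python
# Hypothetical stand-in for the delay server mentioned above: a local HTTP
# server that sleeps before responding, so single-file can be killed while
# the page is still "rendering".
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

DELAY_SECONDS = 120  # long enough to kill single-file before the response

class SlowHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(DELAY_SECONDS)  # artificial delay before any bytes are sent
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(b"<html><body>finally loaded</body></html>")

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), SlowHandler).serve_forever()
```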
2. Run `single-file`

Since I use docker compose for ArchiveBox, I used the following command to run `single-file` within my existing container.

3. Kill the `single-file` process before it finishes

Kill the process running `single-file`. Make sure to use `kill -9` to send `SIGKILL` to the process. This bug will not be triggered if you attempt to kill `single-file` with ctrl+C.

4. Check running processes

At this point, the bug should have been hit. During my testing I had a 100% hit rate, so I expect it will not be difficult for others to reproduce. `pstree` can be used to verify that dangling Chrome processes are still running.

Possible Fixes
1. Manually terminate the process when the timeout is reached.
`single-file` has code to handle the signals `SIGTERM` and `SIGINT` (code). ArchiveBox could add some logic to perform the timeout manually, then send one of these signals when the timeout is reached.

2. Make `single-file` kill subprocesses on exit

This means it needs to do the following, so the OS knows what to do with its children when it dies. I am not sure what the nodejs equivalent of this is, but I imagine there is probably some sort of API for it. Additionally, I only looked into the details for Linux; I am not sure what happens on other platforms.
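For illustration, fix 1's manual-timeout idea could be sketched like this (hypothetical code, not from ArchiveBox or this thread; the function name and `grace` period are assumptions):

```python
# Hypothetical sketch of fix 1: run the extractor with a manual timeout,
# send SIGTERM first so single-file's signal handlers can clean up its
# Chrome children, and only fall back to SIGKILL if it ignores the signal.
import signal
import subprocess

def run_with_graceful_timeout(cmd, timeout, grace=5):
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.send_signal(signal.SIGTERM)  # give it a chance to clean up
        try:
            return proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            proc.kill()  # SIGKILL: children may end up orphaned again here
            return proc.wait()
```

Contrast this with `subprocess.run(..., timeout=...)`, which goes straight to `proc.kill()` (`SIGKILL`) on timeout and never gives `single-file`'s handlers a chance to run.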
@pirate commented on GitHub (Nov 12, 2024):
Thanks for this debugging @jmeggitt, what I'd ideally like to do is handle this in a general way where extractors are started in a process group, and the entire group is killed when the timeout is hit, so that way we don't need to rely on extractors killing their children properly on SIGINT.
@jmeggitt commented on GitHub (Nov 12, 2024):
Good point. Process groups would definitely be a more complete solution. I have not worked directly with process groups before, so I can't say for sure, but I think `subprocess.run(..., start_new_session=True)` may achieve this? If so, this should not be that difficult to fix.

https://docs.python.org/3/library/subprocess.html#subprocess.Popen
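That suggestion could be sketched as follows (a hypothetical illustration of `start_new_session=True` plus a process-group kill, not ArchiveBox's actual code):

```python
# Hypothetical illustration of the process-group approach: start the
# extractor in its own session/process group, and on timeout kill the
# whole group so any forked Chrome children die along with it.
import os
import signal
import subprocess

def run_in_group(cmd, timeout):
    proc = subprocess.Popen(cmd, start_new_session=True)  # new group, pgid == pid
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # A process-group kill reaches every member, including children the
        # extractor forked, unlike proc.kill() which only hits the leader.
        os.killpg(os.getpgid(proc.pid), signal.SIGKILL)
        return proc.wait()
```

`start_new_session=True` runs `setsid()` in the child, so the extractor becomes a session and process-group leader, and its children inherit that group.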
@pirate commented on GitHub (Nov 12, 2024):
Yup 💯, the v0.8.5rc already uses `start_new_session=True` for some workers (just supervisord at the moment), and I plan on implementing it within the new `actors` system that runs extractor jobs: github.com/ArchiveBox/ArchiveBox@a9a3b153b1/archivebox/actors/actor.py (L229)

@comatory commented on GitHub (Dec 6, 2024):
Is this one addressed in https://github.com/ArchiveBox/ArchiveBox/pull/1311 ?
I'd even like to run the RC, because ArchiveBox is unfortunately hogging my server 😅
@pirate commented on GitHub (Jan 8, 2025):
PSA to everyone following this: a little while back I did some testing and confirmed this is an underlying bug in Chrome's implementation of their new `--headless=new` mode on some platforms, and it happens reliably outside of ArchiveBox / even if ArchiveBox is not installed.

You can see my full analysis + steps to reproduce the issue outside of ArchiveBox here:
🚨 Please help us get this fixed upstream by commenting / upvoting the issue over on the Chromium bug tracker: ➡️ https://issues.chromium.org/issues/327583144 🚨
In the meantime I'm working on 3 workarounds to fix the issue in the v0.8.x dev branch:
1. launching Chrome processes inside a process group with `start_new_session=True`, and killing the entire process group at the end of each extractor run (as described above)

2. always running Chrome in headful mode (aka `CHROME_HEADLESS=False`) and connecting it to a virtual Xvfb display (this also allows archiving to be watched in realtime in the browser using novnc). This works because the hang-before-exit bug only happens in headless mode on macOS when not using a user data dir, and in non-headless mode on Linux when using a user data dir, AFAICT.

3. always running Chrome with a `--user-data-dir=...` by creating a new empty user data dir when one is not provided, and always copying any provided user-data-dir to a new folder that's unique per Chrome instance to avoid SingletonLock contention (using copy-on-write when available on the filesystem to avoid actually duplicating the entire Chrome user data dir)

I'm not yet done with fix 3, but fix 1 and fix 2 (currently available only when running >= `v0.8.5rc51` on docker-compose only) are partially completed on `dev`. I'll post updates here as more progress is made.

@scttnlsn commented on GitHub (Jan 17, 2025):
@pirate I was running into this issue, so I tried out `0.8.5rc51`, but it looks like some migrations are missing:

I'm running via Docker Compose. I get related errors when taking various actions in the app as well:
I tried pulling the latest code and running Django's `makemigrations` command myself, but was getting some other errors related to circular imports (the same error seen in CI here: https://github.com/ArchiveBox/ArchiveBox/actions/runs/12681404211/job/35345049585). I realize things are in flux, so I wasn't sure if you want any contributions to fix any of this right now.

EDIT: I see this is already documented here: https://github.com/ArchiveBox/ArchiveBox/issues/1566
@pirate commented on GitHub (Feb 6, 2025):
The last few weeks I've been testing alternative browser driver solutions that take care of cleaning up processes on their own (among many other things that I don't want to have to build). Here are the top contenders:
The main things I'm looking for to be useful to ArchiveBox: a `wss://` remote debugger, the ability to update cookies, etc., for up to 16 sessions at a time per host.

@pirate commented on GitHub (Feb 16, 2025):
Quick update: the chromium team is finally taking a look at our bug report! They've assigned it to a team member and they're working on it now. https://issues.chromium.org/issues/327583144
@comatory commented on GitHub (Feb 17, 2025):
@pirate do you have a link so I could subscribe to the updates? Or if you can keep us updated here that would be much appreciated 🙏
@JustTooKrul commented on GitHub (Apr 11, 2025):
After spending quite a while getting chromium and novnc to work to properly setup the browser, it seems like migrating to something that has the flexibility to run different browsers and have the entire process operate in one container would be a big benefit. It's especially intriguing given how restrictive chrome-flavored browsers are becoming with the manifest change. Using uBlock in the headless browser to filter ads and bypass paywalls is a benefit that chrome is about to lose--having other browser options in a nice package could help those collection methods.
@comatory commented on GitHub (May 22, 2025):
I would love to start using ArchiveBox again; I wanted it as a replacement for my (other) bookmarking service. I'm just unable to, because every time I archive something, it crashes my server.

Is there an easy way to disable the behaviour with the current version? I see that the last release was in Dec 2024, and I'm not sure if a new release is in the works.

I see this is planned work for the `0.8` milestone, so 🤞

@pirate commented on GitHub (May 23, 2025):
see recent news post here: https://github.com/ArchiveBox/ArchiveBox/issues/191#issuecomment-2848370416
@pirate commented on GitHub (Dec 29, 2025):
This should be fixed on `dev`; we now auto-kill zombie Chrome processes anytime we launch a new one.
@clb92 commented on GitHub (Dec 29, 2025):
That's good news, thanks! Then I won't need to restart the container daily anymore.