[GH-ISSUE #1630] Bug: Archivebox leaves dead Chrome processes behind #3988

Closed
opened 2026-03-15 01:12:48 +03:00 by kerem · 7 comments

Originally created by @groby on GitHub (Dec 30, 2024).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1630

Originally assigned to: @pirate on GitHub.

Provide a screenshot and describe the bug

ArchiveBox on macOS leaves behind a lot of Chrome processes. Like, really a lot (200+/hour).

Running for 12h leaves an M4 Mac with 128GB crawling along; I ended up with over 4000 processes.

Steps to reproduce

Run archivebox update overnight, where many actions involving Chrome fail.
Run ps -ef | grep Chrome | wc -l

Marvel at the output (in my case, 4268)
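The count in the steps above can also be taken from Python, which avoids one quirk of the shell pipeline: `grep Chrome` typically shows up in the `ps` listing itself and inflates the total by one. This is only a minimal sketch; the function names `count_processes` and `count_running` are made up for illustration and are not part of ArchiveBox.

```python
import subprocess


def count_processes(ps_output: str, needle: str = "Chrome") -> int:
    """Count the lines of a `ps` listing that mention `needle`."""
    return sum(1 for line in ps_output.splitlines() if needle in line)


def count_running(needle: str = "Chrome") -> int:
    """Run `ps -ef` (the same listing used in the report) and count matches."""
    listing = subprocess.run(
        ["ps", "-ef"], capture_output=True, text=True, check=True
    ).stdout
    return count_processes(listing, needle)
```

Calling `count_running()` before and after an overnight run gives a rough measure of how many Chrome processes were left behind.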

Logs or errors


ArchiveBox Version

0.7.2
ArchiveBox v0.7.2 BUILD_TIME=2024-12-22 19:26:23 1734924383
IN_DOCKER=False IN_QEMU=False ARCH=arm64 OS=Darwin PLATFORM=macOS-15.2-arm64-arm-64bit PYTHON=Cpython
FS_ATOMIC=True FS_REMOTE=False FS_USER=502:20 FS_PERMS=644
DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND=ripgrep LDAP=False

[i] Dependency versions:
 √  PYTHON_BINARY         v3.11.11        valid     /Users/groby/.local/share/uv/python/cpython-3.11.11-macos-aarch64-none/bin/python3.11
 √  SQLITE_BINARY         v2.6.0          valid     /Users/groby/.local/share/uv/python/cpython-3.11.11-macos-aarch64-none/lib/python3.11/sqlite3/dbapi2.py
 √  DJANGO_BINARY         v3.1.14         valid     /Users/groby/.cache/uv/archive-v0/mKSsKdWCSu1q1KNrY4cOM/lib/python3.11/site-packages/django/__init__.py
 √  ARCHIVEBOX_BINARY     v0.7.2          valid     /Users/groby/.cache/uv/archive-v0/mKSsKdWCSu1q1KNrY4cOM/bin/archivebox

 √  CURL_BINARY           v8.7.1          valid     /usr/bin/curl
 -  WGET_BINARY           -               disabled  wget
 √  NODE_BINARY           v23.5.0         valid     /opt/homebrew/bin/node
 √  SINGLEFILE_BINARY     v1.0.31         valid     ./node_modules/single-file/cli/single-file
 -  READABILITY_BINARY    -               disabled  ./node_modules/readability-extractor/readability-extractor
 -  MERCURY_BINARY        -               disabled  postlight-parser
 -  GIT_BINARY            -               disabled  /usr/bin/git
 -  YOUTUBEDL_BINARY      -               disabled  /Users/groby/.cache/uv/archive-v0/mKSsKdWCSu1q1KNrY4cOM/bin/yt-dlp
 √  CHROME_BINARY         v131.0.6778.205  valid     "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
 √  RIPGREP_BINARY        v14.1.1         valid     /opt/homebrew/bin/rg

[i] Source-code locations:
 √  PACKAGE_DIR           23 files        valid     /Users/groby/.cache/uv/archive-v0/mKSsKdWCSu1q1KNrY4cOM/lib/python3.11/site-packages/archivebox
 √  TEMPLATES_DIR         3 files         valid     /Users/groby/.cache/uv/archive-v0/mKSsKdWCSu1q1KNrY4cOM/lib/python3.11/site-packages/archivebox/templates
 -  CUSTOM_TEMPLATES_DIR  -               disabled  None

[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled  None
 -  COOKIES_FILE          -               disabled  None

[i] Data locations:
 √  OUTPUT_DIR            10 files        valid     /Users/groby/web-archive.2
 √  SOURCES_DIR           37 files        valid     ./sources
 √  LOGS_DIR              1 files         valid     ./logs
 √  ARCHIVE_DIR           2953 files      valid     ./archive
 √  CONFIG_FILE           453.0 Bytes     valid     ./ArchiveBox.conf
 √  SQL_INDEX             20.8 MB         valid     ./index.sqlite3

How did you install the version of ArchiveBox you are using?

pip

What operating system are you running on?

macOS (including Docker on macOS)

What type of drive are you using to store your ArchiveBox data?

  • [x] data/ is on a local SSD or NVMe drive
  • [ ] data/ is on a spinning hard drive or external USB drive
  • [ ] data/ is on a network mount (e.g. NFS/SMB/CIFS/etc.)
  • [ ] data/ is on a FUSE mount (e.g. SSHFS/RClone/S3/B2/OneDrive, etc.)

Docker Compose Configuration


ArchiveBox Configuration

[SERVER_CONFIG]
SECRET_KEY = x9qDWcu57reCyCob1R6jiCOObd6hPGZFRW2cRlp21TpaTf_Gws

[ARCHIVE_METHOD_TOGGLES]
SAVE_TITLE = True
SAVE_FAVICON = False
SAVE_GIT = False
SAVE_SINGLEFILE = True
SAVE_READABILITY = False
SAVE_MERCURY = False
SAVE_PDF = True
SAVE_DOM = False
SAVE_SCREENSHOT = True
SAVE_WGET = True
SAVE_WARC = True
SAVE_ARCHIVEBOX = False
SAVE_ARCHIVE_DOT_ORG = True
SAVE_MEDIA = False

[DEPENDENCY_CONFIG]
USE_SINGLEFILE = True
USE_WGET = False
kerem closed this issue 2026-03-15 01:12:53 +03:00

@pirate commented on GitHub (Dec 30, 2024):

This is a common issue in some specific environments, and I'm working on a few ways to solve it. One thing you should try, especially on macOS, is setting CHROME_HEADLESS=False. Headed Chrome doesn't hang on exit and usually works better on macOS if you're not running in Docker.

Your SingleFile version is also quite out of date; try updating it with npm install single-file-cli@1.1.54 as well, that should help a lot.

See here for more info:

  • https://github.com/cypress-io/cypress/issues/27264#issuecomment-1972167140
  • https://github.com/ArchiveBox/ArchiveBox/issues/746
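For anyone wanting to apply the CHROME_HEADLESS=False suggestion by editing ArchiveBox.conf directly, here is a rough sketch using Python's standard `configparser`. The helper name `set_option` is hypothetical, and the section name `ARCHIVE_METHOD_OPTIONS` is an assumption about where this option belongs; check your own ArchiveBox.conf or the ArchiveBox configuration docs before relying on it.

```python
import configparser
import io


def set_option(conf_text: str, section: str, key: str, value: str) -> str:
    """Return conf_text with `key = value` set under [section]."""
    # interpolation=None keeps values such as secret keys literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names upper-case, as in ArchiveBox.conf
    parser.read_string(conf_text)
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, key, value)
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Assumed section name; the issue itself only names the option and its value.
example = set_option(
    "[SERVER_CONFIG]\nSECRET_KEY = abc\n",
    "ARCHIVE_METHOD_OPTIONS",
    "CHROME_HEADLESS",
    "False",
)
```

The function works on text rather than a file path so it can be tried out without touching a real data directory; comments in the original file are not preserved by `configparser`.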

@kraigu commented on GitHub (Dec 31, 2024):

@pirate I'm seeing similar behaviour on a FreeBSD system; I assume it's the same issue? I'll try the CHROME_HEADLESS=False setting as well and see how it goes. I can open a separate ticket, or add fuller details about my environment to this one, if that would be helpful.


@kraigu commented on GitHub (Jan 1, 2025):

It left fewer Chrome processes behind, but seemingly at the cost of page titles no longer being retrieved, for some reason.


@rcarmo commented on GitHub (Jan 8, 2025):

I have the same issue in Linux containers.


@pirate commented on GitHub (Jan 8, 2025):

Let's move this discussion over to the existing issue to avoid duplicates:

  • https://github.com/ArchiveBox/ArchiveBox/issues/746#issuecomment-2578562957

Comment / subscribe over there to get progress updates.

@rcarmo commented on GitHub (Jan 9, 2025):

This isn't really the same thing as #746


@pirate commented on GitHub (Jan 10, 2025):

Did you see my analysis in https://github.com/cypress-io/cypress/issues/27264#issuecomment-1972167140 ? I believe it's all the same underlying hang-before-exit / hang-on-startup upstream Chromium bug, which happens only on some combinations of CPU architecture, Chrome version, headless mode, and user data dir flags.

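Until the upstream hang is resolved, a generic way to keep a hung browser from piling up is to launch it in its own process group and kill the whole group when a timeout expires. The sketch below shows that general technique on POSIX systems (macOS, FreeBSD, Linux); it is not ArchiveBox's actual implementation, and `run_with_group_timeout` is a hypothetical helper name.

```python
import os
import signal
import subprocess


def run_with_group_timeout(cmd, timeout):
    """Run cmd in a new session; on timeout, kill its whole process group.

    Returns the exit code, or None if the command had to be killed.
    """
    # start_new_session=True makes the child a session and group leader,
    # so its process group ID equals its PID.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        try:
            # Signal the leader and every helper still in its group.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group exited between the timeout and the kill
        proc.wait()  # reap the child so it does not linger as a zombie
        return None
```

Helpers that detach into their own session would escape the group kill, so this limits leftover processes rather than guaranteeing there are none.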
Reference
starred/ArchiveBox#3988