[GH-ISSUE #848] Bug: Segmentation fault trying to load website #526

Closed
opened 2026-03-01 14:44:20 +03:00 by kerem · 3 comments

Originally created by @rowanoulton on GitHub (Sep 15, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/848

Describe the bug

When I add the URL https://alexey-vasilyev.com/my-dear-yakutia-ongoing, whether through the web UI or command line, the snapshot fails with a segmentation fault and is left forever pending.

Steps to reproduce

  1. Run a fresh install of ArchiveBox via docker-compose, out of the box with no additions or changes.
  2. Run archivebox add https://alexey-vasilyev.com/my-dear-yakutia-ongoing
  3. Error

Screenshots or log output

$ archivebox add https://alexey-vasilyev.com/my-dear-yakutia-ongoing
[i] [2021-09-15 22:42:56] ArchiveBox v0.6.2: archivebox add https://alexey-vasilyev.com/my-dear-yakutia-ongoing
    > /data

[+] [2021-09-15 22:42:56] Adding 1 links to index (crawl depth=0)...
    > Saved verbatim input to sources/1631745776-import.txt
    > Parsed 1 URLs from input (Generic TXT)
    > Found 1 new URLs not already in index

[*] [2021-09-15 22:42:56] Writing 1 links to main index...
    √ ./index.sqlite3

[▶] [2021-09-15 22:42:56] Starting archiving of 1 snapshots in index...

[+] [2021-09-15 22:42:56] "alexey-vasilyev.com/my-dear-yakutia-ongoing"
    https://alexey-vasilyev.com/my-dear-yakutia-ongoing
    > ./archive/1631745776.848173
      > title
      ██████                                                               1.5% (1/60sec)Segmentation fault
      ████████████████████████████████████████████████████████████████████ 100.0% (60/60sec)
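Screenshot: https://user-images.githubusercontent.com/185649/133519763-3aaa34f2-9385-419b-b77f-4501a628a945.png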
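
The log shows only the text "Segmentation fault"; it does not say which process crashed. As a hedged aid for narrowing that down, here is a minimal Python sketch that runs a command and reports whether it was killed by SIGSEGV. It assumes a POSIX system, where subprocess reports a child killed by signal N as return code -N. The helper name run_and_classify is hypothetical and not part of ArchiveBox.

```python
import signal
import subprocess


def run_and_classify(cmd, timeout=60):
    """Run cmd (a list of arguments) and describe how it ended.

    Returns one of: "ok", "error", "segfault", "signal", "timeout".
    Assumption: POSIX semantics, where a negative return code -N
    means the child process was terminated by signal N.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # The child was still running when the timeout elapsed.
        return "timeout"

    if proc.returncode == 0:
        return "ok"
    if proc.returncode == -signal.SIGSEGV:
        # Killed by SIGSEGV, i.e. a segmentation fault.
        return "segfault"
    if proc.returncode < 0:
        # Killed by some other signal.
        return "signal"
    # Exited on its own with a non-zero status.
    return "error"
```

One could call it on the failing command from the steps above, for instance run_and_classify(["archivebox", "add", "https://alexey-vasilyev.com/my-dear-yakutia-ongoing"]), or on a direct fetch with one of the binaries from the version listing. Which command to try is a guess; the thread does not confirm where the crash originates.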

ArchiveBox version

ArchiveBox v0.6.2
Cpython Linux Linux-5.10.25-linuxkit-aarch64-with-glibc2.28 aarch64
IN_DOCKER=True DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND_ENGINE=ripgrep

[i] Dependency versions:
 √  ARCHIVEBOX_BINARY     v0.6.2          valid     /usr/local/bin/archivebox
 √  PYTHON_BINARY         v3.9.5          valid     /usr/local/bin/python3.9
 √  DJANGO_BINARY         v3.1.10         valid     /usr/local/lib/python3.9/site-packages/django/bin/django-admin.py
 √  CURL_BINARY           v7.64.0         valid     /usr/bin/curl
 √  WGET_BINARY           v1.20.1         valid     /usr/bin/wget
 √  NODE_BINARY           v15.14.0        valid     /usr/bin/node
 √  SINGLEFILE_BINARY     v0.3.16         valid     /node/node_modules/single-file/cli/single-file
 √  READABILITY_BINARY    v0.0.2          valid     /node/node_modules/readability-extractor/readability-extractor
 √  MERCURY_BINARY        v1.0.0          valid     /node/node_modules/@postlight/mercury-parser/cli.js
 √  GIT_BINARY            v2.20.1         valid     /usr/bin/git
 √  YOUTUBEDL_BINARY      v2021.04.26     valid     /usr/local/bin/youtube-dl
 √  CHROME_BINARY         v89.0.4389.114  valid     /usr/bin/chromium
 √  RIPGREP_BINARY        v0.10.0         valid     /usr/bin/rg

[i] Source-code locations:
 √  PACKAGE_DIR           22 files        valid     /app/archivebox
 √  TEMPLATES_DIR         3 files         valid     /app/archivebox/templates
 -  CUSTOM_TEMPLATES_DIR  -               disabled

[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled
 -  COOKIES_FILE          -               disabled

[i] Data locations:
 √  OUTPUT_DIR            6 files         valid     /data
 √  SOURCES_DIR           1 files         valid     ./sources
 √  LOGS_DIR              1 files         valid     ./logs
 √  ARCHIVE_DIR           1 files         valid     ./archive
 √  CONFIG_FILE           81.0 Bytes      valid     ./ArchiveBox.conf
 √  SQL_INDEX             204.0 KB        valid     ./index.sqlite3

kerem closed this issue 2026-03-01 14:44:20 +03:00

@alexey-zaharchenko commented on GitHub (Aug 3, 2022):

Same problem with https://yakovlev.me/para-slov-za-alphafold2/


@pirate commented on GitHub (Aug 3, 2022):

Can you try with image: archivebox/archivebox:dev and see if that works? I updated the wget version in the dev branch.


@pirate commented on GitHub (Jan 19, 2024):

Going to close this as stale. Please open a new issue if you see the same or any other error on a version > v0.7.2.
