[GH-ISSUE #727] Bug: This link cannot be snapshot properly: https://ciechanow.ski/internal-combustion-engine/ #462

Closed
opened 2026-03-01 14:43:48 +03:00 by kerem · 1 comment
Owner

Originally created by @ekianjo on GitHub (May 1, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/727

Describe the bug

This link cannot be archived properly

Steps to reproduce

  1. Add https://ciechanow.ski/internal-combustion-engine/ as a snapshot
  2. The snapshot completes
  3. All of the resulting archives lack the 3D animations present in the original page

ArchiveBox version

ArchiveBox v0.6.2
Cpython Linux Linux-4.19.0-16-amd64-x86_64-with-debian-10.9 x86_64
IN_DOCKER=False DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND_ENGINE=ripgrep

[i] Dependency versions:
 √  ARCHIVEBOX_BINARY     v0.6.2          valid     /usr/bin/archivebox                                                         
 √  PYTHON_BINARY         v3.7.3          valid     /usr/bin/python3.7                                                          
 √  DJANGO_BINARY         v3.1.8          valid     /usr/local/lib/python3.7/dist-packages/django/bin/django-admin.py           
 √  CURL_BINARY           v7.64.0         valid     /usr/bin/curl                                                               
 √  WGET_BINARY           v1.20.1         valid     /usr/bin/wget                                                               
 √  NODE_BINARY           v13.14.0        valid     /usr/bin/node                                                               
 X  SINGLEFILE_BINARY     ?               invalid   single-file                                                                 
 X  READABILITY_BINARY    ?               invalid   readability-extractor                                                       
 X  MERCURY_BINARY        ?               invalid   mercury-parser                                                              
 √  GIT_BINARY            v2.20.1         valid     /usr/bin/git                                                                
 √  YOUTUBEDL_BINARY      v2021.04.17     valid     /usr/local/bin/youtube-dl                                                   
 √  CHROME_BINARY         v89.0.4389.114  valid     /usr/bin/chromium                                                           
 √  RIPGREP_BINARY        v0.10.0         valid     /usr/bin/rg                                                                 

[i] Source-code locations:
 √  PACKAGE_DIR           23 files        valid     /usr/lib/python3/dist-packages/archivebox                                   
 √  TEMPLATES_DIR         3 files         valid     /usr/lib/python3/dist-packages/archivebox/templates                         
 -  CUSTOM_TEMPLATES_DIR  -               disabled                                                                              

[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled                                                                              
 -  COOKIES_FILE          -               disabled                                                                              


[i] Data locations:

[!] Warning: Missing 3 recommended dependencies
    ! SINGLEFILE_BINARY: single-file (unable to detect version)
      Hint: To install all packages automatically run: archivebox setup
            or to disable it and silence this warning: archivebox config --set SAVE_SINGLEFILE=False
            
    ! READABILITY_BINARY: readability-extractor (unable to detect version)
      Hint: To install all packages automatically run: archivebox setup
            or to disable it and silence this warning: archivebox config --set SAVE_READABILITY=False
            
    ! MERCURY_BINARY: mercury-parser (unable to detect version)
      Hint: To install all packages automatically run: archivebox setup
            or to disable it and silence this warning: archivebox config --set SAVE_MERCURY=False

kerem closed this issue 2026-03-01 14:43:48 +03:00
Author
Owner

@pirate commented on GitHub (May 1, 2021):

Not all sites can be archived perfectly. ArchiveBox doesn't do the archiving itself; it's a wrapper around 12 different extractor modules that use external programs to save the content. Unfortunately, it looks like none of those modules (e.g. singlefile, wget, and chrome dom dump) work for the JS canvas animations on this site.

If you need higher-fidelity archiving, check out https://ArchiveWeb.page + https://ReplayWeb.Page.
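
To illustrate the "wrapper around external programs" idea, here is a rough sketch of how such a tool might dispatch one URL to several extractors. This is not ArchiveBox's actual code: the `EXTRACTORS` table, `build_commands`, and `run_extractors` are hypothetical names, and the wget and chromium arguments are just one plausible choice.

```python
import shutil
import subprocess
from typing import Callable, Dict, List

# Hypothetical extractor table: the names and argument lists are illustrative
# assumptions, not ArchiveBox's real configuration.
EXTRACTORS: Dict[str, Callable[[str], List[str]]] = {
    "wget": lambda url: ["wget", "--page-requisites", "--convert-links", url],
    "dom": lambda url: ["chromium", "--headless", "--dump-dom", url],
}


def build_commands(url: str) -> Dict[str, List[str]]:
    """Return the command line each extractor would run for `url`."""
    return {name: make(url) for name, make in EXTRACTORS.items()}


def run_extractors(url: str, which=shutil.which, run=subprocess.run) -> Dict[str, str]:
    """Run every extractor whose binary is installed and report a status for each."""
    results: Dict[str, str] = {}
    for name, cmd in build_commands(url).items():
        if which(cmd[0]) is None:
            # Binary not found on PATH, so this extractor is skipped.
            results[name] = "skipped"
            continue
        proc = run(cmd, capture_output=True)
        # A zero exit code only means the program finished without error.
        results[name] = "succeeded" if proc.returncode == 0 else "failed"
    return results
```

In a design like this, "succeeded" only says the external program exited cleanly. Nothing checks whether script-driven canvas content made it into the saved files, which is consistent with a snapshot that completes but lacks the animations.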
