[GH-ISSUE #1213] Bug: Favicon is not always visible #745

Closed
opened 2026-03-01 14:46:02 +03:00 by kerem · 3 comments

Originally created by @tibequadorian on GitHub (Aug 17, 2023).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1213

Describe the bug

When archiving any site using only a minimal set of extractors, such as --extract title,favicon,readability, the favicon is never shown in the index; spinner.gif is displayed instead.
Adding another extractor to the list, for example pdf, makes the favicon visible again, as shown below.

Steps to reproduce

archivebox add --extract title,favicon,readability https://www.nytimes.com/2023/08/16/health/abortion-pill-ruling.html

Screenshots or log output

Screenshot: https://github.com/ArchiveBox/ArchiveBox/assets/9560587/df5bc5b8-a409-4894-9a48-47ca93d72c38

ArchiveBox version

ArchiveBox v0.6.2
Cpython Linux Linux-6.1.0-11-amd64-x86_64-with-glibc2.28 x86_64
IN_DOCKER=True DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND_ENGINE=ripgrep

[i] Dependency versions:
 √  ARCHIVEBOX_BINARY     v0.6.2          valid     /usr/local/bin/archivebox                                                   
 √  PYTHON_BINARY         v3.9.5          valid     /usr/local/bin/python3.9                                                    
 √  DJANGO_BINARY         v3.1.10         valid     /usr/local/lib/python3.9/site-packages/django/bin/django-admin.py           
 √  CURL_BINARY           v7.64.0         valid     /usr/bin/curl                                                               
 √  WGET_BINARY           v1.20.1         valid     /usr/bin/wget                                                               
 √  NODE_BINARY           v15.14.0        valid     /usr/bin/node                                                               
 √  SINGLEFILE_BINARY     v0.3.16         valid     /node/node_modules/single-file/cli/single-file                              
 √  READABILITY_BINARY    v0.0.2          valid     /node/node_modules/readability-extractor/readability-extractor              
 √  MERCURY_BINARY        v1.0.0          valid     /node/node_modules/@postlight/mercury-parser/cli.js                         
 √  GIT_BINARY            v2.20.1         valid     /usr/bin/git                                                                
 √  YOUTUBEDL_BINARY      v2021.04.26     valid     /usr/local/bin/youtube-dl                                                   
 √  CHROME_BINARY         v90.0.4430.93   valid     /usr/bin/chromium                                                           
 √  RIPGREP_BINARY        v0.10.0         valid     /usr/bin/rg                                                                 

[i] Source-code locations:
 √  PACKAGE_DIR           22 files        valid     /app/archivebox                                                             
 √  TEMPLATES_DIR         3 files         valid     /app/archivebox/templates                                                   
 -  CUSTOM_TEMPLATES_DIR  -               disabled                                                                              

[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled                                                                              
 -  COOKIES_FILE          -               disabled                                                                              

[i] Data locations:
 √  OUTPUT_DIR            5 files         valid     /data                                                                       
 √  SOURCES_DIR           14 files        valid     ./sources                                                                   
 √  LOGS_DIR              1 files         valid     ./logs                                                                      
 √  ARCHIVE_DIR           11 files        valid     ./archive                                                                   
 √  CONFIG_FILE           81.0 Bytes      valid     ./ArchiveBox.conf                                                           
 √  SQL_INDEX             212.0 KB        valid     ./index.sqlite3                                                             
kerem closed this issue 2026-03-01 14:46:02 +03:00

@pirate commented on GitHub (Aug 17, 2023):

Favicon and title alone are not considered enough for a snapshot to count as "archived", which is why the index shows the in-progress spinner. We could add readability to the set of outputs that qualify a snapshot as "archived", though.
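A minimal sketch of the rule described in this comment, assuming a snapshot counts as "archived" once at least one output from a fixed list exists in its snapshot folder, and that the index shows the spinner otherwise. The output names, SUBSTANTIAL_OUTPUTS, is_archived and index_icon are hypothetical illustrations, not ArchiveBox's actual code. It shows why a title/favicon-only run never leaves the spinner state, and how adding readability to the list would change that.

```python
from pathlib import Path

# Hypothetical names of outputs that count as real archive content.
# Title and favicon are deliberately absent, mirroring the behaviour
# described in the comment; "readability" is the proposed addition.
SUBSTANTIAL_OUTPUTS = (
    "output.pdf",
    "screenshot.png",
    "output.html",
    "singlefile.html",
    "media",
    "readability",  # proposed: let Readability output qualify too
)


def is_archived(snapshot_dir: Path, outputs=SUBSTANTIAL_OUTPUTS) -> bool:
    """Return True if at least one qualifying output exists in snapshot_dir."""
    return any((snapshot_dir / name).exists() for name in outputs)


def index_icon(snapshot_dir: Path, outputs=SUBSTANTIAL_OUTPUTS) -> str:
    """Pick the icon the index would render for this snapshot (hypothetical)."""
    return "favicon.ico" if is_archived(snapshot_dir, outputs) else "spinner.gif"


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        snap = Path(tmp)
        # Simulate --extract title,favicon,readability: only these outputs exist.
        (snap / "favicon.ico").touch()
        (snap / "readability").mkdir()
        old_rule = tuple(o for o in SUBSTANTIAL_OUTPUTS if o != "readability")
        print(index_icon(snap, old_rule))  # spinner.gif: favicon alone does not qualify
        print(index_icon(snap))            # favicon.ico: readability now qualifies
```

The design choice being illustrated is a presence check on output files: cheap to compute, but it ties the "archived" state to which extractors happened to run, which is exactly what the reporter hit.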

@tibequadorian commented on GitHub (Aug 17, 2023):

I'm fine with your suggestion, as it works in my case, but in general wouldn't it make more sense to show the spinner only for as long as the archiving process runs, regardless of which extractors are used?
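A minimal sketch of the alternative proposed here, assuming each snapshot records which extractors were requested and a per-extractor result status; the spinner is shown while any requested extractor is still pending, whatever the extractor set is. Status, Snapshot, is_in_progress and index_icon are hypothetical names for illustration, not ArchiveBox's data model.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    SKIPPED = "skipped"


@dataclass
class Snapshot:
    """Hypothetical snapshot record: requested extractors and how each one ended."""
    requested: list
    results: dict = field(default_factory=dict)

    def is_in_progress(self) -> bool:
        # Still running while any requested extractor has no terminal result.
        return any(
            self.results.get(name, Status.PENDING) is Status.PENDING
            for name in self.requested
        )

    def index_icon(self) -> str:
        # Spinner only while work remains; a real index would still need a
        # fallback icon if the favicon extractor itself failed.
        return "spinner.gif" if self.is_in_progress() else "favicon.ico"


snap = Snapshot(requested=["title", "favicon", "readability"])
snap.results.update(title=Status.SUCCEEDED, favicon=Status.SUCCEEDED)
print(snap.index_icon())  # spinner.gif: readability has not finished yet
snap.results["readability"] = Status.SUCCEEDED
print(snap.index_icon())  # favicon.ico: every requested extractor is done
```

Keying the spinner on completion of the requested extractors, rather than on which output files exist, means a title/favicon/readability run stops spinning as soon as those three finish, even if one of them failed.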

@pirate commented on GitHub (Jan 19, 2024):

Fixed in 19e9c1c2f0a0bb755bf695281271603c44edbdd4, will be out in v0.7.3.
