[GH-ISSUE #914] Bug: ArchiveBox search timing out with 20,000 snapshots #565

Closed
opened 2026-03-01 14:44:38 +03:00 by kerem · 1 comment

Originally created by @sonofhypnos on GitHub (Jan 19, 2022).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/914

Describe the bug

When I search my local archive using the search bar at the top, the search times out after 90 seconds with:
"Error from the search backend, only showing results from default admin search fields - Error: Command '['rg', '--type-add', 'ignore:*.{css,js,orig,svg}', '-ilTignore', '-e', 'a', '/home/tassilo/archivebox/archive']' timed out after 90 seconds"
This happens every time, whatever the query: I tried single letters ("a", "b", "c"), a URL ("example.com"), and a word ("hello").
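You can check whether the delay comes from ripgrep itself rather than from ArchiveBox by timing the command from the error message on its own. The following is a minimal sketch that assumes Python 3.7+ and rg on the PATH. The argument list is copied from the error. The helper names build_rg_cmd and time_rg_search are hypothetical, and the 90-second default only mirrors the reported timeout.

```python
import subprocess
import time
from typing import List, Optional, Tuple


def build_rg_cmd(query: str, archive_dir: str) -> List[str]:
    """Hypothetical helper: rebuild the rg argument list quoted in the error."""
    return [
        "rg",
        # Define a file type called "ignore" covering these extensions.
        "--type-add", "ignore:*.{css,js,orig,svg}",
        # -i: case-insensitive, -l: print matching file names only,
        # -T ignore: skip files of the "ignore" type defined above.
        "-ilTignore",
        "-e", query,  # the search pattern (a regex)
        archive_dir,
    ]


def time_rg_search(
    query: str, archive_dir: str, timeout: float = 90.0
) -> Tuple[float, Optional[List[str]]]:
    """Hypothetical helper: run rg and report (elapsed seconds, matching files).

    The file list is None when rg did not finish within `timeout` seconds.
    """
    cmd = build_rg_cmd(query, archive_dir)
    start = time.monotonic()
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # Same failure the search bar reports: subprocess.run kills rg at the limit.
        return time.monotonic() - start, None
    elapsed = time.monotonic() - start
    # rg exits 0 if something matched, 1 if nothing matched, 2 on errors.
    if result.returncode > 1:
        raise RuntimeError(result.stderr.strip())
    return elapsed, result.stdout.splitlines()
```

For example, time_rg_search("hello", "/home/tassilo/archivebox/archive") returns the elapsed seconds and the list of matching files. If rg hit the limit, it returns None in place of the list. If the run takes close to 90 seconds here as well, the time is being spent in ripgrep's scan of the archive and not elsewhere in ArchiveBox.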

Steps to reproduce

  1. Run ArchiveBox with the following config:
    https://pastebin.com/uSgx2Kk4
  2. Run archivebox server in the archive directory.
  3. Search using the search bar.

Screenshots or log output

ArchiveBox version

ArchiveBox v0.6.2
Cpython Linux Linux-5.15.5-76051505-generic-x86_64-with-glibc2.33 x86_64
IN_DOCKER=False DEBUG=False IS_TTY=False TZ=UTC SEARCH_BACKEND_ENGINE=ripgrep

[i] Dependency versions:
 √  ARCHIVEBOX_BINARY     v0.6.2          valid     /usr/local/bin/archivebox                                                   
 √  PYTHON_BINARY         v3.9.5          valid     /usr/bin/python3.9                                                          
 √  DJANGO_BINARY         v3.1.14         valid     /usr/local/lib/python3.9/dist-packages/django/bin/django-admin.py           
 √  CURL_BINARY           v7.74.0         valid     /usr/bin/curl                                                               
 √  WGET_BINARY           v1.21           valid     /usr/bin/wget                                                               
 -  NODE_BINARY           -               disabled  /usr/bin/node                                                               
 √  SINGLEFILE_BINARY     v0.3.32         valid     ./node_modules/single-file/cli/single-file                                  
 -  READABILITY_BINARY    -               disabled  ./node_modules/readability-extractor/readability-extractor                  
 -  MERCURY_BINARY        -               disabled  ./node_modules/@postlight/mercury-parser/cli.js                             
 √  GIT_BINARY            v2.30.2         valid     /usr/bin/git                                                                
 -  YOUTUBEDL_BINARY      -               disabled  /usr/local/bin/youtube-dl                                                   
 -  CHROME_BINARY         -               disabled  /home/tassilo/.cache/ms-playwright/chromium-939194/chrome-linux/chrome      
 √  RIPGREP_BINARY        v12.1.1         valid     /usr/bin/rg                                                                 

[i] Source-code locations:
 √  PACKAGE_DIR           23 files        valid     /usr/local/lib/python3.9/dist-packages/archivebox                           
 √  TEMPLATES_DIR         3 files         valid     /usr/local/lib/python3.9/dist-packages/archivebox/templates                 
 -  CUSTOM_TEMPLATES_DIR  -               disabled                                                                              

[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled                                                                              
 -  COOKIES_FILE          -               disabled                                                                              

[i] Data locations:
 √  OUTPUT_DIR            11 files        valid     /home/tassilo/archivebox                                                    
 √  SOURCES_DIR           65 files        valid     ./sources                                                                   
 √  LOGS_DIR              1 files         valid     ./logs                                                                      
 √  ARCHIVE_DIR           12252 files     valid     ./archive                                                                   
 √  CONFIG_FILE           431.0 Bytes     valid     ./ArchiveBox.conf                                                           
 √  SQL_INDEX             63.6 MB         valid     ./index.sqlite3                                                             
kerem closed this issue 2026-03-01 14:44:38 +03:00

@pirate commented on GitHub (Jan 22, 2022):

Sounds about right: unless you're on a really fast SSD, rg is going to struggle to search that many files that quickly. This is why rg is just the easy default; we support and recommend sonic for real full-text search ;)

https://github.com/ArchiveBox/ArchiveBox/blob/dev/docker-compose.yml#L30
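The gap between rg and sonic described in this comment is roughly the gap between a linear scan and an index. The toy sketch below is not how ripgrep or Sonic is implemented and only illustrates the idea. It assumes each snapshot's text is a plain string keyed by snapshot ID. The sample data and function names are hypothetical, and both searches match whole lowercase words so their results can be compared, whereas rg itself matches regexes. A scan rereads every document on every query, so its cost grows with the archive. An index pays that cost once when documents are added, and after that a word query is a single lookup.

```python
import re
from collections import defaultdict
from typing import Dict, Set

# Hypothetical sample data: snapshot ID -> extracted page text.
DOCS: Dict[str, str] = {
    "1642600000.1": "Hello world, an example page",
    "1642600000.2": "Another page about example.com",
    "1642600000.3": "Nothing relevant here",
}


def tokenize(text: str) -> Set[str]:
    # Lowercase word tokens; real search engines do much more than this.
    return set(re.findall(r"\w+", text.lower()))


def linear_scan(docs: Dict[str, str], word: str) -> Set[str]:
    # Scan-style (like rg): every query re-reads every document,
    # so the cost grows with the total size of the archive.
    word = word.lower()
    return {doc_id for doc_id, text in docs.items() if word in tokenize(text)}


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    # Index-style: do the scanning once, up front, mapping word -> doc IDs.
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def index_lookup(index: Dict[str, Set[str]], word: str) -> Set[str]:
    # A query is now one dictionary lookup, regardless of how many documents exist.
    return set(index.get(word.lower(), set()))
```

The trade-off is that the index has to be updated as new snapshots are archived. A dedicated backend such as sonic takes on that work.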
