[GH-ISSUE #956] Documentation: Document how search works #3613

Closed
opened 2026-03-14 23:44:24 +03:00 by kerem · 7 comments

Originally created by @bbkane on GitHub (Mar 27, 2022).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/956

Wiki Page URL

https://github.com/ArchiveBox/ArchiveBox/wiki/Usage#explanation-of-buttons-in-the-web-ui---admin-snapshots-list

Suggested Edit

Document how search works in ArchiveBox. It looks like https://github.com/ArchiveBox/ArchiveBox/pull/721 and https://github.com/ArchiveBox/ArchiveBox/pull/543 add full-text search using Sonic, but I can't find advanced usage details. Can it do Boolean queries ("rabbits" AND NOT "raccoons"), for example? Could someone more familiar with the project add a few sentences describing the capabilities of the search, and maybe some examples of how to use it?

@bbkane commented on GitHub (Mar 27, 2022):

Does it search the body as well as the title? Can I order by fields in the query?

@rcarmo commented on GitHub (Sep 13, 2022):

It would be great to have this documented. Right now it feels like I can't do full text search on page contents, and it is mostly because search is inscrutable.

@pirate commented on GitHub (Sep 28, 2022):

I don't believe it can do Boolean queries but if you have Sonic you can do full-text search of the article bodies with fuzzy matching.
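Because the built-in search reportedly has no Boolean operators, one workaround that runs outside ArchiveBox (an assumption, not a built-in feature) is to approximate a query like "rabbits" AND NOT "raccoons" with plain grep over the archived files. This minimal sketch assumes each snapshot lives in its own subfolder of data/archive/. It treats a snapshot as a match when any of its files contains the first term (case-insensitive, literal string) and none of its files contains the second term. `snapshots_matching` is a hypothetical helper name, and the sketch uses ordinary grep rather than the ripgrep or Sonic backends.

```shell
#!/bin/sh
# Hypothetical helper: emulate "INCLUDE AND NOT EXCLUDE" per snapshot folder.
# Assumes each snapshot is stored in its own subfolder of ARCHIVE_DIR.
snapshots_matching() {
    include=$1      # term that must appear in at least one file of the snapshot
    exclude=$2      # term that must not appear in any file of the snapshot
    archive_dir=$3
    for snap in "$archive_dir"/*/; do
        [ -d "$snap" ] || continue   # the glob matched nothing
        # -r recurse, -q quiet, -i case-insensitive, -F literal string (no regex)
        if grep -rqiF -- "$include" "$snap" && ! grep -rqiF -- "$exclude" "$snap"; then
            basename "$snap"         # print the snapshot folder name (its timestamp)
        fi
    done
}

# Usage, from your ArchiveBox data/ folder:
#   snapshots_matching rabbits raccoons ./archive
```

This scans every file on disk, so it is much slower than an indexed backend on large archives, but it gives exact AND/NOT semantics that fuzzy full-text matching does not.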

@rcarmo commented on GitHub (Sep 28, 2022):

I’m using the container image (which I assume includes it). Yet the lack of documentation still applies.

@pirate commented on GitHub (Nov 18, 2022):

I've added a line to the Usage docs and a screenshot explaining search slightly more here: https://github.com/ArchiveBox/ArchiveBox/wiki/Usage#explanation-of-buttons-in-the-web-ui---admin-snapshots-list

I still have to add instructions on how to set up Sonic/ripgrep and configure them later.

  • [x] add short description of search functionality to Wiki usage page + screenshot of search in use
  • [x] add wiki page explaining ripgrep vs sonic, their tradeoffs, and how to set them up
  • [ ] add entries to the wiki configuration page for SEARCH_BACKEND_ENGINE, SEARCH_BACKEND_HOST, and SEARCH_BACKEND_PASSWORD
  • [ ] add README quick summary explaining that Sonic is available for full-text search, but ripgrep is used by default + link to wiki pages for more info
  • [x] add docs on how to use ripgrep-all instead of ripgrep https://github.com/phiresky/ripgrep-all

In summary for people arriving here via Google the setup instructions for Sonic are as follows:

🔍 Sonic Search Setup Instructions

  1. Download the sonic.cfg file into your data/ folder: curl -O https://raw.githubusercontent.com/ArchiveBox/ArchiveBox/master/etc/sonic.cfg
  2. Uncomment the sonic: container config in docker-compose.yml: https://github.com/ArchiveBox/ArchiveBox/blob/dev/docker-compose.yml#:~:text=sonic.cfg
  3. Set the SEARCH_BACKEND_ENGINE, SEARCH_BACKEND_HOST, and SEARCH_BACKEND_PASSWORD config vars in ArchiveBox to point to the new container: https://github.com/ArchiveBox/ArchiveBox/blob/dev/docker-compose.yml#:~:text=SEARCH_BACKEND_ENGINE
  4. Restart the Docker compose project with docker-compose down; docker-compose up
  5. Add any previously archived data into the Sonic index by running docker compose run archivebox update --index-only
  6. Verify that search works from the Snapshot admin page by searching for some text that only appears in an archived article's body text
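Steps 1, 4 and 5 above can be collected into one script. This is a minimal sketch, not an official ArchiveBox script. It assumes Docker Compose v2 (`docker compose`; substitute `docker-compose` on v1) and is meant to be run from the folder that step 1 refers to. It uses `up -d` instead of a foreground `up` so that the indexing command can follow in the same shell. Steps 2-3 (editing docker-compose.yml) and step 6 (checking the admin UI) remain manual. `setup_sonic`, `run` and `DRY_RUN` are hypothetical names, not ArchiveBox options.

```shell
#!/bin/sh
# Sketch of Sonic setup steps 1, 4 and 5; steps 2-3 and 6 stay manual.

# Hypothetical helper: print the command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

setup_sonic() {
    # Step 1: fetch the default Sonic config (-f makes curl fail on HTTP errors)
    run curl -fsSL -O https://raw.githubusercontent.com/ArchiveBox/ArchiveBox/master/etc/sonic.cfg || return 1
    # Step 4: restart the project so the newly uncommented sonic container starts
    run docker compose down || return 1
    run docker compose up -d || return 1
    # Step 5: add previously archived snapshots to the Sonic index
    run docker compose run archivebox update --index-only
}

# Usage:
#   DRY_RUN=1 setup_sonic   # preview the commands
#   setup_sonic             # actually run them
```

Previewing with `DRY_RUN=1` first is a cheap way to confirm the commands match your Compose version before anything is stopped or restarted.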

If anyone wants to contribute the wiki page with these instructions + screenshots + links to the README I'm happy to review a documentation improvement PR.

@diego898 commented on GitHub (Jan 19, 2023):

Split out my comment into #1087

@pirate commented on GitHub (May 7, 2024):

This is mostly done now! Check out our new documentation page here: https://github.com/ArchiveBox/ArchiveBox/wiki/Setting-up-Search

I still have to document the config options on the Configuration page but it's a start.

For improvements / suggestions you can comment back here or open a PR with changes for this file:
https://github.com/ArchiveBox/docs/blob/master/Setting-up-Search.md
