[GH-ISSUE #650] Feature Request: Lightweight rg configuration #406

Open
opened 2026-03-01 14:43:17 +03:00 by kerem · 0 comments
Owner

Originally created by @berezovskyi on GitHub (Feb 6, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/650

Type

  • [ ] General question or discussion
  • [ ] Propose a brand new feature
  • [x] Request modification of existing behavior or design

What is the problem that your feature request solves

I have about 5000 items in the library. A ripgrep full-text search (FTS) over them takes around 75s when run from the host CLI, and possibly a bit longer in Docker. The search times out after 60 seconds because of a hardcoded constant in https://github.com/ArchiveBox/ArchiveBox/blob/dev/archivebox/search/backends/ripgrep.py#L35

Describe the ideal specific solution you'd want, and whether it fits into any broader scope of changes

When I run `rg --type-add 'archiveb:*.{html,txt,json}' -iltarchiveb -e 'query' archive/` instead of the original command, my search finishes in 25s.
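For reference, the bundled short flags in that command expand to `-i` (case-insensitive), `-l` (list matching files only) and `-t archiveb` (restrict the search to the custom file type defined by `--type-add`). A minimal sketch of how the same invocation could be built and timed from Python follows; the function names `build_rg_command` and `timed_search` are hypothetical and are not taken from the ArchiveBox code base.

```python
import subprocess
import time
from typing import List, Tuple


def build_rg_command(query: str, archive_dir: str = "archive/") -> List[str]:
    """Build the narrowed ripgrep invocation quoted in this issue (hypothetical helper)."""
    return [
        "rg",
        # Define a custom file type "archiveb" covering only these extensions.
        "--type-add", "archiveb:*.{html,txt,json}",
        "-i",              # case-insensitive matching
        "-l",              # print only the paths of files that contain a match
        "-t", "archiveb",  # search only files of the custom type
        "-e", query,       # the pattern to search for
        archive_dir,
    ]


def timed_search(query: str, archive_dir: str = "archive/") -> Tuple[List[str], float]:
    """Run the search and return (matching paths, elapsed seconds). Requires rg on PATH."""
    start = time.monotonic()
    result = subprocess.run(
        build_rg_command(query, archive_dir),
        capture_output=True,
        text=True,
    )
    elapsed = time.monotonic() - start
    return result.stdout.splitlines(), elapsed
```

Passing the arguments as a list avoids shell quoting, so the `{html,txt,json}` glob reaches ripgrep unchanged.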

What hacks or alternative solutions have you tried to solve the problem?

We can increase the timeout or make it configurable.
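One way the 60-second limit could be made configurable is sketched below. This is an illustration only: the environment variable name `RIPGREP_TIMEOUT_SECONDS` and the helpers `get_search_timeout` and `run_search` are assumptions made for the example, not existing ArchiveBox settings or functions.

```python
import os
import subprocess
from typing import List, Mapping, Optional

DEFAULT_TIMEOUT_SECONDS = 60  # the value this issue reports as hardcoded
TIMEOUT_ENV_VAR = "RIPGREP_TIMEOUT_SECONDS"  # hypothetical setting name


def get_search_timeout(settings: Optional[Mapping[str, str]] = None) -> int:
    """Read the timeout from the given mapping (default: os.environ).

    Falls back to the default when the value is missing, non-numeric,
    or not a positive integer.
    """
    settings = os.environ if settings is None else settings
    raw = settings.get(TIMEOUT_ENV_VAR, "")
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_TIMEOUT_SECONDS
    return value if value > 0 else DEFAULT_TIMEOUT_SECONDS


def run_search(cmd: List[str], settings: Optional[Mapping[str, str]] = None) -> List[str]:
    """Run a search command, returning matched lines or [] if it times out."""
    try:
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=get_search_timeout(settings),
        )
    except subprocess.TimeoutExpired:
        # The child process is killed by subprocess.run before this is raised.
        return []
    return result.stdout.splitlines()
```

With something like this in place, a library that needs about 75s per search could set the variable to, say, `120` without a code change.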

How badly do you want this new feature?

  • [ ] It's an urgent deal-breaker, I can't live without it
  • [x] It's important to add it in the near-mid term future
  • [ ] It would be nice to have eventually

  • [x] I'm willing to contribute dev time to fix this issue
  • [x] I like ArchiveBox so far / would recommend it to a friend
  • [ ] I've had a lot of difficulty getting ArchiveBox set up