[PR #543] [CLOSED] Full-text search #4229

Closed
opened 2026-03-15 01:33:22 +03:00 by kerem · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ArchiveBox/ArchiveBox/pull/543
Author: @jdcaballerov
Created: 11/19/2020
Status: Closed

Base: v0.5.0 ← Head: sonic-search


📝 Commits (10+)

  • 379b4b0 Initial implementation
  • 5f38f14 Implement backend architecture for search engines
  • f74f1d8 Add config for search backend
  • f5fbb67 Implement flush for search backend after remove command
  • b33db1d Use a generator for snapshot flush from index
  • 05ace2b Use QuerySets for search backend API instead of pks
  • a3c4c72 fix: Return empty QuerySet instead of list
  • d02bbfa feat: add search filter-type to list command
  • 3f3f87e Merge branch 'master' into sonic-search
  • 4e4d3e2 Add sonic to docker-compose

📊 Changes

24 files changed (+577 additions, -47 deletions)

📝 Dockerfile (+1 -1)
📝 archivebox.egg-info/requires.txt (+2 -0)
📝 archivebox/cli/archivebox_list.py (+1 -1)
📝 archivebox/cli/archivebox_update.py (+1 -1)
📝 archivebox/config.py (+13 -1)
📝 archivebox/core/admin.py (+2 -1)
➕ archivebox/core/migrations/0007_archiveresult.py (+91 -0)
➕ archivebox/core/mixins.py (+23 -0)
📝 archivebox/core/models.py (+31 -2)
📝 archivebox/core/utils.py (+47 -30)
📝 archivebox/extractors/__init__.py (+18 -0)
📝 archivebox/extractors/readability.py (+5 -2)
📝 archivebox/index/__init__.py (+28 -1)
📝 archivebox/index/schema.py (+1 -0)
📝 archivebox/main.py (+3 -0)
➕ archivebox/search/__init__.py (+110 -0)
➕ archivebox/search/backends/__init__.py (+0 -0)
➕ archivebox/search/backends/ripgrep.py (+47 -0)
➕ archivebox/search/backends/sonic.py (+28 -0)
➕ archivebox/search/utils.py (+44 -0)

...and 4 more files

📄 Description

Summary

This PR adds the ability to do full-text search 🎉

Related issues

#22 #24

Changes these areas

  • [ ] Bugfixes
  • [x] Feature behavior
  • [ ] Command line interface
  • [ ] Configuration options
  • [x] Internal architecture
  • [ ] Snapshot data layout on disk

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
