[PR #452] [MERGED] Replace index.json with index.sql as the main index #2681

Closed
opened 2026-03-01 18:00:24 +03:00 by kerem · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ArchiveBox/ArchiveBox/pull/452
Author: @cdvv7788
Created: 8/18/2020
Status: Merged
Merged: 9/15/2020
Merged by: @cdvv7788

Base: master ← Head: sql_index


📝 Commits (10+)

  • ecf7476 feat: Replace index.json with index.sql as the main index in init
  • 11c3e69 feat: Update status command to consider sql as the main index
  • b94e651 feat: Update extractors and add command to use sql index as source of truth
  • f028999 feat: Remove patch_main_index
  • 2d05462 feat: Update data folder check
  • 58d79b9 feat: Save static indexes at the end of init
  • bb8bbe1 feat: Add flag to list command to support index like output
  • 3ba5ad1 feat: Add html export to list command
  • 1509fb6 feat: list command fails when --index is used without --json or --html
  • 97215f6 feat: load_main_index returns a queryset now
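The flag rule named in commit 1509fb6 (`--index` is rejected unless `--json` or `--html` is also given) can be sketched with `argparse`. This is a minimal illustration, not the code from the PR: the parser name `list-sketch`, the function names, and the error message are hypothetical, and the flags are assumed to be plain boolean switches.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the list command's parser; only the three
    # flag names come from the commit message.
    parser = argparse.ArgumentParser(prog="list-sketch")
    parser.add_argument("--index", action="store_true",
                        help="emit index-like output (assumed semantics)")
    parser.add_argument("--json", action="store_true", help="output as JSON")
    parser.add_argument("--html", action="store_true", help="output as HTML")
    return parser


def parse_list_args(argv: list[str]) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # --index only makes sense together with an output format.
    if args.index and not (args.json or args.html):
        # parser.error() prints the usage text and raises SystemExit(2).
        parser.error("--index requires --json or --html")
    return args


if __name__ == "__main__":
    print(parse_list_args(["--index", "--json"]))
```

Calling `parse_list_args(["--index"])` exits with status 2, while `["--index", "--html"]` or a bare `["--json"]` parse normally.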

📊 Changes

17 files changed (+629 additions, -355 deletions)


📝 archivebox/cli/archivebox_list.py (+20 -1)
📝 archivebox/cli/archivebox_oneshot.py (+1 -1)
📝 archivebox/config/__init__.py (+2 -3)
📝 archivebox/core/admin.py (+3 -3)
📝 archivebox/core/models.py (+4 -0)
📝 archivebox/extractors/__init__.py (+21 -19)
📝 archivebox/index/__init__.py (+117 -158)
📝 archivebox/index/html.py (+3 -2)
📝 archivebox/index/sql.py (+22 -12)
📝 archivebox/logging_util.py (+29 -8)
📝 archivebox/main.py (+111 -144)
➕ archivebox/themes/legacy/main_index_minimal.html (+20 -0)
📝 tests/test_add.py (+11 -0)
📝 tests/test_init.py (+68 -0)
➕ tests/test_list.py (+67 -0)
📝 tests/test_remove.py (+103 -4)
➕ tests/test_update.py (+27 -0)

📄 Description

Summary

Once this PR is merged, the JSON index is no longer the main source of truth; index.sqlite3 takes over that role.
index.json is still written, but only at the end of each process that runs. For an old archive (one with no index.sqlite3), running archivebox init --force is necessary to update it to the latest version.
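The arrangement described above, with SQLite as the source of truth and a static JSON file written from it at the end of a run, might look roughly like the sketch below. It uses only the standard library and is not ArchiveBox's implementation: the `snapshot` table, its columns, the `{"links": [...]}` layout, and both function names are assumptions made for illustration. Only the file names `index.sqlite3` and `index.json` come from the description.

```python
import json
import sqlite3
from pathlib import Path

# Hypothetical column set; the real schema is not given in this PR description.
COLUMNS = ("url", "timestamp", "title")


def needs_init_force(data_dir: Path) -> bool:
    """An archive counts as old when it has no index.sqlite3 yet."""
    return not (data_dir / "index.sqlite3").exists()


def export_json_index(conn: sqlite3.Connection, out_path: Path) -> int:
    """Write a static JSON index derived from the SQL index.

    Intended to be called once, at the end of a process, so the JSON file
    is only ever a snapshot of what the database already holds.
    Returns the number of links written.
    """
    rows = conn.execute(
        "SELECT url, timestamp, title FROM snapshot ORDER BY timestamp"
    )
    links = [dict(zip(COLUMNS, row)) for row in rows]
    out_path.write_text(json.dumps({"links": links}, indent=2), encoding="utf-8")
    return len(links)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp)
        print("old archive:", needs_init_force(data_dir))
        conn = sqlite3.connect(str(data_dir / "index.sqlite3"))
        conn.execute("CREATE TABLE snapshot (url TEXT, timestamp TEXT, title TEXT)")
        conn.execute(
            "INSERT INTO snapshot VALUES (?, ?, ?)",
            ("https://example.com", "1597708800", "Example"),
        )
        conn.commit()
        print("links exported:", export_json_index(conn, data_dir / "index.json"))
        conn.close()
```

Because every read goes through the database and the JSON file is regenerated from it, the two cannot drift apart during a run; the JSON copy is simply stale until the next export.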

Changes these areas

  • [ ] Bugfixes
  • [X] Feature behavior
  • [ ] Command line interface
  • [ ] Configuration options
  • [X] Internal architecture
  • [X] Archived data layout on disk

Roadmap Goals

This is one of the main goals of the 0.5 release.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
