[PR #525] [MERGED] Move ArchiveResult from detail index.json history to database model #1208

Closed
opened 2026-03-01 14:48:51 +03:00 by kerem · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ArchiveBox/ArchiveBox/pull/525
Author: @cdvv7788
Created: 11/3/2020
Status: Merged
Merged: 11/28/2020
Merged by: @pirate

Base: master ← Head: archive-result


📝 Commits (10+)

  • 8f3c03a feat: Initial (and naive) ArchiveResult model
  • 309a87e feat: Add extractor field to the database
  • b3e0400 feat: initial functional version with icons calculated based on archive results
  • 4484491 feat: Create ArchiveResult after finishing an extractor process
  • f292cfa fix: Add condition for oneshot when archiving links
  • d064a3e fix: Handle case when update tries to re-add a link that is not in the sql index
  • 33182fd fix: Add missing assignation
  • 7165522 feat: Add warc to list and limit check to succeeded archive results
  • 508a0bb refactor: Unpack extractors tuple instead of using the index to access the relevant information
  • f7f0beb feat: Modify migration reverse function to restore index (WIP)

📊 Changes

9 files changed (+180 additions, -40 deletions)


📝 archivebox.egg-info/SOURCES.txt (+1 -0)
📝 archivebox.egg-info/requires.txt (+1 -1)
➕ archivebox/core/migrations/0007_archiveresult.py (+91 -0)
📝 archivebox/core/models.py (+23 -0)
📝 archivebox/core/utils.py (+45 -30)
📝 archivebox/extractors/__init__.py (+13 -0)
📝 archivebox/themes/default/base.html (+4 -6)
📝 setup.py (+1 -2)
📝 tests/test_update.py (+1 -1)

📄 Description

Summary

When this PR is ready, archivebox will be able to:

  • Save ArchiveResults to the database
  • Load ArchiveResults from the filesystem if nothing is found in the database (to ease migration)
  • Use ArchiveResults to determine which extractor outputs are available for each snapshot
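
The three capabilities above can be illustrated with a minimal sketch. It is not the code from this PR: it uses the standard-library `sqlite3` module rather than the Django model the PR adds, and the table name `archive_result`, its columns, and the helpers `create_schema`, `save_result`, `load_results`, and `succeeded_extractors` are all hypothetical. It also assumes that a snapshot's `index.json` stores a `history` mapping from extractor name to a list of result entries with `status` and `output` keys.

```python
import json
import sqlite3
from pathlib import Path


def create_schema(conn):
    # Hypothetical table; the PR defines a Django model instead.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS archive_result (
            id INTEGER PRIMARY KEY,
            snapshot_id TEXT NOT NULL,
            extractor TEXT NOT NULL,
            status TEXT NOT NULL,
            output TEXT
        )
        """
    )


def save_result(conn, snapshot_id, extractor, status, output=None):
    # Record one extractor run for a snapshot in the database.
    conn.execute(
        "INSERT INTO archive_result (snapshot_id, extractor, status, output) "
        "VALUES (?, ?, ?, ?)",
        (snapshot_id, extractor, status, output),
    )


def load_results(conn, snapshot_id, snapshot_dir):
    # Prefer the database; fall back to the snapshot's index.json history
    # only when the database has no rows for this snapshot.
    rows = conn.execute(
        "SELECT extractor, status, output FROM archive_result "
        "WHERE snapshot_id = ? ORDER BY id",
        (snapshot_id,),
    ).fetchall()
    if rows:
        return [{"extractor": e, "status": s, "output": o} for e, s, o in rows]

    index_path = Path(snapshot_dir) / "index.json"
    if not index_path.exists():
        return []
    # Assumed layout: {"history": {"<extractor>": [{"status": ..., "output": ...}]}}
    history = json.loads(index_path.read_text()).get("history", {})
    return [
        {"extractor": name, "status": entry.get("status"), "output": entry.get("output")}
        for name, entries in history.items()
        for entry in entries
    ]


def succeeded_extractors(results):
    # Which extractors produced usable output for the snapshot.
    return sorted({r["extractor"] for r in results if r["status"] == "succeeded"})
```

In this sketch the filesystem fallback is read-only: nothing is copied into the database on load, so a snapshot keeps falling back to `index.json` until a result for it has been saved explicitly.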

Related issues

https://github.com/pirate/ArchiveBox/issues/513

Changes these areas

  • [ ] Bugfixes
  • [ ] Feature behavior
  • [ ] Command line interface
  • [ ] Configuration options
  • [X] Internal architecture
  • [ ] Snapshot data layout on disk

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
