[PR #389] [MERGED] fix: Guess timestamps and add placeholders to support older indices #4159

Closed
opened 2026-03-15 01:29:36 +03:00 by kerem · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ArchiveBox/ArchiveBox/pull/389
Author: @cdvv7788
Created: 7/24/2020
Status: Merged
Merged: 7/24/2020
Merged by: @pirate

Base: django ← Head: recover-index


📝 Commits (2)

  • 100fa5d fix: Guess timestamps and add placeholders to support older indices
  • 82f8f8b fix: Use config information for path instead of hardcoded values

📊 Changes

3 files changed (+51 additions, -16 deletions)


📝 archivebox/index/__init__.py (+12 -4)
📝 archivebox/index/json.py (+8 -6)
📝 archivebox/index/schema.py (+31 -6)

📄 Description

Summary

Index loading is now more lenient. Instead of failing on an older index, it attempts to repair the files it finds by guessing the timestamps from the duration and timestamp fields. Other missing values are simply marked as undefined.
This should not affect up-to-date indexes. Old indexes, however, should be rewritten with the new information.
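
The repair described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the function names `guess_ts` and `repair_entry`, the field names `start_ts`, `end_ts`, `cmd`, `cmd_version`, and `pwd`, and the assumption that `timestamp` is an epoch-seconds string and `duration` is a number of seconds are all hypothetical choices made for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Assumed placeholder value, following the "marked as undefined" wording above.
PLACEHOLDER = "undefined"


def guess_ts(entry: dict) -> Tuple[datetime, datetime]:
    """Guess start/end times from an epoch `timestamp` and a `duration` in seconds."""
    start = datetime.fromtimestamp(float(entry["timestamp"]), tz=timezone.utc)
    end = start + timedelta(seconds=int(entry.get("duration", 0)))
    return start, end


def repair_entry(entry: dict) -> dict:
    """Return a copy of `entry` with missing fields guessed or filled with placeholders."""
    repaired = dict(entry)

    # Only guess when the older entry lacks explicit start/end times.
    if repaired.get("start_ts") is None or repaired.get("end_ts") is None:
        guessed_start, guessed_end = guess_ts(entry)
        if repaired.get("start_ts") is None:
            repaired["start_ts"] = guessed_start
        if repaired.get("end_ts") is None:
            repaired["end_ts"] = guessed_end

    # Hypothetical set of fields that older indexes may not contain.
    for field in ("cmd", "cmd_version", "pwd"):
        if not repaired.get(field):
            repaired[field] = PLACEHOLDER

    return repaired


old_entry = {"timestamp": "1595548800", "duration": 5, "status": "succeeded"}
print(repair_entry(old_entry))
```

An entry that already carries every field passes through unchanged, which matches the expectation that up-to-date indexes are not affected.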

Related issues: #374

Changes these areas

  • [X] Bugfixes
  • [X] Feature behavior
  • [ ] Command line interface
  • [ ] Configuration options
  • [ ] Internal architecture
  • [ ] Archived data layout on disk

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.