[PR #593] [CLOSED] Replace Link with Snapshot (WIP) #1240

Closed
opened 2026-03-01 14:48:59 +03:00 by kerem · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ArchiveBox/ArchiveBox/pull/593
Author: @cdvv7788
Created: 12/29/2020
Status: Closed

Base: v0.5.0 ← Head: link-removal


📝 Commits (10+)

  • db446dd refactor: Initial and dirty refactor to replace link with snapshot. Barely functional add command
  • 060ba91 refactor: Get archivebox init to run
  • 7c6f6a1 refactor: Get archivebox init to run
  • ae72bff refactor: wget uses snapshot instead of link
  • 4ed90f3 refactor: singlefile uses snapshot instead of link
  • a6db50d refactor: screenshot uses snapshot instead of link
  • d6671ff refactor: readability uses snapshot instead of link
  • fedba5d refactor: pdf uses snapshot instead of link
  • a94a29e refactor: mercury uses snapshot instead of link
  • 9a1e85e refactor: media uses snapshot instead of link

📊 Changes

41 files changed (+766 additions, -640 deletions)


📝 archivebox/core/admin.py (+1 -1)
➕ archivebox/core/migrations/0008_auto_20201228_1718.py (+18 -0)
📝 archivebox/core/models.py (+129 -14)
📝 archivebox/core/views.py (+2 -1)
📝 archivebox/extractors/__init__.py (+36 -46)
📝 archivebox/extractors/archive_org.py (+8 -6)
📝 archivebox/extractors/dom.py (+8 -6)
📝 archivebox/extractors/favicon.py (+7 -5)
📝 archivebox/extractors/git.py (+10 -8)
📝 archivebox/extractors/headers.py (+9 -7)
📝 archivebox/extractors/media.py (+8 -6)
📝 archivebox/extractors/mercury.py (+10 -8)
📝 archivebox/extractors/pdf.py (+8 -6)
📝 archivebox/extractors/readability.py (+13 -11)
📝 archivebox/extractors/screenshot.py (+8 -6)
📝 archivebox/extractors/singlefile.py (+9 -7)
📝 archivebox/extractors/title.py (+13 -10)
📝 archivebox/extractors/wget.py (+16 -14)
📝 archivebox/index/__init__.py (+135 -170)
📝 archivebox/index/csv.py (+5 -3)

...and 21 more files

📄 Description

Summary

Use Snapshot everywhere instead of the old Link schema. This is a work in progress and is most likely unstable at this point.
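
To make the shape of this refactor concrete, here is a minimal sketch of what "an extractor uses snapshot instead of link" could look like. It is not taken from the PR diff: the `Link` and `Snapshot` classes below are simplified stand-ins, and `should_save_title_old`, `should_save_title`, and the `snapshot_dir` layout are hypothetical names chosen only for illustration.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Link:
    """Simplified stand-in for the old Link schema object (hypothetical)."""
    url: str
    timestamp: str


@dataclass
class Snapshot:
    """Simplified stand-in for the Snapshot model (hypothetical, no Django here)."""
    url: str
    timestamp: str

    @property
    def snapshot_dir(self) -> Path:
        # Assumed layout for illustration only: one folder per timestamp.
        return Path("archive") / self.timestamp


def should_save_title_old(link: Link) -> bool:
    """Before: the extractor helper is driven by a Link."""
    return link.url.startswith(("http://", "https://"))


def should_save_title(snapshot: Snapshot) -> bool:
    """After: the same check, but the helper receives a Snapshot."""
    return snapshot.url.startswith(("http://", "https://"))


def snapshot_from_link(link: Link) -> Snapshot:
    """Bridge helper (hypothetical) for call sites that still hold a Link."""
    return Snapshot(url=link.url, timestamp=link.timestamp)
```

In this sketch the check itself is unchanged and only the argument type moves from `Link` to `Snapshot`, which is consistent with the small per-extractor diffs listed above, though the real changes may differ.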

Changes these areas

  • [ ] Bugfixes
  • [ ] Feature behavior
  • [ ] Command line interface
  • [ ] Configuration options
  • [x] Internal architecture
  • [ ] Snapshot data layout on disk

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
