[GH-ISSUE #342] Question: ...How to merge multiple Archive collections #248

Closed
opened 2026-03-01 14:41:49 +03:00 by kerem · 1 comment

Originally created by @ekiel on GitHub (Apr 29, 2020).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/342

Right now I’m using ArchiveBox for multiple collections on different machines, and their contents are not the same. Is it possible to merge the ArchiveBox outputs without having to re-scrape everything?

kerem closed this issue 2026-03-01 14:41:49 +03:00

@pirate commented on GitHub (Apr 29, 2020):

Yup, this is supported natively with the new `archivebox init` feature in `>=v0.4`. You can drag all the timestamp folders from one archive folder to the other, run `init`, and it will import them all, effectively merging the two archives.
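For anyone who would rather script the "drag the timestamp folders" step than do it by hand, here is a minimal sketch in Python. It is not part of ArchiveBox: the function name `merge_timestamp_folders` and the paths in the usage note are hypothetical. Skipping folders whose name already exists in the destination is an assumption made here for safety, not documented ArchiveBox behavior.

```python
import shutil
from pathlib import Path
from typing import List, Tuple


def merge_timestamp_folders(src_archive: Path, dst_archive: Path) -> Tuple[List[str], List[str]]:
    """Move every snapshot folder from src_archive into dst_archive.

    Returns (moved, skipped) folder names. A folder is skipped, never
    overwritten, when the destination already holds one with the same name
    (an assumption of this sketch, so nothing is silently clobbered).
    """
    src_archive, dst_archive = Path(src_archive), Path(dst_archive)
    dst_archive.mkdir(parents=True, exist_ok=True)

    moved: List[str] = []
    skipped: List[str] = []
    # sorted() materializes the listing first, so moving entries is safe.
    for folder in sorted(src_archive.iterdir()):
        if not folder.is_dir():
            continue  # ignore stray files; only snapshot folders are moved
        target = dst_archive / folder.name
        if target.exists():
            skipped.append(folder.name)
            continue
        shutil.move(str(folder), str(target))
        moved.append(folder.name)
    return moved, skipped
```

A call might look like `merge_timestamp_folders(Path("machine_a/archive"), Path("machine_b/archive"))` (both paths are placeholders). Afterwards, run `archivebox init` inside the destination data folder so the moved snapshots get imported, and check any skipped names by hand.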

The new version also doesn't require keeping the output folder in the same folder as the code. Instead, you install the `archivebox` command with `pip` system-wide (or in a virtualenv), and then you can run `archivebox init` in any folder to use it as a data folder.

You can install from the `django` branch to try out the early pre-release version, or subscribe to this PR to get notified when it actually gets merged:
https://github.com/pirate/ArchiveBox/pull/207 (it may take a while, I rarely have coding time to dedicate to this project these days).
