[GH-ISSUE #816] Question: Change url of snapshot via CLI #509

Closed
opened 2026-03-01 14:44:12 +03:00 by kerem · 7 comments
Owner

Originally created by @TheAnachronism on GitHub (Aug 1, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/816

In the admin page of a snapshot in the Web UI, one can change the URL that the snapshot points to. I'm currently trying to automate some things and I need to do that from the CLI.
I guess I have to do something with the Django shell, but I'm very inexperienced in Python, so I'm quite lost here.

kerem closed this issue 2026-03-01 14:44:12 +03:00

@Jonirico11 commented on GitHub (Aug 1, 2021):

What is Python like?

On Mon, 2 Aug 2021 00:21, TheAnachronism @.***> wrote:

> In the admin page of a snapshot in the Web UI one can change the url this
> snapshot points to. I'm currently trying to automate some things and I need
> to do that from the CLI.
> I guess I have to do something with the django shell but I'm very
> inexperienced in python, so I'm quite lost here.


@pirate commented on GitHub (Aug 1, 2021):

Hmmm, why are you doing that? I don't think it should even be possible in the UI... That field on Snapshot is supposed to be immutable.


@TheAnachronism commented on GitHub (Aug 1, 2021):

Well, I'm archiving stuff from a restricted website, which makes a huge mess if I try to do that with the custom browser settings (because I have ArchiveBox running as a Docker container on an Ubuntu server).
So my current workaround is to save the page with the [singlefile](https://github.com/gildas-lormeau/SingleFile) browser extension, host that HTML on a temporary webserver and archive that webserver URL. Then I replace the URL for that snapshot in the web UI.
As you can see, it's a bit of work to do that, so I automated the entire process, from uploading the file to the webserver to adding it to ArchiveBox. Now I just want to replace the temporary webserver URL with the original restricted website URL.

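![image](https://user-images.githubusercontent.com/32616088/127785092-d78941a4-5e00-44ae-9765-302af7bff479.png)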

In the webui on the admin page of a snapshot you can change this here.

Previously I wanted to avoid the entire temp webserver thing by importing those downloaded HTML files directly. But because there is currently no actual import workflow, and the browser setup was a bit too much for me, I went with this weird solution.


@pirate commented on GitHub (Aug 1, 2021):

Hmm, I would recommend archiving the original URLs to create the Snapshot entries, letting them fail, then just replacing the `singlefile.html` entries in the filesystem with your original singlefiles, instead of trying to re-archive the singlefile HTML through your rehosting server.

I'm not really keen on allowing people to edit Snapshot URLs, as I wanted those to be immutable originally. I don't think it's good for archive integrity to let people change the URLs easily, as they're used as unique primary keys in a lot of places and the system expects them to never change.


@TheAnachronism commented on GitHub (Aug 2, 2021):

Mhh, ok.
The only problem now is that I'd need the file path where that new "failed" archive is created, so that I can replace the files with my manual versions.
`archivebox add ...` has a huge output...


@pirate commented on GitHub (Aug 2, 2021):

The path is just based on the timestamp of the snapshot: `./archive/<timestamp>/singlefile.html`. If you have the URL, you can get the timestamp easily with `archivebox list --csv=timestamp <url here>` or by looking in the `index.sqlite3` file directly.
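
For anyone automating this, here is a minimal Python sketch of the lookup-and-replace step described above. It is not part of ArchiveBox: the function names are hypothetical, and it assumes that `archivebox list --csv=timestamp <url>` prints the timestamp as the last non-empty line on stdout and that the command is run from inside the data directory. In a Docker setup the command would need to be wrapped accordingly.

```python
import shutil
import subprocess
from pathlib import Path


def snapshot_singlefile_path(data_dir: Path, timestamp: str) -> Path:
    """Build <data_dir>/archive/<timestamp>/singlefile.html."""
    return data_dir / "archive" / timestamp.strip() / "singlefile.html"


def lookup_timestamp(url: str, data_dir: Path) -> str:
    """Look up a snapshot's timestamp with the CLI command from this thread."""
    result = subprocess.run(
        ["archivebox", "list", "--csv=timestamp", url],
        cwd=data_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    # Assumption: the timestamp is the last non-empty line on stdout.
    lines = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    if not lines:
        raise LookupError(f"no snapshot found for {url}")
    return lines[-1]


def replace_singlefile(saved_html: Path, timestamp: str, data_dir: Path) -> Path:
    """Copy a manually saved SingleFile page over the snapshot's singlefile.html."""
    target = snapshot_singlefile_path(data_dir, timestamp)
    # Refuse to create a snapshot folder that ArchiveBox does not know about.
    if not target.parent.is_dir():
        raise FileNotFoundError(f"snapshot folder missing: {target.parent}")
    shutil.copyfile(saved_html, target)
    return target
```

A typical call would first run `archivebox add` on the original URL (letting the extraction fail), then `replace_singlefile(saved_html, lookup_timestamp(url, data_dir), data_dir)`.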


@TheAnachronism commented on GitHub (Aug 2, 2021):

I'll try that with `archivebox list --csv=timestamp <url here>`.
