mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 17:16:00 +03:00
[GH-ISSUE #816] Question: Change url of snapshot via CLI #509
Originally created by @TheAnachronism on GitHub (Aug 1, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/816
On the admin page of a snapshot in the Web UI, one can change the URL this snapshot points to. I'm currently trying to automate some things and I need to do that from the CLI.
I guess I have to do something with the Django shell, but I'm very inexperienced in Python, so I'm quite lost here.
@Jonirico11 commented on GitHub (Aug 1, 2021):
What are like pyton
On Mon, 2 Aug 2021 at 00:21, TheAnachronism @.***> wrote:
@pirate commented on GitHub (Aug 1, 2021):
Hmmm why are you doing that? I don't think it should even be possible in the UI... That field on Snapshot is supposed to be immutable.
@TheAnachronism commented on GitHub (Aug 1, 2021):
Well, I'm archiving stuff from a restricted website, which makes a huge mess if I try to do that with the custom browser settings (because I have ArchiveBox running as a Docker container on an Ubuntu server).
So my current workaround is to save the page with the SingleFile browser extension, host that HTML on a temporary web server, and archive that web server URL. Then I replace the URL for that snapshot in the web UI.
As you can see, it's a bit of work to do that, so I automated the entire process from uploading it to the web server to adding it to ArchiveBox. Now I just want to replace the temporary web server URL with the original restricted website URL.
In the webui on the admin page of a snapshot you can change this here.
Previously I wanted to avoid the entire temp webserver thing by trying to import those downloaded html files. But because there currently is no actual import workflow and the browser setup was a bit too much for me, I went with this weird solution.
@pirate commented on GitHub (Aug 1, 2021):
Hmm I would recommend archiving the original URLs to create the Snapshot entries, letting them fail, then just replacing the `singlefile.html` entries in the filesystem with your original singlefiles, instead of trying to re-archive the singlefile html through your rehosting server.
I'm not really keen on allowing people to edit Snapshot URLs, as I wanted those to be immutable originally. I don't think it's good for archive integrity to let people change the URLs easily, as they're used as unique primary keys in a lot of places and the system expects them to never change.
@TheAnachronism commented on GitHub (Aug 2, 2021):
Mhh ok
The only problem now is that I'd need the file path where that new "failed" archive is created, so that I can replace the files with my manual versions.
`archivebox add ...` has a huge output...
@pirate commented on GitHub (Aug 2, 2021):
The path is just the timestamp of the snapshot `./archive/<timestamp>/singlefile.html`, if you have the url you can get the timestamp easily with `archivebox list --csv=timestamp <url here>` or by looking in the `index.sqlite3` file directly.
@TheAnachronism commented on GitHub (Aug 2, 2021):
I'll try that with `archivebox list --csv=timestamp <url here>`
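For anyone automating the approach suggested above (archive the original URL, let it fail, then drop the manually saved page into the snapshot folder), here is a rough Python sketch. It is not an official ArchiveBox workflow. It assumes `archivebox` is on the PATH and is run from the data directory (with Docker the command would need to be wrapped accordingly), and that `archivebox list --csv=timestamp <url>` prints one timestamp per line with no header row. The function names are hypothetical helpers, not part of ArchiveBox.

```python
import shutil
import subprocess
from pathlib import Path


def parse_timestamp(csv_output: str) -> str:
    """Return the first non-empty line of the CSV output (assumed: no header row)."""
    for line in csv_output.splitlines():
        line = line.strip()
        if line:
            return line
    raise ValueError("no snapshot timestamp found in output")


def snapshot_timestamp(url: str, data_dir: Path) -> str:
    """Look up a snapshot's timestamp with the command quoted in this thread."""
    result = subprocess.run(
        ["archivebox", "list", "--csv=timestamp", url],
        cwd=data_dir,          # assumption: the command is run inside the data dir
        capture_output=True,
        text=True,
        check=True,            # raise if archivebox exits with an error
    )
    return parse_timestamp(result.stdout)


def replace_singlefile(data_dir: Path, timestamp: str, saved_html: Path) -> Path:
    """Copy a manually saved page over ./archive/<timestamp>/singlefile.html."""
    snapshot_dir = data_dir / "archive" / timestamp
    if not snapshot_dir.is_dir():
        raise FileNotFoundError(f"snapshot folder not found: {snapshot_dir}")
    target = snapshot_dir / "singlefile.html"
    shutil.copyfile(saved_html, target)  # overwrites any existing singlefile.html
    return target
```

Usage would be something like `ts = snapshot_timestamp(url, data_dir)` followed by `replace_singlefile(data_dir, ts, Path("saved.html"))`, where `url`, `data_dir` and `saved.html` are placeholders for your own values. This leaves the Snapshot URL untouched, which matches the point above about keeping URLs immutable.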