[GH-ISSUE #764] Question: I accidentally used a depth setting of one and didn't choose an archive type. I ended up getting about 747 archive results in my log. What should I do? #3505

Closed
opened 2026-03-14 23:18:04 +03:00 by kerem · 5 comments

Originally created by @NylaTheWolf on GitHub (Jun 6, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/764

So I was an idiot and didn't choose which archive type I wanted, and I even set the depth to 1. I canceled the process when I realized what was happening after I got back from a shower, and I now have 1.01 GB of files on my computer (some of the processes failed, and some files may not have fully downloaded). Part of me wants to delete the extraneous files (favicons, headers, wgets(?), etc.), but I wonder whether it would be a good idea to back everything up, or even upload it all to archive.org. I also know that I can output things into a JSON or HTML list, which I'm guessing gives all the links of what I archived?

I know there is a delete command, but I was wondering if I could delete certain types of files or something?
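
Before deleting anything by file type, it can help to see which kinds of output actually take up the space. The following is a minimal sketch, not part of ArchiveBox: it assumes the data folder keeps one sub-folder per snapshot under a directory such as `archive/`, with outputs like `favicon.ico` or `headers.json` stored directly inside each snapshot folder. The function name `tally_by_name` and that layout are assumptions made for illustration. It only reports sizes and never deletes anything.

```python
from collections import defaultdict
from pathlib import Path


def tally_by_name(archive_dir):
    """Sum bytes per top-level entry name across all snapshot folders.

    Assumes (hypothetically) a layout of archive_dir/<snapshot>/<output>,
    where <output> is a file or a directory of files.
    """
    totals = defaultdict(int)
    for snapshot in Path(archive_dir).iterdir():
        if not snapshot.is_dir():
            continue  # skip stray files next to the snapshot folders
        for entry in snapshot.iterdir():
            if entry.is_file():
                size = entry.stat().st_size
            else:
                # For directory outputs, add up every file beneath them.
                size = sum(f.stat().st_size for f in entry.rglob("*") if f.is_file())
            totals[entry.name] += size
    return dict(totals)
```

Calling `tally_by_name("archive")` from inside the data folder returns a dictionary mapping each output name to its total size in bytes. Directory outputs whose names differ per snapshot show up as separate entries.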

kerem closed this issue 2026-03-14 23:18:10 +03:00

@pirate commented on GitHub (Jun 7, 2021):

You can delete any specific files you want from the `Log` page, or entire snapshots from the `Snapshot` page, or use the `archivebox remove` command from the CLI to do the same thing. It's up to you whether you back it up or not. If you didn't change the default settings, then it also saved everything to Archive.org.

Using all the archive types is good, and you should generally continue doing that. Just be careful what you use `depth=1` for, as the archive will grow quickly on pages that contain many URLs.
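
To get a feel for how quickly `depth=1` grows, one option is to count the links on a page before adding it. The sketch below is an illustration using only the Python standard library, not ArchiveBox's own URL extraction, which may find more or fewer URLs. The names `LinkCollector` and `estimate_depth1_snapshots` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect the unique absolute http(s) targets of <a href> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop #fragments so that
                # /page and /page#top count as one URL.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                if absolute.startswith(("http://", "https://")):
                    self.links.add(absolute)


def estimate_depth1_snapshots(html, base_url):
    """Rough estimate: the page itself plus one snapshot per distinct link."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.links.discard(urldefrag(base_url)[0])  # do not count the page twice
    return 1 + len(collector.links)
```

A page with a few hundred distinct links would therefore turn a single `depth=1` add into a few hundred snapshots, which matches the roughly 747 results described in this issue.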


@NylaTheWolf commented on GitHub (Jun 11, 2021):

> You can delete any specific files you want from the `Log` page, or entire snapshots from the `Snapshot` page, or use the `archivebox remove` command from the CLI to do the same thing. It's up to you whether you back it up or not. If you didn't change the default settings, then it also saved everything to Archive.org.
>
> Using all the archive types is good, and you should generally continue doing that. Just be careful what you use `depth=1` for, as the archive will grow quickly on pages that contain many URLs.

Alright! I'll try that out when I have the time!

Would it be risky for me to upload the archived webpages to archive.org? Would I be compromising any personal information?


@pirate commented on GitHub (Jun 11, 2021):

They are already saved to Archive.org by ArchiveBox, no need to upload them again.


@NylaTheWolf commented on GitHub (Jun 11, 2021):

> They are already saved to Archive.org by ArchiveBox, no need to upload them again.

Oh okay! I was just wondering whether I should make sure of that for the other kinds of files that were saved, aha.


@pirate commented on GitHub (Jun 11, 2021):

I'm not sure Archive.org will take them, since they already have their own ways to save most of the file types we save, except maybe the PDF and screenshot copies for redundancy.
