mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 17:16:00 +03:00
[GH-ISSUE #433] Bugfix: deleted item re-appears upon next import of URLs #291
Originally created by @aayio on GitHub (Aug 10, 2020).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/433
Thank you in advance for your help,
Sorry if this isn't experienced universally and it's just something I'm not doing right 😕
Describe the bug
Deleted item is re-imported upon the next import of (unrelated) URLs
Steps to reproduce
Software versions
c8e3aed
@cdvv7788 commented on GitHub (Aug 10, 2020):
I was able to reproduce the bug. @mauvity for now, as a workaround, you can select the items you want to delete from the list and click the delete button at the top right.
I will send a PR to fix the issue soon.
@pirate commented on GitHub (Aug 10, 2020):
@cdvv7788 the timestamp > delete version will be fixed automatically once we remove the json main index
don't bother fixing it for now, it would just add a bunch of workaround complexity for a problem that's going away soon anyway.
@cdvv7788 commented on GitHub (Aug 10, 2020):
Ok. Please leave this open so we don't forget to check back once we merge the index changes.
@cdvv7788 commented on GitHub (Oct 7, 2020):
@mauvity can you please check if the current version on master fixes it? We refactored the index internals.
@pirate commented on GitHub (Oct 7, 2020):
There is still a functional difference between the two ways:
archivebox init)
@cdvv7788 commented on GitHub (Oct 7, 2020):
Oh right, the delete functionality has not been touched in the refactor.
@cdvv7788 commented on GitHub (Oct 9, 2020):
@pirate what should we do about this? Maybe add a confirmation and change both methods to remove the actual files? If the admin is a way to maintain the index, leaving orphaned folders may be unnecessary.
@pirate commented on GitHub (Oct 10, 2020):
I think removing the delete button from the snapshot admin detail page is enough for now. (Leave the delete button on the list page the way it is now).
@pirate commented on GitHub (Dec 11, 2020):
@cdvv7788 is this fixed in v0.5.0? If not, can we do that?
@pirate commented on GitHub (Apr 6, 2021):
I'm pretty sure this was already fixed in v0.5.6. Comment back here if you're still seeing the issue and I'll reopen the ticket.
@235 commented on GitHub (Jan 4, 2024):
The bug is reappearing in ArchiveBox v0.7.1. It is quite odd to see a new import full of entries that were deleted earlier.
I've just observed another bug, which could be related: a handful of deleted entries re-appeared at the top of the list with newer dates. These entries weren't indexed yet, so I suspect the extractor already had them in the queue and inserted them back as it went through them.
cc: @pirate
@pirate commented on GitHub (Jan 4, 2024):
@235 Can you confirm this is happening when you delete an older completed Snapshot that does not have the same URL present in a later import?
Deleting does not prevent a URL from being re-added in the future, so if you deleted some Snapshots and then re-imported the same URLs later on, they will re-appear (as new Snapshot entries).
Deleting during an import is also totally broken/not advised. This is the downside of making all my import code immutable/idempotent (it overwrites entries entirely on changes instead of mutating them in-place). Because Snapshots are operated on in-memory, it rewrites the DB and disk entries several times from memory as it does work during the import process, and as long as a Snapshot is still in-memory being operated on, the import doesn't notice when a user deletes the DB/disk entry out from underneath it.
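To make the failure mode above concrete, here is a minimal sketch of how an overwrite-from-memory import can resurrect a row that was deleted mid-import. It is not ArchiveBox's actual code: the `snapshot` table, `save_snapshot`, `run_import`, and the `delete_midway` parameter are all hypothetical names invented for illustration, and a plain in-memory SQLite database stands in for the real index.

```python
import sqlite3


def make_index():
    # Hypothetical stand-in for the main index: one row per snapshot URL.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE snapshot (url TEXT PRIMARY KEY, title TEXT)")
    return db


def save_snapshot(db, snap):
    # Idempotent full overwrite: the whole row is written from memory,
    # regardless of whether it currently exists in the DB.
    db.execute(
        "INSERT OR REPLACE INTO snapshot (url, title) VALUES (:url, :title)",
        snap,
    )
    db.commit()


def run_import(db, urls, delete_midway=None):
    # The import keeps its own in-memory copies of every snapshot.
    pending = [{"url": u, "title": None} for u in urls]

    # First pass: write placeholder rows.
    for snap in pending:
        save_snapshot(db, snap)

    # Simulate a user deleting one entry while the import is still running.
    if delete_midway is not None:
        db.execute("DELETE FROM snapshot WHERE url = ?", (delete_midway,))
        db.commit()

    # Second pass: the import updates its in-memory copies and saves again.
    # It never re-reads the DB, so the deleted row is silently re-created.
    for snap in pending:
        snap["title"] = "Title of " + snap["url"]
        save_snapshot(db, snap)


def indexed_urls(db):
    return [row[0] for row in db.execute("SELECT url FROM snapshot ORDER BY url")]


db = make_index()
run_import(
    db,
    ["https://example.com/a", "https://example.com/b"],
    delete_midway="https://example.com/a",
)
print(indexed_urls(db))  # the deleted URL is back in the index
```

In this sketch the deletion is only lost because the second pass writes without checking whether the row still exists. Deleting after the import has finished would stick, which matches the distinction drawn above between deleting a completed Snapshot and deleting during an import.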
@235 commented on GitHub (Jan 17, 2024):
As discussed in the other ticket, this was a deletion DURING an import. We can ignore the report here and focus on the other ticket's discussion. TY!