mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 09:06:02 +03:00
[GH-ISSUE #1178] Bug: Doing a later, successful "Pull" on snapshots classified as "failed" doesn't change their status away from "failed" #2242
Originally created by @melyux on GitHub (Jul 13, 2023).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1178
Describe the bug
If one of the extractors for a snapshot fails to get its content, the snapshot is classified as "failed". If you later click the "Pull" button on it, it retries these failed extractors. Even when this retry succeeds, the snapshot's status is not moved from "failed" to "succeeded".
The "status" filter seems to apply to the individual extractor results inside snapshots rather than to the snapshots themselves, since this snapshot shows up under both "succeeded" and "failed", which is weird. Once everything works after subsequent "Pull"s, I think the error count and the failed statuses should be cleared. Otherwise there's no point to these filters.
Steps to reproduce
Screenshots or log output
ArchiveBox version
@melyux commented on GitHub (Jul 15, 2023):
Thought about this for a while, and snapshot filters should work like this: if the latest update was successful for an extractor for that snapshot, previous failures of that extractor should not count against the snapshot as "failed". Only when the latest update for at least one extractor failed should that snapshot be designated as "failed".
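The rule above (only the most recent result for each extractor decides whether a snapshot counts as failed) can be sketched in plain Python. This is not ArchiveBox's actual filter code: `ExtractorResult`, its fields, `snapshot_status`, and the status strings are hypothetical stand-ins for ArchiveResult rows, chosen only to illustrate the logic.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ExtractorResult:
    # Hypothetical stand-in for one ArchiveResult row: which extractor ran,
    # when it finished, and whether it succeeded.
    extractor: str
    end_ts: datetime
    status: str  # assumed values: "succeeded" or "failed"


def snapshot_status(results: list[ExtractorResult]) -> str:
    """Derive a snapshot's status from only the latest result of each extractor."""
    latest: dict[str, ExtractorResult] = {}
    for result in results:
        current = latest.get(result.extractor)
        # Keep whichever result for this extractor finished most recently.
        if current is None or result.end_ts > current.end_ts:
            latest[result.extractor] = result
    # Older failures are ignored; only a failing latest attempt counts.
    # (A snapshot with no results at all is not flagged as failed.)
    if any(r.status == "failed" for r in latest.values()):
        return "failed"
    return "succeeded"


history = [
    ExtractorResult("wget", datetime(2023, 7, 13, 10, 0), "failed"),
    ExtractorResult("wget", datetime(2023, 7, 14, 10, 0), "succeeded"),  # later "Pull" retry
    ExtractorResult("pdf", datetime(2023, 7, 13, 10, 0), "succeeded"),
]
print(snapshot_status(history))  # succeeded
```

With this rule, the earlier wget failure no longer marks the snapshot as failed once a later retry of the same extractor succeeds, which is the behavior the bug report expects from the filter.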
@pirate commented on GitHub (Aug 13, 2023):
Yeah, I agree @melyux, that's how they were intended to work already. There must be a bug in the filter logic.
@neel-suthar commented on GitHub (Jan 22, 2024):
I thought Snapshots had some kind of status field, but it seems I was wrong. Is it worth adding a status field to each snapshot? The internal logic would stay the same, but it could help us fetch snapshots much more efficiently. Just a thought.
@pirate commented on GitHub (Jan 23, 2024):
Nah @neel-suthar, I want Snapshots to stay basically immutable (i.e. no flag/status/etc. fields) because we're moving to an event-driven model soon. But we can add a @cached_property that gets the status using a query over ArchiveResults and stores it in cache.
@pirate commented on GitHub (Oct 27, 2024):
This longstanding bug should soon be fixed: each model is now a finite state machine with only a few valid states. Everything gets moved towards a final state deterministically on tick() (like a game engine), and if a snapshot fails enough times it will eventually be marked "fatal" and will have to be retried as a new snapshot. This should make it much clearer when something is failing intermittently vs. permanently.
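The tick()-driven approach described in the comment above can be illustrated with a toy model. This is only a sketch of the general pattern, not ArchiveBox's implementation: the state names, the `max_attempts` retry budget, and the `SnapshotMachine` class are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class SnapshotState(Enum):
    # Hypothetical state names; ArchiveBox's real state machines may differ.
    PENDING = "pending"
    SUCCEEDED = "succeeded"
    FATAL = "fatal"


FINAL_STATES = {SnapshotState.SUCCEEDED, SnapshotState.FATAL}


@dataclass
class SnapshotMachine:
    max_attempts: int = 3  # hypothetical retry budget before giving up
    attempts: int = 0
    state: SnapshotState = SnapshotState.PENDING

    def tick(self, run_extractors: Callable[[], bool]) -> SnapshotState:
        """Advance at most one step toward a final state."""
        if self.state in FINAL_STATES:
            # Final states never change; a retry would mean a new snapshot.
            return self.state
        self.attempts += 1
        if run_extractors():
            self.state = SnapshotState.SUCCEEDED
        elif self.attempts >= self.max_attempts:
            self.state = SnapshotState.FATAL
        # Otherwise stay PENDING and try again on the next tick.
        return self.state


machine = SnapshotMachine(max_attempts=2)
print(machine.tick(lambda: False))  # SnapshotState.PENDING (1 failed attempt)
print(machine.tick(lambda: False))  # SnapshotState.FATAL (budget exhausted)
print(machine.tick(lambda: True))   # SnapshotState.FATAL (final states are sticky)
```

Because each tick either moves the machine forward or leaves it unchanged, an intermittent failure shows up as extra PENDING ticks, while a permanent one ends in FATAL.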