mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 09:06:02 +03:00
[GH-ISSUE #1626] Bug: ArchiveBox keeps readding a certain tag to items #975
Originally created by @FeverGyorn on GitHub (Dec 25, 2024).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1626
Originally assigned to: @pirate on GitHub.
Provide a screenshot and describe the bug
Hello everyone,
I'm encountering a weird situation with my ArchiveBox installation that I couldn't find anything about.
I have around 600 snapshots in my instance at the moment. Every night it re-adds a certain tag (to-be-reviewed) to around 330 of them by itself. I can delete the tag manually, but it is back again the next day.
This does NOT happen for new items I'm currently adding!
The Docker Compose setup is mostly standard, apart from using an NFS-mounted archive volume.
Steps to reproduce
Logs or errors
ArchiveBox Version
How did you install the version of ArchiveBox you are using?
Docker (or other container system like podman/LXC/Kubernetes or TrueNAS/Cloudron/YunoHost/etc.)
What operating system are you running on?
Linux (Ubuntu/Debian/Arch/Alpine/etc.)
What type of drive are you using to store your ArchiveBox data?
data/ is on a local SSD or NVMe drive
data/ is on a spinning hard drive or external USB drive
data/ is on a network mount (e.g. NFS/SMB/CIFS/etc.)
data/ is on a FUSE mount (e.g. SSHFS/RClone/S3/B2/OneDrive, etc.)
Docker Compose Configuration
ArchiveBox Configuration
@pirate commented on GitHub (Dec 26, 2024):
Can you share the output of:
docker compose run archivebox_scheduler --show
@FeverGyorn commented on GitHub (Dec 26, 2024):
Not sure the argument is really available here.
@pirate commented on GitHub (Dec 26, 2024):
Sorry typo'd a word, I meant:
@FeverGyorn commented on GitHub (Dec 26, 2024):
Aye, that's the output here:
Remark: I'll clean up the orphans and run init again later, and report back (I assume that would be the recommendation now?).
@FeverGyorn commented on GitHub (Dec 26, 2024):
To follow up on my previous comment, as I think there was a misunderstanding on my end:
When I moved the archive collection from local storage to an NFS share, I did run archivebox init, but INSIDE the docker container.
I wondered why I got the message about missing migrations above.
I have now rerun it on the host, and the message about missing migrations is gone. At first glance, a few of the tags I've applied in the meantime seem to be missing, but working with a somewhat corrupt install is certainly my own fault.
I assume it's best to wait for the next night now?
@pirate commented on GitHub (Dec 26, 2024):
You can cancel the scan; it's not important, and it seems fine now. It looks like you briefly ran it with some of the data dir missing, and when it was run again with the full data dir it caught up.
@FeverGyorn commented on GitHub (Dec 27, 2024):
Small update: tags are still being changed unexpectedly. It is still a matter of tags I had already removed being re-added (going so far as to re-add a tag that no longer exists at all). Tags that I add to items are NOT removed.
I've reviewed the logs and the compose definition: doesn't the following config part refetch all links in the index?
I can see this in the schedule.log.
I commented out the command line for the next night. Of course, this does not really explain why tags are being re-added. Messages about missing migrations or anything similar are no longer shown.
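One way to pin down exactly what the nightly run changes is to record each snapshot's tags before and after it and diff the two. The following is a minimal sketch, assuming the tag assignments have already been exported into plain dicts mapping snapshot URL to a set of tag names. The export step, the function name `readded_tags`, and the sample data are all hypothetical and not part of ArchiveBox.

```python
def readded_tags(before, after):
    """Return {url: tags present after the nightly run but absent before it}."""
    changes = {}
    for url, tags_after in after.items():
        # Tags that appeared on this snapshot since the "before" export.
        added = set(tags_after) - set(before.get(url, set()))
        if added:
            changes[url] = added
    return changes


# Hypothetical sample data standing in for two exports taken a day apart.
before = {
    "https://example.com/a": {"news"},
    "https://example.com/b": set(),
}
after = {
    "https://example.com/a": {"news", "to-be-reviewed"},
    "https://example.com/b": set(),
}

print(readded_tags(before, after))
```

Comparing the affected URLs against the entries in schedule.log for the same night would show whether the re-tagged snapshots are the ones the scheduled job touched.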
@pirate commented on GitHub (Jan 3, 2025):
The scheduled daily update only fetches URLs that are missing or that have previously failed; it shouldn't touch any URLs that have already successfully downloaded.
Even then, when it retries failed/incomplete snapshots, it should never re-add tags to them, I'm not sure why it's doing that for you.
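The intended behaviour described above can be modelled roughly as follows. This is an illustrative sketch only, not ArchiveBox's actual implementation: the `Snapshot` class, its fields, the status strings, and the functions `needs_retry`, `select_for_retry` and `retry` are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    url: str
    tags: set = field(default_factory=set)
    # Hypothetical model: extractor name -> "succeeded" or "failed".
    # An empty dict means nothing has been fetched yet (missing).
    results: dict = field(default_factory=dict)


def needs_retry(snapshot):
    """A snapshot is retried if it has no results or any non-successful one."""
    if not snapshot.results:
        return True
    return any(status != "succeeded" for status in snapshot.results.values())


def select_for_retry(snapshots):
    """Pick only missing or failed snapshots; complete ones are left alone."""
    return [s for s in snapshots if needs_retry(s)]


def retry(snapshot, fetch):
    """Re-run failed extractors; only results are updated, tags are untouched."""
    for name, status in list(snapshot.results.items()):
        if status != "succeeded":
            snapshot.results[name] = fetch(snapshot.url, name)


snapshots = [
    Snapshot("https://example.com/ok", {"news"}, {"wget": "succeeded"}),
    Snapshot("https://example.com/bad", {"blog"}, {"wget": "failed"}),
    Snapshot("https://example.com/new"),
]
pending = select_for_retry(snapshots)
for snap in pending:
    retry(snap, fetch=lambda url, name: "succeeded")
```

In this model, the tag sets of all three snapshots are identical before and after the run, which is the invariant the reported behaviour violates.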
The missing migrations message is very suspicious, whatever container was producing that was likely the cause of the issue. A worker running out of sync with the schema in your DB could cause problems like this in theory, though it still doesn't fully explain why deleted tags would get re-added.
I'm investigating this still, but I may prioritize the more complete fix that's coming in v0.8.x: I'm migrating to a new tagging system entirely.