mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 17:16:00 +03:00
[GH-ISSUE #412] Bugfix: django.db.utils.IntegrityError: UNIQUE constraint failed: core_snapshot.timestamp #274
Originally created by @drpfenderson on GitHub (Jul 31, 2020).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/412
Originally assigned to: @cdvv7788 on GitHub.
Describe the bug
Y'all helped me with upgrading my super old archive to the django branch before official 0.4.9 release. I recently upgraded to the newest version, so I could start adding links. archivebox said I had to re-init.
archivebox init gives me the following error, and will not let me add new links. Full log/error below.
Steps to reproduce
1. git checkout master to switch from django branch.
2. git pull origin master to pull new release.
3. pip install -e . (also tried with pip uninstall archivebox && pip install .)
4. archivebox init.
Screenshots or log output
Software versions
0ac4e12)
@karlicoss commented on GitHub (Aug 11, 2020):
Happens for me as well. Archivebox version: v0.4.13 (image from Docker hub).
I experimented a bit and managed to consistently reproduce. I suspect the URLs that have a suffix in the timestamp are causing it.
Create a new (empty) archive directory, put it in the compose file and initialise
docker-compose run --rm archivebox init
Archive a few URLs
input:
First archiving:
Goes well:
Now if you rerun the same command, it works well too
As expected, just says everything is already in the index
Now try running against one of the URLs that has a dot in the timestamp (with a suffix)
Interestingly enough, running against https://beepb00p.xyz/promnesia.html, which has the timestamp 1597171609, works fine and as expected just says it's already in the index.
Now if you try to add a completely different set of links, it works fine again:
And again, if you try to add http://blog.sigfpe.com/2008/02/what-is-topology.html, it works; if you try http://blog.sigfpe.com/2006/11/yoneda-lemma.html, it fails.
@pirate commented on GitHub (Aug 11, 2020):
Very helpful @karlicoss! This is high on our priority list of things to fix.
I'll check in with an update once we've started working on this. I suspect it's a relatively simple bug in the timestamp deduping code; most of the work will be QA and testing to make sure we don't introduce any regressions while we fix it.
For context, timestamp deduping has been one of the most brittle parts of ArchiveBox in the past years, and we already have plans to remove the need for it in a refactoring in the next major version.
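To illustrate what "timestamp deduping" refers to here, the following is a minimal sketch, not ArchiveBox's actual code. It assumes that a colliding timestamp gets a numeric suffix after a dot (the thread only mentions "a dot in the timestamp (with a suffix)"), and the function name dedupe_timestamp is hypothetical.

```python
def dedupe_timestamp(used, timestamp):
    """Return a timestamp string that is not yet in `used`.

    Assumption for illustration only: collisions are resolved by appending
    ".0", ".1", ... to the integer part of the timestamp.
    """
    base = timestamp.split(".")[0]  # drop any existing suffix
    if base not in used:
        return base
    nonce = 0
    candidate = f"{base}.{nonce}"
    while candidate in used:
        nonce += 1
        candidate = f"{base}.{nonce}"
    return candidate


# Three links that were bookmarked in the same second.
used = set()
assigned = []
for ts in ["1597171609", "1597171609", "1597171609"]:
    unique = dedupe_timestamp(used, ts)
    used.add(unique)
    assigned.append(unique)

print(assigned)
```

The sketch only shows how suffixed timestamps can come to exist. It does not reproduce the bug discussed in this issue.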
@jrruethe commented on GitHub (Aug 15, 2020):
I unfortunately ran into this issue as well. From my testing, I agree with @karlicoss and his assessment that it is related to the timestamp suffixes. I am trying to pin it down further than that; I'll reply if I figure anything out.
Thanks
@coisnepe commented on GitHub (Aug 16, 2020):
Nothing works for me anymore, sadly... Attempting to add any link, whether completely new or already archived, results in django.db.utils.IntegrityError.
What's the least dangerous way to fix it (temporarily disabling the unique constraint, deleting one/some archives etc...)?
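For context on the error itself, here is a minimal sketch using Python's standard sqlite3 module. The table name core_snapshot and the timestamp column are taken from the error message in this issue; the rest of the schema and the URLs are made up for illustration and are not ArchiveBox's real schema.

```python
import sqlite3

# In-memory stand-in table; only the UNIQUE timestamp column matters here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE core_snapshot (url TEXT, timestamp TEXT UNIQUE)")
conn.execute(
    "INSERT INTO core_snapshot VALUES (?, ?)",
    ("https://example.com/a", "1597171609"),
)

try:
    # A second row with the same timestamp violates the UNIQUE constraint.
    conn.execute(
        "INSERT INTO core_snapshot VALUES (?, ?)",
        ("https://example.com/b", "1597171609"),
    )
    error_message = None
except sqlite3.IntegrityError as exc:
    error_message = str(exc)

print(error_message)
```

Django wraps the same database error in django.db.utils.IntegrityError, which is the exception named in the issue title.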
@apkallum commented on GitHub (Aug 17, 2020):
Hello @coisnepe @jrruethe @karlicoss @drpfenderson & everyone else, would you mind testing my master branch with a fix here? https://github.com/apkallum/ArchiveBox
@drpfenderson commented on GitHub (Aug 17, 2020):
@apkallum - Using your build, it gets a bit further. Modifies a few entries, and then gives following error:
EDIT: To be clear, this is using archivebox init in the main archive directory.
EDIT 2: Oops. Realized I had switched to Python 3.8 for another project and forgot to update-alternatives. Running archivebox init with Python 3.7, with apkallum's branch, gives me essentially same error.
@pirate commented on GitHub (Aug 18, 2020):
Give the latest master a try:
@drpfenderson commented on GitHub (Aug 18, 2020):
Used pip install --upgrade archivebox, it upgraded and installed 2 additional packages.
Went to archive directory to run archivebox init.
archivebox init.Note: I'm not sure if you need the entire traceback each time, since most of it is identical, but figured more is better when hunting down bugs. Apologies if it's too much.
@coisnepe commented on GitHub (Aug 19, 2020):
Deployed the latest Docker image and it seems to have fixed the issue. Thanks so much!
@pirate commented on GitHub (Aug 19, 2020):
@drpfenderson let me know if you're still having any issues and we can reopen the ticket.
@drpfenderson commented on GitHub (Aug 19, 2020):
@pirate Updated to newest.
same error, exactly, as my last log.
The rest of the log is exactly the same as well, line references and all.
EDIT: I thought maybe I could try nuking it, starting from scratch. No dice, same error. I tried with docker and docker-compose as well, after removing the original package from pip. Same error in both, but with python3.8 instead.
@jrruethe commented on GitHub (Aug 23, 2020):
For what it is worth, v0.4.21 fixed the issue I was having regarding sqlite3.IntegrityError: UNIQUE constraint failed: core_snapshot.timestamp. Thank you!
@drpfenderson commented on GitHub (Sep 2, 2020):
With the changes present in the cdvv7788:sql_index branch, reflected in PR #452, it fixed my issue! I was able to archivebox init on the old index, updated with some broken directories, but ultimately wrote everything to the index. Looks to be intact! I'll just add the "invalid link data directories" through a .txt file.
@cdvv7788 commented on GitHub (Sep 2, 2020):
@pirate I added a final check to avoid duplication in the PR when migrating the index. Check it when reviewing the PR #452
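As a rough illustration of what a duplication check before insertion can look like, here is a minimal sketch. It is not the code from PR #452; the table snapshot_demo, the helper insert_skipping_duplicates, and the example rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE snapshot_demo (url TEXT, timestamp TEXT UNIQUE)")


def insert_skipping_duplicates(conn, rows):
    """Insert (url, timestamp) rows, skipping timestamps that already exist.

    Returns the skipped rows so they can be reported instead of raising
    an IntegrityError halfway through a migration.
    """
    skipped = []
    for url, timestamp in rows:
        already_there = conn.execute(
            "SELECT 1 FROM snapshot_demo WHERE timestamp = ?", (timestamp,)
        ).fetchone()
        if already_there:
            skipped.append((url, timestamp))
            continue
        conn.execute(
            "INSERT INTO snapshot_demo (url, timestamp) VALUES (?, ?)",
            (url, timestamp),
        )
    conn.commit()
    return skipped


rows = [
    ("https://example.com/a", "1597171609"),
    ("https://example.com/b", "1597171609.0"),
    ("https://example.com/c", "1597171609"),  # same timestamp as the first row
]
skipped = insert_skipping_duplicates(conn, rows)
inserted_count = conn.execute("SELECT COUNT(*) FROM snapshot_demo").fetchone()[0]
print(skipped, inserted_count)
```

Whether a colliding row should be skipped, merged, or given a new timestamp is a design decision this sketch does not settle.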
@pirate commented on GitHub (Apr 12, 2022):
Note I've added a new DB/filesystem troubleshooting area to the wiki that may help people arriving here from Google: https://github.com/ArchiveBox/ArchiveBox/wiki/Upgrading-or-Merging-Archives#database-troubleshooting
Contributions/suggestions welcome there.