[GH-ISSUE #412] Bugfix: django.db.utils.IntegrityError: UNIQUE constraint failed: core_snapshot.timestamp #1784

Closed
opened 2026-03-01 17:53:38 +03:00 by kerem · 15 comments
Owner

Originally created by @drpfenderson on GitHub (Jul 31, 2020).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/412

Originally assigned to: @cdvv7788 on GitHub.

Describe the bug

Y'all helped me with upgrading my super old archive to the django branch before the official 0.4.9 release. I recently upgraded to the newest version so I could start adding links. ArchiveBox said I had to re-init, but archivebox init gives me the following error and will not let me add new links.

django.db.utils.IntegrityError: UNIQUE constraint failed: core_snapshot.timestamp

Full log/error below.

Steps to reproduce

  1. git checkout master to switch from django branch.
  2. git pull origin master to pull new release.
  3. pip install -e . (also tried with pip uninstall archivebox && pip install .)
  4. Navigate to archivebox-output directory.
  5. Run archivebox init.
  6. error.

Screenshots or log output

[i] [2020-07-31 17:34:44] ArchiveBox v0.4.9: archivebox init
    > /.archivebox-output/archive-working

[*] Updating existing ArchiveBox collection in this folder...
    /.archivebox-output/archive-working
------------------------------------------------------------------

[*] Verifying archive folder structure...
    √ /.archivebox-output/archive-working/sources
    √ /.archivebox-output/archive-working/archive
    √ /.archivebox-output/archive-working/logs
    √ /.archivebox-output/archive-working/ArchiveBox.conf

[*] Verifying main SQL index and running migrations...
    √ /.archivebox-output/archive-working/index.sqlite3

    Operations to perform:
      Apply all migrations: admin, auth, contenttypes, core, sessions
    Running migrations:
    Applying core.0005_auto_20200728_0326... OK

[*] Collecting links from any existing indexes and archive folders...
    √ Loaded 1376 links from existing main index.
    √ Added 347 orphaned links from existing archive directories.
    ! Skipped adding 239 invalid link data directories.

    X /* SNIP A BUNCH OF BROKEN ARCHIVES /*

    Hint: For more information about the link data directories that were skipped, run:
        archivebox status
        archivebox list --status=invalid

[*] [2020-07-31 18:01:50] Writing 1723 links to main index...
Traceback (most recent call last):
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/models/query.py", line 575, in update_or_create
    obj = self.select_for_update().get(**kwargs)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/models/query.py", line 417, in get
    self.model._meta.object_name
core.models.DoesNotExist: Snapshot matching query does not exist.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/backends/utils.py", line 86, in _execute
    return self.cursor.execute(sql, params)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/backends/sqlite3/base.py", line 396, in execute
    return Database.Cursor.execute(self, query, params)
sqlite3.IntegrityError: UNIQUE constraint failed: core_snapshot.timestamp

The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/home/USERNAME/.local/bin/archivebox", line 33, in <module>
    sys.exit(load_entry_point('archivebox', 'console_scripts', 'archivebox')())
  File "/home/USERNAME/datahoard/ArchiveBox/archivebox/cli/__init__.py", line 126, in main
    pwd=pwd or OUTPUT_DIR,
  File "/home/USERNAME/datahoard/ArchiveBox/archivebox/cli/__init__.py", line 62, in run_subcommand
    module.main(args=subcommand_args, stdin=stdin, pwd=pwd)    # type: ignore
  File "/home/USERNAME/datahoard/ArchiveBox/archivebox/cli/archivebox_init.py", line 35, in main
    out_dir=pwd or OUTPUT_DIR,
  File "/home/USERNAME/datahoard/ArchiveBox/archivebox/util.py", line 109, in typechecked_function
    return func(*args, **kwargs)
  File "/home/USERNAME/datahoard/ArchiveBox/archivebox/main.py", line 369, in init
    write_main_index(list(all_links.values()), out_dir=out_dir)
  File "/home/USERNAME/datahoard/ArchiveBox/archivebox/util.py", line 109, in typechecked_function
    return func(*args, **kwargs)
  File "/home/USERNAME/datahoard/ArchiveBox/archivebox/index/__init__.py", line 235, in write_main_index
    write_sql_main_index(links, out_dir=out_dir)
  File "/home/USERNAME/datahoard/ArchiveBox/archivebox/util.py", line 109, in typechecked_function
    return func(*args, **kwargs)
  File "/home/USERNAME/datahoard/ArchiveBox/archivebox/index/sql.py", line 42, in write_sql_main_index
    Snapshot.objects.update_or_create(url=link.url, defaults=info)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/models/manager.py", line 82, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/models/query.py", line 580, in update_or_create
    obj, created = self._create_object_from_params(kwargs, params, lock=True)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/models/query.py", line 604, in _create_object_from_params
    raise e
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/models/query.py", line 596, in _create_object_from_params
    obj = self.create(**params)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/models/query.py", line 433, in create
    obj.save(force_insert=True, using=self.db)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/models/base.py", line 746, in save
    force_update=force_update, update_fields=update_fields)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/models/base.py", line 784, in save_base
    force_update, using, update_fields,
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/models/base.py", line 887, in _save_table
    results = self._do_insert(cls._base_manager, using, fields, returning_fields, raw)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/models/base.py", line 926, in _do_insert
    using=using, raw=raw,
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/models/manager.py", line 82, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/models/query.py", line 1204, in _insert
    return query.get_compiler(using=using).execute_sql(returning_fields)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/models/sql/compiler.py", line 1392, in execute_sql
    cursor.execute(sql, params)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/backends/utils.py", line 68, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/backends/utils.py", line 77, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/backends/utils.py", line 86, in _execute
    return self.cursor.execute(sql, params)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/backends/utils.py", line 86, in _execute
    return self.cursor.execute(sql, params)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/backends/sqlite3/base.py", line 396, in execute
    return Database.Cursor.execute(self, query, params)
django.db.utils.IntegrityError: UNIQUE constraint failed: core_snapshot.timestamp

Software versions

  • OS: Ubuntu 18.04
  • ArchiveBox version: 0.4.9 (0ac4e12)
  • Python version: Python 3.7.8
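
For readers following the traceback: the failing call is Snapshot.objects.update_or_create(url=link.url, defaults=info). The lookup by URL finds nothing, and the insert that follows hits the UNIQUE constraint on core_snapshot.timestamp. Below is a minimal sketch of that failure mode using plain sqlite3. It assumes a simplified two-column table (the real ArchiveBox schema has more columns), and the URLs, the timestamp value, and the helper name update_or_insert are made up for illustration.

```python
import sqlite3

# Simplified stand-in for the core_snapshot table (assumption: the real
# schema has more columns; only url and timestamp matter for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE core_snapshot (url TEXT PRIMARY KEY, timestamp TEXT UNIQUE)"
)
conn.execute(
    "INSERT INTO core_snapshot (url, timestamp) VALUES (?, ?)",
    ("https://example.com/a", "1596216884"),
)


def update_or_insert(conn, url, timestamp):
    """Hypothetical helper: look the row up by URL, insert it if missing."""
    row = conn.execute(
        "SELECT 1 FROM core_snapshot WHERE url = ?", (url,)
    ).fetchone()
    if row:
        conn.execute(
            "UPDATE core_snapshot SET timestamp = ? WHERE url = ?",
            (timestamp, url),
        )
        return "updated"
    # The URL is new, but the timestamp may already belong to another row.
    conn.execute(
        "INSERT INTO core_snapshot (url, timestamp) VALUES (?, ?)",
        (url, timestamp),
    )
    return "created"


try:
    outcome = update_or_insert(conn, "https://example.com/b", "1596216884")
except sqlite3.IntegrityError as exc:
    outcome = str(exc)

print(outcome)
```

The point of the sketch is only that a lookup keyed on URL cannot notice a collision on a different unique column; it does not say why two links end up with the same timestamp in the first place.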

@karlicoss commented on GitHub (Aug 11, 2020):

Happens for me as well. Archivebox version: v0.4.13 (image from Docker hub).

I experimented a bit and managed to reproduce it consistently. I suspect the URLs that have a suffix in the timestamp are causing it.

  1. Create a new (empty) archive directory, put it in the compose file and initialise

    docker-compose run --rm archivebox init

  2. Archive a few URLs

    input:

     https://beepb00p.xyz/mypy-error-handling.html
     https://beepb00p.xyz/promnesia.html
     https://beepb00p.xyz/hpi.html
    

    First archiving:

     docker-compose run --rm archivebox add <input
    

    Goes well:

     dco run --rm archivebox add <input
     [i] [2020-08-11 18:46:49] ArchiveBox v0.4.13: archivebox add < /dev/stdin
         > /data
    
     [+] [2020-08-11 18:46:49] Adding 3 links to index (crawl depth=0)...
         > Saved verbatim input to sources/1597171609-import.txt
         > Parsed 3 URLs from input (Plain Text)
         > Found 3 new URLs not already in index
    
     [*] [2020-08-11 18:46:49] Writing 3 links to main index...
         √ /data/index.sqlite3
         √ /data/index.json
         √ /data/index.html
    
     [▶] [2020-08-11 18:46:50] Collecting content for 3 Snapshots in archive...
    
     [+] [2020-08-11 18:46:50] "beepb00p.xyz/promnesia.html"
         https://beepb00p.xyz/promnesia.html
         > ./archive/1597171609
         ...
    
     [+] [2020-08-11 18:46:59] "beepb00p.xyz/mypy-error-handling.html"
         https://beepb00p.xyz/mypy-error-handling.html
         > ./archive/1597171609.0
         ...
    
     [+] [2020-08-11 18:47:10] "beepb00p.xyz/hpi.html"
         https://beepb00p.xyz/hpi.html
         > ./archive/1597171609.1
         ...
    
     [√] [2020-08-11 18:47:20] Update of 3 pages complete (30.33 sec)
         - 0 links skipped
         - 3 links updated
         - 0 links had errors
    
         Hint: To view your archive index, open:
             /data/index.html
         Or run the built-in webserver:
             archivebox server
    
     [*] [2020-08-11 18:47:20] Writing 3 links to main index...
         √ /data/index.sqlite3
         √ /data/index.json
         √ /data/index.html
    
  3. Now if you rerun the same command, it works well too

     dco run --rm archivebox add <input                              
     [i] [2020-08-11 18:51:29] ArchiveBox v0.4.13: archivebox add < /dev/stdin
         > /data
    
     [+] [2020-08-11 18:51:29] Adding 3 links to index (crawl depth=0)...
         > Saved verbatim input to sources/1597171889-import.txt
         > Parsed 3 URLs from input (Plain Text)
         > Found 0 new URLs not already in index
    
     [*] [2020-08-11 18:51:30] Writing 3 links to main index...
         √ /data/index.sqlite3
         √ /data/index.json
         √ /data/index.html
    

    As expected, just says everything is already in the index

  4. Now try running against one of the URLs that has a dot in the timestamp (with a suffix)

     dco run --rm archivebox add https://beepb00p.xyz/hpi.html 
     # results in IntegrityError
     dco run --rm archivebox add https://beepb00p.xyz/mypy-error-handling.html
     # results in IntegrityError
    
  5. Interestingly enough, running against https://beepb00p.xyz/promnesia.html, which has the timestamp 1597171609, works fine and, as expected, just says it's already in the index.

  6. Now if you try to add a completely different set of links, it works fine again:

     $ dco run --rm archivebox add <input2  
    
     [i] [2020-08-11 18:59:21] ArchiveBox v0.4.13: archivebox add < /dev/stdin
         > /data
    
     [+] [2020-08-11 18:59:21] Adding 2 links to index (crawl depth=0)...
         > Saved verbatim input to sources/1597172361-import.txt
         > Parsed 2 URLs from input (Plain Text)
         > Found 2 new URLs not already in index
    
     [*] [2020-08-11 18:59:21] Writing 5 links to main index...
         √ /data/index.sqlite3
         √ /data/index.json
         √ /data/index.html
    
     [▶] [2020-08-11 18:59:22] Collecting content for 2 Snapshots in archive...
    
     [+] [2020-08-11 18:59:22] "blog.sigfpe.com/2008/02/what-is-topology.html"
         http://blog.sigfpe.com/2008/02/what-is-topology.html
         > ./archive/1597172361
     ...
    
     [+] [2020-08-11 18:59:35] "blog.sigfpe.com/2006/11/yoneda-lemma.html"
         http://blog.sigfpe.com/2006/11/yoneda-lemma.html
         > ./archive/1597172361.0
     ...
    
  7. And again, if you try to add http://blog.sigfpe.com/2008/02/what-is-topology.html, it works; if you try http://blog.sigfpe.com/2006/11/yoneda-lemma.html, it fails.
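
The suffix pattern in the logs above (1597171609, 1597171609.0, 1597171609.1) can be illustrated with a short sketch. This is not ArchiveBox's actual deduping code; the function name assign_unique_timestamps and its signature are hypothetical, and the order in which URLs receive suffixes here is just list order, which need not match what ArchiveBox does.

```python
from typing import Dict, List


def assign_unique_timestamps(urls: List[str], base: str) -> Dict[str, str]:
    """Hypothetical helper: give every URL a distinct timestamp string.

    The first URL keeps the bare base timestamp; later URLs that would
    collide get ".0", ".1", ... appended, as seen in the archive paths.
    """
    assigned: Dict[str, str] = {}
    used = set()
    for url in urls:
        candidate = base
        suffix = 0
        # Keep bumping the suffix until the candidate is unused.
        while candidate in used:
            candidate = f"{base}.{suffix}"
            suffix += 1
        used.add(candidate)
        assigned[url] = candidate
    return assigned


timestamps = assign_unique_timestamps(
    [
        "https://beepb00p.xyz/promnesia.html",
        "https://beepb00p.xyz/mypy-error-handling.html",
        "https://beepb00p.xyz/hpi.html",
    ],
    base="1597171609",
)
for url, ts in timestamps.items():
    print(ts, url)
```

In the reproduction above, only the URLs that ended up with a suffixed timestamp trigger the IntegrityError when re-added, which is what points the suspicion at this step.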


@pirate commented on GitHub (Aug 11, 2020):

Very helpful @karlicoss! This is high on our priority list of things to fix.

I'll check in with an update once we've started working on this. I suspect it's a relatively simple bug in the timestamp deduping code; most of the work will be QA and testing to make sure we don't introduce any regressions while we fix it.

For context, timestamp deduping has been one of the most brittle parts of ArchiveBox in the past years, and we already have plans to remove the need for it in a refactoring in the next major version.


@jrruethe commented on GitHub (Aug 15, 2020):

I unfortunately ran into this issue as well. From my testing, I agree with @karlicoss and his assessment that it is related to the timestamp suffixes. I am trying to pin it down further than that; I'll reply if I figure anything out.

Thanks


@coisnepe commented on GitHub (Aug 16, 2020):

> it works fine again

Nothing works for me anymore, sadly... Attempting to add any link, whether completely new or already archived, results in django.db.utils.IntegrityError.
What's the least dangerous way to fix it (temporarily disabling the unique constraint, deleting one/some archives etc...)?


@apkallum commented on GitHub (Aug 17, 2020):

Hello @coisnepe @jrruethe @karlicoss @drpfenderson & everyone else, would you mind testing my master branch with a fix here? https://github.com/apkallum/ArchiveBox


@drpfenderson commented on GitHub (Aug 17, 2020):

@apkallum - Using your build, it gets a bit further. It modifies a few entries and then gives the following error:

Traceback (most recent call last):
  File "/home/USERNAME/datahoard/ArchiveBoxTimefix/archivebox/index/sql.py", line 43, in write_sql_main_index
    snapshot = Snapshot.objects.get(url=link.url)
  File "/home/USERNAME/.local/lib/python3.8/site-packages/django/db/models/manager.py", line 82, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/home/USERNAME/.local/lib/python3.8/site-packages/django/db/models/query.py", line 415, in get
    raise self.model.DoesNotExist(
core.models.DoesNotExist: Snapshot matching query does not exist.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/USERNAME/.local/lib/python3.8/site-packages/django/db/backends/utils.py", line 86, in _execute
    return self.cursor.execute(sql, params)
  File "/home/USERNAME/.local/lib/python3.8/site-packages/django/db/backends/sqlite3/base.py", line 396, in execute
    return Database.Cursor.execute(self, query, params)
sqlite3.IntegrityError: UNIQUE constraint failed: core_snapshot.timestamp

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/USERNAME/.local/bin/archivebox", line 11, in <module>
    load_entry_point('archivebox', 'console_scripts', 'archivebox')()
  File "/home/USERNAME/datahoard/ArchiveBoxTimefix/archivebox/cli/__init__.py", line 122, in main
    run_subcommand(
  File "/home/USERNAME/datahoard/ArchiveBoxTimefix/archivebox/cli/__init__.py", line 62, in run_subcommand
    module.main(args=subcommand_args, stdin=stdin, pwd=pwd)    # type: ignore
  File "/home/USERNAME/datahoard/ArchiveBoxTimefix/archivebox/cli/archivebox_init.py", line 33, in main
    init(
  File "/home/USERNAME/datahoard/ArchiveBoxTimefix/archivebox/util.py", line 111, in typechecked_function
    return func(*args, **kwargs)
  File "/home/USERNAME/datahoard/ArchiveBoxTimefix/archivebox/main.py", line 376, in init
    write_main_index(list(all_links.values()), out_dir=out_dir)
  File "/home/USERNAME/datahoard/ArchiveBoxTimefix/archivebox/util.py", line 111, in typechecked_function
    return func(*args, **kwargs)
  File "/home/USERNAME/datahoard/ArchiveBoxTimefix/archivebox/index/__init__.py", line 235, in write_main_index
    write_sql_main_index(links, out_dir=out_dir)
  File "/home/USERNAME/datahoard/ArchiveBoxTimefix/archivebox/util.py", line 111, in typechecked_function
    return func(*args, **kwargs)
  File "/home/USERNAME/datahoard/ArchiveBoxTimefix/archivebox/index/sql.py", line 51, in write_sql_main_index
    snapshot.save()
  File "/home/USERNAME/.local/lib/python3.8/site-packages/django/db/models/base.py", line 745, in save
    self.save_base(using=using, force_insert=force_insert,
  File "/home/USERNAME/.local/lib/python3.8/site-packages/django/db/models/base.py", line 782, in save_base
    updated = self._save_table(
  File "/home/USERNAME/.local/lib/python3.8/site-packages/django/db/models/base.py", line 887, in _save_table
    results = self._do_insert(cls._base_manager, using, fields, returning_fields, raw)
  File "/home/USERNAME/.local/lib/python3.8/site-packages/django/db/models/base.py", line 924, in _do_insert
    return manager._insert(
  File "/home/USERNAME/.local/lib/python3.8/site-packages/django/db/models/manager.py", line 82, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/home/USERNAME/.local/lib/python3.8/site-packages/django/db/models/query.py", line 1204, in _insert
    return query.get_compiler(using=using).execute_sql(returning_fields)
  File "/home/USERNAME/.local/lib/python3.8/site-packages/django/db/models/sql/compiler.py", line 1392, in execute_sql
    cursor.execute(sql, params)
  File "/home/USERNAME/.local/lib/python3.8/site-packages/django/db/backends/utils.py", line 68, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/home/USERNAME/.local/lib/python3.8/site-packages/django/db/backends/utils.py", line 77, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/home/USERNAME/.local/lib/python3.8/site-packages/django/db/backends/utils.py", line 86, in _execute
    return self.cursor.execute(sql, params)
  File "/home/USERNAME/.local/lib/python3.8/site-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/home/USERNAME/.local/lib/python3.8/site-packages/django/db/backends/utils.py", line 86, in _execute
    return self.cursor.execute(sql, params)
  File "/home/USERNAME/.local/lib/python3.8/site-packages/django/db/backends/sqlite3/base.py", line 396, in execute
    return Database.Cursor.execute(self, query, params)
django.db.utils.IntegrityError: UNIQUE constraint failed: core_snapshot.timestamp

EDIT: To be clear, this is using archivebox init in the main archive directory.

EDIT 2: Oops, I realized I had switched to Python 3.8 for another project and forgot to run update-alternatives. Running archivebox init with Python 3.7, on apkallum's branch, gives me essentially the same error.


@pirate commented on GitHub (Aug 18, 2020):

Give the latest master a try:

pip install --upgrade archivebox
# or if you use docker
docker pull nikisweeting/archivebox

@drpfenderson commented on GitHub (Aug 18, 2020):

I ran pip install --upgrade archivebox; it upgraded and installed 2 additional packages.

Successfully installed archivebox-0.4.17 croniter-0.3.34 natsort-7.0.1

Then I went to the archive directory and ran archivebox init.

[*] [2020-08-18 16:00:00] Writing 1723 links to main index...
Traceback (most recent call last):
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/models/query.py", line 575, in update_or_create
    obj = self.select_for_update().get(**kwargs)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/models/query.py", line 417, in get
    self.model._meta.object_name
core.models.DoesNotExist: Snapshot matching query does not exist.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/backends/utils.py", line 86, in _execute
    return self.cursor.execute(sql, params)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/backends/sqlite3/base.py", line 396, in execute
    return Database.Cursor.execute(self, query, params)
sqlite3.IntegrityError: UNIQUE constraint failed: core_snapshot.timestamp


Traceback (most recent call last):
  File "/home/USERNAME/.local/bin/archivebox", line 8, in <module>
    sys.exit(main())
  File "/home/USERNAME/.local/lib/python3.7/site-packages/archivebox/cli/__init__.py", line 126, in main
    pwd=pwd or OUTPUT_DIR,
  File "/home/USERNAME/.local/lib/python3.7/site-packages/archivebox/cli/__init__.py", line 62, in run_subcommand
    module.main(args=subcommand_args, stdin=stdin, pwd=pwd)    # type: ignore
  File "/home/USERNAME/.local/lib/python3.7/site-packages/archivebox/cli/archivebox_init.py", line 35, in main
    out_dir=pwd or OUTPUT_DIR,
  File "/home/USERNAME/.local/lib/python3.7/site-packages/archivebox/util.py", line 111, in typechecked_function
    return func(*args, **kwargs)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/archivebox/main.py", line 377, in init
    write_main_index(list(all_links.values()), out_dir=out_dir)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/archivebox/util.py", line 111, in typechecked_function
    return func(*args, **kwargs)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/archivebox/index/__init__.py", line 246, in write_main_index
    write_sql_main_index(links, out_dir=out_dir)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/archivebox/util.py", line 111, in typechecked_function
    return func(*args, **kwargs)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/archivebox/index/sql.py", line 46, in write_sql_main_index
    Snapshot.objects.update_or_create(url=link.url, defaults=info)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/models/manager.py", line 82, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/models/query.py", line 580, in update_or_create
    obj, created = self._create_object_from_params(kwargs, params, lock=True)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/models/query.py", line 604, in _create_object_from_params
  
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/models/query.py", line 433, in create
    obj.save(force_insert=True, using=self.db)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/models/base.py", line 746, in save
    force_update=force_update, update_fields=update_fields)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/models/base.py", line 784, in save_base
    force_update, using, update_fields,
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/models/base.py", line 887, in _save_table
    results = self._do_insert(cls._base_manager, using, fields, returning_fields, raw)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/models/base.py", line 926, in _do_insert
    using=using, raw=raw,
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/models/manager.py", line 82, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/models/query.py", line 1204, in _insert
    return query.get_compiler(using=using).execute_sql(returning_fields)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/models/sql/compiler.py", line 1392, in execute_sql
    cursor.execute(sql, params)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/backends/utils.py", line 68, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/backends/utils.py", line 77, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/backends/utils.py", line 86, in _execute
    return self.cursor.execute(sql, params)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/backends/utils.py", line 86, in _execute
    return self.cursor.execute(sql, params)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/backends/sqlite3/base.py", line 396, in execute
    return Database.Cursor.execute(self, query, params)
django.db.utils.IntegrityError: UNIQUE constraint failed: core_snapshot.timestamp

Note: I'm not sure if you need the entire traceback each time, since most of it is identical, but figured more is better when hunting down bugs. Apologies if it's too much.
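
For readers following along, the two-stage traceback above (a lookup by url that finds nothing, followed by an INSERT that trips the unique index on timestamp) is consistent with two entries that have different URLs but the same timestamp. That is a guess at the mechanism, not a confirmed diagnosis. Below is a minimal sketch of the pattern using plain sqlite3 rather than Django; the `snapshot` table and the `upsert_by_url` helper are hypothetical stand-ins, not ArchiveBox code.

```python
import sqlite3


def upsert_by_url(conn, url, timestamp):
    """Look a row up by url; update it if found, otherwise insert a new row."""
    row = conn.execute("SELECT id FROM snapshot WHERE url = ?", (url,)).fetchone()
    if row is not None:
        conn.execute(
            "UPDATE snapshot SET timestamp = ? WHERE id = ?", (timestamp, row[0])
        )
        return "updated"
    # No row with this url exists, so fall through to an INSERT.
    conn.execute(
        "INSERT INTO snapshot (url, timestamp) VALUES (?, ?)", (url, timestamp)
    )
    return "created"


# Hypothetical table: both url and timestamp carry a UNIQUE constraint.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE snapshot ("
    " id INTEGER PRIMARY KEY,"
    " url TEXT NOT NULL UNIQUE,"
    " timestamp TEXT NOT NULL UNIQUE)"
)

first = upsert_by_url(conn, "https://example.com/a", "1596216884")
try:
    # Different url, same timestamp: the lookup misses, the INSERT collides.
    second = upsert_by_url(conn, "https://example.com/b", "1596216884")
except sqlite3.IntegrityError as exc:
    second = f"failed: {exc}"

print(first)
print(second)
```

The second call fails because the lookup key (url) and the constrained column (timestamp) are different fields, so a lookup-then-insert keyed on url alone cannot detect the collision in advance.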


@coisnepe commented on GitHub (Aug 19, 2020):

Deployed the latest Docker image and it seems to have fixed the issue. Thanks so much!


@pirate commented on GitHub (Aug 19, 2020):

@drpfenderson let me know if you're still having any issues and we can reopen the ticket.


@drpfenderson commented on GitHub (Aug 19, 2020):

@pirate Updated to newest.

[i] [2020-08-19 15:55:47] ArchiveBox v0.4.21: archivebox init

Exactly the same error as my last log (https://github.com/pirate/ArchiveBox/issues/412#issuecomment-675570699).

[*] [2020-08-19 16:20:40] Writing 1723 links to main index...
Traceback (most recent call last):
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/models/query.py", line 575, in update_or_create
    obj = self.select_for_update().get(**kwargs)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/django/db/models/query.py", line 417, in get
    self.model._meta.object_name
core.models.DoesNotExist: Snapshot matching query does not exist.

The rest of the log is exactly the same as well, line references and all.

EDIT: I thought maybe I could try nuking it, starting from scratch. No dice, same error. I tried with docker and docker-compose as well, after removing the original package from pip. Same error in both, but with python3.8 instead.


@jrruethe commented on GitHub (Aug 23, 2020):

For what it is worth, v0.4.21 fixed the issue I was having regarding sqlite3.IntegrityError: UNIQUE constraint failed: core_snapshot.timestamp. Thank you!


@drpfenderson commented on GitHub (Sep 2, 2020):

The changes in the cdvv7788:sql_index branch, reflected in PR #452, fixed my issue! I was able to run archivebox init on the old index. It updated with some broken directories, but ultimately wrote everything to the index. Looks to be intact! I'll just add the "invalid link data directories" through a .txt file.


@cdvv7788 commented on GitHub (Sep 2, 2020):

@pirate I added a final check to the PR to avoid duplication when migrating the index. Please check it when reviewing PR #452.
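
As an illustration only, one way a duplication check of this kind might look is sketched below. This is not the code from PR #452; the `uniquify_timestamps` helper and the dict-shaped links are assumptions made for the example. The idea is to make every timestamp distinct before anything is written to a table with a unique timestamp column.

```python
def uniquify_timestamps(links):
    """Return copies of the links whose timestamps are all distinct.

    A timestamp that was already seen gets a numeric suffix
    (".1", ".2", ...) until it no longer collides.
    """
    used = set()
    result = []
    for link in links:
        base = link["timestamp"]
        candidate = base
        suffix = 0
        while candidate in used:
            suffix += 1
            candidate = f"{base}.{suffix}"
        used.add(candidate)
        # Copy the link so the caller's data is left untouched.
        result.append({**link, "timestamp": candidate})
    return result


# Hypothetical sample data: two different URLs share one timestamp.
links = [
    {"url": "https://example.com/a", "timestamp": "1596216884"},
    {"url": "https://example.com/b", "timestamp": "1596216884"},
    {"url": "https://example.com/c", "timestamp": "1596216885"},
]
fixed = uniquify_timestamps(links)
for link in fixed:
    print(link["timestamp"], link["url"])
```

Running the check before the write step means a collision is resolved in memory instead of surfacing as an IntegrityError partway through the migration.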


@pirate commented on GitHub (Apr 12, 2022):

Note I've added a new DB/filesystem troubleshooting area to the wiki that may help people arriving here from Google: https://github.com/ArchiveBox/ArchiveBox/wiki/Upgrading-or-Merging-Archives#database-troubleshooting

Contributions/suggestions welcome there.
