[GH-ISSUE #1124] Bug: Tasks generally need to be able to handle snapshots being deleted by the UI mid-run #2214

Closed
opened 2026-03-01 17:57:20 +03:00 by kerem · 1 comment
Owner

Originally created by @pirate on GitHub (Mar 18, 2023).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1124

Many `archivebox` tasks (e.g. `update`, `add`) fail mid-run if Snapshots are deleted from the UI while ArchiveBox is iterating over them.

We should do a pass over the codebase, find all the `for Snapshot.objects...` loops, and add `try:`/`except:` within them to handle the case where a snapshot disappears because it was deleted by another process.

root@kiwi /o/archivebox.un# docker-compose run archivebox update --index-only
Creating archiveboxun_archivebox_run ... done
find: '/.config/chromium/Crash Reports/pending/': No such file or directory
[i] [2023-03-18 06:07:27] ArchiveBox v0.6.3: archivebox update --index-only
    > /data

find: '/.config/chromium/Crash Reports/pending/': No such file or directory
Traceback (most recent call last):
  File "/usr/local/bin/archivebox", line 33, in <module>
    sys.exit(load_entry_point('archivebox', 'console_scripts', 'archivebox')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/archivebox/cli/__init__.py", line 140, in main
    run_subcommand(
  File "/app/archivebox/cli/__init__.py", line 80, in run_subcommand
    module.main(args=subcommand_args, stdin=stdin, pwd=pwd)    # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/archivebox/cli/archivebox_update.py", line 119, in main
    update(
  File "/app/archivebox/util.py", line 114, in typechecked_function
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/app/archivebox/main.py", line 797, in update
    write_link_details(link, out_dir=out_dir, skip_sql_index=True)
  File "/app/archivebox/util.py", line 114, in typechecked_function
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/app/archivebox/index/__init__.py", line 335, in write_link_details
    write_json_link_details(link, out_dir=out_dir)
  File "/app/archivebox/util.py", line 114, in typechecked_function
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/app/archivebox/index/json.py", line 99, in write_json_link_details
    atomic_write(str(path), link._asdict(extended=True))
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/archivebox/index/schema.py", line 193, in _asdict
    'snapshot_id': self.snapshot_id,
                   ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/utils/functional.py", line 48, in __get__
    res = instance.__dict__[self.name] = self.func(instance)
                                         ^^^^^^^^^^^^^^^^^^^
  File "/app/archivebox/index/schema.py", line 265, in snapshot_id
    return str(Snapshot.objects.only('id').get(url=self.url).id)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/models/query.py", line 429, in get
    raise self.model.DoesNotExist(
core.models.Snapshot.DoesNotExist: Snapshot matching query does not exist.
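
A minimal sketch of the per-item handling proposed above. It is not ArchiveBox's actual code: `SnapshotDoesNotExist`, `get_snapshot_id`, and `update_index` are hypothetical stand-ins, and a plain dict plays the role of the database so the example runs without Django. The idea is that a row which vanishes between building the work list and processing it gets skipped instead of aborting the whole run.

```python
from typing import Dict, Iterable, List, Tuple


class SnapshotDoesNotExist(Exception):
    """Hypothetical stand-in for Django's Snapshot.DoesNotExist."""


def get_snapshot_id(db: Dict[str, str], url: str) -> str:
    """Stand-in for the lookup in the traceback (a .get(url=...) call)."""
    try:
        return db[url]
    except KeyError:
        raise SnapshotDoesNotExist(
            f"Snapshot matching query does not exist: {url}"
        ) from None


def update_index(
    db: Dict[str, str], urls: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Process every URL, skipping any whose row was deleted mid-run."""
    written: List[str] = []
    skipped: List[str] = []
    for url in urls:
        try:
            snapshot_id = get_snapshot_id(db, url)
        except SnapshotDoesNotExist:
            # Deleted by another process since the work list was built:
            # record it and move on instead of crashing the whole task.
            skipped.append(url)
            continue
        written.append(snapshot_id)  # placeholder for the real per-link work
    return written, skipped


# Simulate the race: the task captures its work list, then a row disappears.
db = {"https://example.com/a": "id-a", "https://example.com/b": "id-b"}
urls = list(db)
del db["https://example.com/b"]

written, skipped = update_index(db, urls)
```

The sketch catches only the "does not exist" exception rather than using a bare `except:`, so unrelated errors still surface. Whether skipped items should be logged, retried, or silently dropped is left open here.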

@pirate commented on GitHub (Jan 19, 2024):

Closing as duplicate of https://github.com/ArchiveBox/ArchiveBox/issues/1309
