[GH-ISSUE #1105] Bug: Mercury parser failing with foreign key constraint error #3713

Closed
opened 2026-03-15 00:06:19 +03:00 by kerem · 1 comment
Owner

Originally created by @pirate on GitHub (Feb 21, 2023).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1105

root@4e31b84ee81b:/data# archivebox --version
ArchiveBox v0.6.2
Cpython Linux Linux-4.4.180+-x86_64-with-glibc2.28 x86_64
IN_DOCKER=True DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND_ENGINE=ripgrep
 
[i] Dependency versions:
 √  ARCHIVEBOX_BINARY     v0.6.2          valid     /usr/local/bin/archivebox                                                   
 √  PYTHON_BINARY         v3.9.5          valid     /usr/local/bin/python3.9                                                    
 √  DJANGO_BINARY         v3.1.10         valid     /usr/local/lib/python3.9/site-packages/django/bin/django-admin.py           
 √  CURL_BINARY           v7.64.0         valid     /usr/bin/curl                                                               
 √  WGET_BINARY           v1.20.1         valid     /usr/bin/wget                                                               
 √  NODE_BINARY           v15.14.0        valid     /usr/bin/node                                                               
 √  SINGLEFILE_BINARY     v0.3.16         valid     /node/node_modules/single-file/cli/single-file                              
 √  READABILITY_BINARY    v0.0.2          valid     /node/node_modules/readability-extractor/readability-extractor              
 √  MERCURY_BINARY        v1.0.0          valid     /node/node_modules/@postlight/mercury-parser/cli.js                         
 √  GIT_BINARY            v2.20.1         valid     /usr/bin/git                                                                
 √  YOUTUBEDL_BINARY      v2021.04.26     valid     /usr/local/bin/youtube-dl                                                   
 √  CHROME_BINARY         v90.0.4430.93   valid     /usr/bin/chromium                                                           
 √  RIPGREP_BINARY        v0.10.0         valid     /usr/bin/rg                                                                 
 
[i] Source-code locations:
 √  PACKAGE_DIR           23 files        valid     /app/archivebox                                                             
 √  TEMPLATES_DIR         3 files         valid     /app/archivebox/templates                                                   
 -  CUSTOM_TEMPLATES_DIR  -               disabled                                                                              
 
[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled                                                                              
 -  COOKIES_FILE          -               disabled                                                                              
 
[i] Data locations:
 √  OUTPUT_DIR            5 files         valid     /data                                                                       
 √  SOURCES_DIR           4 files         valid     ./sources                                                                   
 √  LOGS_DIR              1 files         valid     ./logs                                                                      
 √  ARCHIVE_DIR           3 files         valid     ./archive                                                                   
 √  CONFIG_FILE           104.0 Bytes     valid     ./ArchiveBox.conf                                                           
 √  SQL_INDEX             232.0 KB        valid     ./index.sqlite3      

https://github.com/tjhorner/archivebox-exporter/issues/20

> /usr/local/bin/archivebox server --quick-init 0.0.0.0:8000; ts=2023-02-18__20:34:32 version=0.6.2 docker=True is_tty=False
Internal Server Error: /admin/core/snapshot/
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/sqlite3/base.py", line 413, in execute
    return Database.Cursor.execute(self, query, params)
sqlite3.IntegrityError: FOREIGN KEY constraint failed
 
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/app/archivebox/extractors/__init__.py", line 116, in archive_link
    ArchiveResult.objects.create(snapshot=snapshot, extractor=method_name, cmd=result.cmd, cmd_version=result.cmd_version,
  File "/usr/local/lib/python3.9/site-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 447, in create
    obj.save(force_insert=True, using=self.db)
  File "/usr/local/lib/python3.9/site-packages/django/db/models/base.py", line 753, in save
    self.save_base(using=using, force_insert=force_insert,
  File "/usr/local/lib/python3.9/site-packages/django/db/models/base.py", line 790, in save_base
    updated = self._save_table(
  File "/usr/local/lib/python3.9/site-packages/django/db/models/base.py", line 895, in _save_table
    results = self._do_insert(cls._base_manager, using, fields, returning_fields, raw)
  File "/usr/local/lib/python3.9/site-packages/django/db/models/base.py", line 933, in _do_insert
    return manager._insert(
  File "/usr/local/lib/python3.9/site-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 1254, in _insert
    return query.get_compiler(using=using).execute_sql(returning_fields)
  File "/usr/local/lib/python3.9/site-packages/django/db/models/sql/compiler.py", line 1397, in execute_sql
    cursor.execute(sql, params)
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 66, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/usr/local/lib/python3.9/site-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/sqlite3/base.py", line 413, in execute
    return Database.Cursor.execute(self, query, params)
django.db.utils.IntegrityError: FOREIGN KEY constraint failed
 
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.9/site-packages/django/core/handlers/base.py", line 181, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/local/lib/python3.9/site-packages/django/contrib/admin/options.py", line 614, in wrapper
    return self.admin_site.admin_view(view)(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/django/utils/decorators.py", line 130, in _wrapped_view
    response = view_func(request, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/django/views/decorators/cache.py", line 44, in _wrapped_view_func
    response = view_func(request, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/django/contrib/admin/sites.py", line 233, in inner
    return view(request, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/django/utils/decorators.py", line 43, in _wrapper
    return bound_method(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/django/utils/decorators.py", line 130, in _wrapped_view
    response = view_func(request, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/django/contrib/admin/options.py", line 1719, in changelist_view
    response = self.response_action(request, queryset=cl.get_queryset(request))
  File "/usr/local/lib/python3.9/site-packages/django/contrib/admin/options.py", line 1402, in response_action
    response = func(self, request, queryset)
  File "/app/archivebox/core/admin.py", line 260, in resnapshot_snapshot
    add(new_url, tag=snapshot.tags_str())
  File "/app/archivebox/util.py", line 114, in typechecked_function
    return func(*args, **kwargs)
  File "/app/archivebox/main.py", line 624, in add
    archive_links(new_links, overwrite=False, **archive_kwargs)
  File "/app/archivebox/util.py", line 114, in typechecked_function
    return func(*args, **kwargs)
  File "/app/archivebox/extractors/__init__.py", line 181, in archive_links
    archive_link(to_archive, overwrite=overwrite, methods=methods, out_dir=Path(link.link_dir))
  File "/app/archivebox/util.py", line 114, in typechecked_function
    return func(*args, **kwargs)
  File "/app/archivebox/extractors/__init__.py", line 130, in archive_link
    raise Exception('Exception in archive_methods.save_{}(Link(url={}))'.format(
Exception: Exception in archive_methods.save_mercury(Link(url=https://wykop.pl/#2023-02-19T01:00:24+00:00))
Originally created by @pirate on GitHub (Feb 21, 2023). Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1105 ```bash root@4e31b84ee81b:/data# archivebox --version ArchiveBox v0.6.2 Cpython Linux Linux-4.4.180+-x86_64-with-glibc2.28 x86_64 IN_DOCKER=True DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND_ENGINE=ripgrep [i] Dependency versions: √ ARCHIVEBOX_BINARY v0.6.2 valid /usr/local/bin/archivebox √ PYTHON_BINARY v3.9.5 valid /usr/local/bin/python3.9 √ DJANGO_BINARY v3.1.10 valid /usr/local/lib/python3.9/site-packages/django/bin/django-admin.py √ CURL_BINARY v7.64.0 valid /usr/bin/curl √ WGET_BINARY v1.20.1 valid /usr/bin/wget √ NODE_BINARY v15.14.0 valid /usr/bin/node √ SINGLEFILE_BINARY v0.3.16 valid /node/node_modules/single-file/cli/single-file √ READABILITY_BINARY v0.0.2 valid /node/node_modules/readability-extractor/readability-extractor √ MERCURY_BINARY v1.0.0 valid /node/node_modules/@postlight/mercury-parser/cli.js √ GIT_BINARY v2.20.1 valid /usr/bin/git √ YOUTUBEDL_BINARY v2021.04.26 valid /usr/local/bin/youtube-dl √ CHROME_BINARY v90.0.4430.93 valid /usr/bin/chromium √ RIPGREP_BINARY v0.10.0 valid /usr/bin/rg [i] Source-code locations: √ PACKAGE_DIR 23 files valid /app/archivebox √ TEMPLATES_DIR 3 files valid /app/archivebox/templates - CUSTOM_TEMPLATES_DIR - disabled [i] Secrets locations: - CHROME_USER_DATA_DIR - disabled - COOKIES_FILE - disabled [i] Data locations: √ OUTPUT_DIR 5 files valid /data √ SOURCES_DIR 4 files valid ./sources √ LOGS_DIR 1 files valid ./logs √ ARCHIVE_DIR 3 files valid ./archive √ CONFIG_FILE 104.0 Bytes valid ./ArchiveBox.conf √ SQL_INDEX 232.0 KB valid ./index.sqlite3 ``` https://github.com/tjhorner/archivebox-exporter/issues/20 ```python3 > /usr/local/bin/archivebox server --quick-init 0.0.0.0:8000; ts=2023-02-18__20:34:32 version=0.6.2 docker=True is_tty=False Internal Server Error: /admin/core/snapshot/ Traceback (most recent call last): File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute return self.cursor.execute(sql, params) File "/usr/local/lib/python3.9/site-packages/django/db/backends/sqlite3/base.py", line 413, in execute return Database.Cursor.execute(self, query, params) sqlite3.IntegrityError: FOREIGN KEY constraint failed The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/app/archivebox/extractors/__init__.py", line 116, in archive_link ArchiveResult.objects.create(snapshot=snapshot, extractor=method_name, cmd=result.cmd, cmd_version=result.cmd_version, File "/usr/local/lib/python3.9/site-packages/django/db/models/manager.py", line 85, in manager_method return getattr(self.get_queryset(), name)(*args, **kwargs) File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 447, in create obj.save(force_insert=True, using=self.db) File "/usr/local/lib/python3.9/site-packages/django/db/models/base.py", line 753, in save self.save_base(using=using, force_insert=force_insert, File "/usr/local/lib/python3.9/site-packages/django/db/models/base.py", line 790, in save_base updated = self._save_table( File "/usr/local/lib/python3.9/site-packages/django/db/models/base.py", line 895, in _save_table results = self._do_insert(cls._base_manager, using, fields, returning_fields, raw) File "/usr/local/lib/python3.9/site-packages/django/db/models/base.py", line 933, in _do_insert return manager._insert( File "/usr/local/lib/python3.9/site-packages/django/db/models/manager.py", line 85, in manager_method return getattr(self.get_queryset(), name)(*args, **kwargs) File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 1254, in _insert return query.get_compiler(using=using).execute_sql(returning_fields) File "/usr/local/lib/python3.9/site-packages/django/db/models/sql/compiler.py", line 1397, in execute_sql cursor.execute(sql, params) File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 66, in execute return self._execute_with_wrappers(sql, params, many=False, executor=self._execute) File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers return executor(sql, params, many, context) File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute return self.cursor.execute(sql, params) File "/usr/local/lib/python3.9/site-packages/django/db/utils.py", line 90, in __exit__ raise dj_exc_value.with_traceback(traceback) from exc_value File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute return self.cursor.execute(sql, params) File "/usr/local/lib/python3.9/site-packages/django/db/backends/sqlite3/base.py", line 413, in execute return Database.Cursor.execute(self, query, params) django.db.utils.IntegrityError: FOREIGN KEY constraint failed The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/usr/local/lib/python3.9/site-packages/django/core/handlers/exception.py", line 47, in inner response = get_response(request) File "/usr/local/lib/python3.9/site-packages/django/core/handlers/base.py", line 181, in _get_response response = wrapped_callback(request, *callback_args, **callback_kwargs) File "/usr/local/lib/python3.9/site-packages/django/contrib/admin/options.py", line 614, in wrapper return self.admin_site.admin_view(view)(*args, **kwargs) File "/usr/local/lib/python3.9/site-packages/django/utils/decorators.py", line 130, in _wrapped_view response = view_func(request, *args, **kwargs) File "/usr/local/lib/python3.9/site-packages/django/views/decorators/cache.py", line 44, in _wrapped_view_func response = view_func(request, *args, **kwargs) File "/usr/local/lib/python3.9/site-packages/django/contrib/admin/sites.py", line 233, in inner return view(request, *args, **kwargs) File "/usr/local/lib/python3.9/site-packages/django/utils/decorators.py", line 43, in _wrapper return bound_method(*args, **kwargs) File "/usr/local/lib/python3.9/site-packages/django/utils/decorators.py", line 130, in _wrapped_view response = view_func(request, *args, **kwargs) File "/usr/local/lib/python3.9/site-packages/django/contrib/admin/options.py", line 1719, in changelist_view response = self.response_action(request, queryset=cl.get_queryset(request)) File "/usr/local/lib/python3.9/site-packages/django/contrib/admin/options.py", line 1402, in response_action response = func(self, request, queryset) File "/app/archivebox/core/admin.py", line 260, in resnapshot_snapshot add(new_url, tag=snapshot.tags_str()) File "/app/archivebox/util.py", line 114, in typechecked_function return func(*args, **kwargs) File "/app/archivebox/main.py", line 624, in add archive_links(new_links, overwrite=False, **archive_kwargs) File "/app/archivebox/util.py", line 114, in typechecked_function return func(*args, **kwargs) File "/app/archivebox/extractors/__init__.py", line 181, in archive_links archive_link(to_archive, overwrite=overwrite, methods=methods, out_dir=Path(link.link_dir)) File "/app/archivebox/util.py", line 114, in typechecked_function return func(*args, **kwargs) File "/app/archivebox/extractors/__init__.py", line 130, in archive_link raise Exception('Exception in archive_methods.save_{}(Link(url={}))'.format( Exception: Exception in archive_methods.save_mercury(Link(url=https://wykop.pl/#2023-02-19T01:00:24+00:00)) ```
Author
Owner

@pirate commented on GitHub (Jan 8, 2026):

Should be fixed on dev, mercury parser is deprecated anyway now because postlight team stopped maintaining it unfortunately.

<!-- gh-comment-id:3722304784 --> @pirate commented on GitHub (Jan 8, 2026): Should be fixed on `dev`, mercury parser is deprecated anyway now because postlight team stopped maintaining it unfortunately.
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
starred/ArchiveBox#3713
No description provided.