[GH-ISSUE #617] Question: archivebox server throws 500 error #382

Closed
opened 2026-03-01 14:43:05 +03:00 by kerem · 11 comments
Owner

Originally created by @winteriscariot on GitHub (Jan 15, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/617

I'm using nginx in front of archivebox on my local intranet to access my archives, and it's throwing a 500 error whenever I simply load the front page. nginx is recording no errors, so I assume this is happening internal to archivebox. The archivebox server console -- running in a screen -- simply reports the 500 error and nothing further.

Is there anything I can do to troubleshoot this? Perhaps increase the verbosity of archivebox server at the console so I can see where the fault exists?

I tried running archivebox init on my archive directory, to no effect.

I can dump the archivebox list to HTML, which is what I'm doing for now, but it's not ideal since I can't change the order of links (I prefer the newest links at the top of the list, while the HTML dump puts the oldest at the top; it's a pet peeve, but archivebox server lets me change that).

Arch Linux, archivebox 0.5.3 via pip

Thanks!


@winteriscariot commented on GitHub (Jan 15, 2021):

Here is the result of running archivebox server --debug and then loading and reloading the main archivebox page (at which point I receive the 500 error):

$ archivebox server --debug
[i] [2021-01-15 16:31:07] ArchiveBox v0.5.3: archivebox server --debug
    > /mnt/storage/Archive

[+] Starting ArchiveBox webserver...
    Hint: The admin username is <...>

Performing system checks...

System check identified no issues (0 silenced).
January 15, 2021 - 16:31:07
Django version 3.1.3, using settings 'core.settings'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.
[15/Jan/2021 16:31:12] "GET /admin/core/snapshot/ HTTP/1.0" 500 145
[15/Jan/2021 16:31:17] "GET / HTTP/1.0" 302 0
[15/Jan/2021 16:31:17] "GET /admin/core/snapshot/ HTTP/1.0" 500 145

Note: the archivebox server was working fine until yesterday, when it just started throwing the 500 error. I'm pretty much at a loss, as nothing has changed on the server since then (I literally went to bed one night and woke up to it not working).

I'm not saying nothing has changed (there's probably something); I'm just kind of clueless as to where to start looking.

EDIT: I did some more testing. I created a new archive directory, copied my .archivebox/archive directory into it, ran archivebox init and then archivebox update, created a superuser with archivebox manage createsuperuser, confirmed there were no errors, and then ran archivebox server in the new directory. It still throws a 500 error. Next I'll likely try creating a fresh archivebox directory (without the existing links) to see whether the server runs properly without any customized config.


@winteriscariot commented on GitHub (Jan 15, 2021):

After creating a new archive dir without the existing archives, I was able to avoid the 500 error. Therefore I'm marking this as closed, since it must be something to do with one of my links.


@mAAdhaTTah commented on GitHub (Jan 15, 2021):

If you set DEBUG=True in your config, you'll get a Django debug stack trace, which can help with stuff like this.
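DEBUG=True also shows settings and tracebacks to anyone who can reach the server, so an alternative in a generic Django project is to log 500 tracebacks to the console through the LOGGING setting. Django reports unhandled request exceptions on the "django.request" logger at ERROR level with the exception attached, and its default console handler is silenced when DEBUG is off, which matches the bare "500" lines seen above. The sketch below is a standard-library dictConfig; whether and where ArchiveBox 0.5.3 lets you override LOGGING is an assumption, so treat the placement as hypothetical.

```python
import logging
import logging.config

# Sketch of a Django LOGGING setting (placement in ArchiveBox is hypothetical).
# Unhandled exceptions are logged on "django.request" with exc_info, so this
# console handler prints the full traceback even when DEBUG is False.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "console": {"class": "logging.StreamHandler"},
    },
    "loggers": {
        "django.request": {
            "handlers": ["console"],
            "level": "ERROR",
            "propagate": False,  # avoid a second copy via the "django" logger
        },
    },
}

if __name__ == "__main__":
    # Outside Django: simulate what the request handler logs on a 500.
    logging.config.dictConfig(LOGGING)
    try:
        {}["files"]
    except KeyError:
        logging.getLogger("django.request").error(
            "Internal Server Error: %s", "/admin/core/snapshot/", exc_info=True
        )
```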


@winteriscariot commented on GitHub (Jan 15, 2021):

Hm, the issue has started to recur. Thanks for that, @mAAdhaTTah; I was able to get more info when loading /admin/core/snapshot/. Django traceback output:

Environment:


Request Method: GET
Request URL: http://archive.local/admin/core/snapshot/

Django Version: 3.1.3
Python Version: 3.9.1
Installed Applications:
['django.contrib.auth',
 'django.contrib.contenttypes',
 'django.contrib.sessions',
 'django.contrib.messages',
 'django.contrib.staticfiles',
 'django.contrib.admin',
 'core',
 'django_extensions']
Installed Middleware:
['django.middleware.security.SecurityMiddleware',
 'django.contrib.sessions.middleware.SessionMiddleware',
 'django.middleware.common.CommonMiddleware',
 'django.middleware.csrf.CsrfViewMiddleware',
 'django.contrib.auth.middleware.AuthenticationMiddleware',
 'django.contrib.messages.middleware.MessageMiddleware']


Template error:
In template /home/winter/.local/lib/python3.9/site-packages/archivebox/themes/admin/base.html, error at line 0
   21005575-02D4-D4B5-4572-D2005CAF9866
   1 : {% load i18n static %}<!DOCTYPE html>
   2 : {% get_current_language as LANGUAGE_CODE %}{% get_current_language_bidi as LANGUAGE_BIDI %}
   3 : <html lang="{{ LANGUAGE_CODE|default:"en-us" }}" {% if LANGUAGE_BIDI %}dir="rtl"{% endif %}>
   4 : <head>
   5 : <title>{% block title %}{% endblock %} | ArchiveBox</title>
   6 : <link rel="stylesheet" type="text/css" href="{% block stylesheet %}{% static "admin/css/base.css" %}{% endblock %}">
   7 : {% block extrastyle %}{% endblock %}
   8 : {% if LANGUAGE_BIDI %}<link rel="stylesheet" type="text/css" href="{% block stylesheet_rtl %}{% static "admin/css/rtl.css" %}{% endblock %}">{% endif %}
   9 : {% block extrahead %}{% endblock %}
   10 : {% block responsive %}


Traceback (most recent call last):
  File "/home/winter/.local/lib/python3.9/site-packages/django/db/models/options.py", line 575, in get_field
    return self.fields_map[field_name]

During handling of the above exception ('files'), another exception occurred:
  File "/home/winter/.local/lib/python3.9/site-packages/django/contrib/admin/utils.py", line 265, in lookup_field
    f = _get_non_gfk_field(opts, name)
  File "/home/winter/.local/lib/python3.9/site-packages/django/contrib/admin/utils.py", line 296, in _get_non_gfk_field
    field = opts.get_field(name)
  File "/home/winter/.local/lib/python3.9/site-packages/django/db/models/options.py", line 577, in get_field
    raise FieldDoesNotExist("%s has no field named '%s'" % (self.object_name, field_name))

During handling of the above exception (Snapshot has no field named 'files'), another exception occurred:
  File "/home/winter/.local/lib/python3.9/site-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/home/winter/.local/lib/python3.9/site-packages/django/core/handlers/base.py", line 202, in _get_response
    response = response.render()
  File "/home/winter/.local/lib/python3.9/site-packages/django/template/response.py", line 105, in render
    self.content = self.rendered_content
  File "/home/winter/.local/lib/python3.9/site-packages/django/template/response.py", line 83, in rendered_content
    return template.render(context, self._request)
  File "/home/winter/.local/lib/python3.9/site-packages/django/template/backends/django.py", line 61, in render
    return self.template.render(context)
  File "/home/winter/.local/lib/python3.9/site-packages/django/template/base.py", line 170, in render
    return self._render(context)
  File "/home/winter/.local/lib/python3.9/site-packages/django/template/base.py", line 162, in _render
    return self.nodelist.render(context)
  File "/home/winter/.local/lib/python3.9/site-packages/django/template/base.py", line 938, in render
    bit = node.render_annotated(context)
  File "/home/winter/.local/lib/python3.9/site-packages/django/template/base.py", line 905, in render_annotated
    return self.render(context)
  File "/home/winter/.local/lib/python3.9/site-packages/django/template/loader_tags.py", line 150, in render
    return compiled_parent._render(context)
  File "/home/winter/.local/lib/python3.9/site-packages/django/template/base.py", line 162, in _render
    return self.nodelist.render(context)
  File "/home/winter/.local/lib/python3.9/site-packages/django/template/base.py", line 938, in render
    bit = node.render_annotated(context)
  File "/home/winter/.local/lib/python3.9/site-packages/django/template/base.py", line 905, in render_annotated
    return self.render(context)
  File "/home/winter/.local/lib/python3.9/site-packages/django/template/loader_tags.py", line 150, in render
    return compiled_parent._render(context)
  File "/home/winter/.local/lib/python3.9/site-packages/django/template/base.py", line 162, in _render
    return self.nodelist.render(context)
  File "/home/winter/.local/lib/python3.9/site-packages/django/template/base.py", line 938, in render
    bit = node.render_annotated(context)
  File "/home/winter/.local/lib/python3.9/site-packages/django/template/base.py", line 905, in render_annotated
    return self.render(context)
  File "/home/winter/.local/lib/python3.9/site-packages/django/template/loader_tags.py", line 62, in render
    result = block.nodelist.render(context)
  File "/home/winter/.local/lib/python3.9/site-packages/django/template/base.py", line 938, in render
    bit = node.render_annotated(context)
  File "/home/winter/.local/lib/python3.9/site-packages/django/template/base.py", line 905, in render_annotated
    return self.render(context)
  File "/home/winter/.local/lib/python3.9/site-packages/django/template/loader_tags.py", line 62, in render
    result = block.nodelist.render(context)
  File "/home/winter/.local/lib/python3.9/site-packages/django/template/base.py", line 938, in render
    bit = node.render_annotated(context)
  File "/home/winter/.local/lib/python3.9/site-packages/django/template/base.py", line 905, in render_annotated
    return self.render(context)
  File "/home/winter/.local/lib/python3.9/site-packages/django/contrib/admin/templatetags/base.py", line 33, in render
    return super().render(context)
  File "/home/winter/.local/lib/python3.9/site-packages/django/template/library.py", line 214, in render
    _dict = self.func(*resolved_args, **resolved_kwargs)
  File "/home/winter/.local/lib/python3.9/site-packages/django/contrib/admin/templatetags/admin_list.py", line 341, in result_list
    'results': list(results(cl)),
  File "/home/winter/.local/lib/python3.9/site-packages/django/contrib/admin/templatetags/admin_list.py", line 317, in results
    yield ResultList(None, items_for_result(cl, res, None))
  File "/home/winter/.local/lib/python3.9/site-packages/django/contrib/admin/templatetags/admin_list.py", line 308, in __init__
    super().__init__(*items)
  File "/home/winter/.local/lib/python3.9/site-packages/django/contrib/admin/templatetags/admin_list.py", line 233, in items_for_result
    f, attr, value = lookup_field(field_name, result, cl.model_admin)
  File "/home/winter/.local/lib/python3.9/site-packages/django/contrib/admin/utils.py", line 274, in lookup_field
    value = attr(obj)
  File "/home/winter/.local/lib/python3.9/site-packages/archivebox/core/admin.py", line 140, in files
    return snapshot_icons(obj)
  File "/home/winter/.local/lib/python3.9/site-packages/archivebox/index/html.py", line 164, in snapshot_icons
    return format_html(f'<span class="files-icons" style="font-size: 1.1em; opacity: 0.8">{output}<span>')
  File "/home/winter/.local/lib/python3.9/site-packages/django/utils/html.py", line 115, in format_html
    return mark_safe(format_string.format(*args_safe, **kwargs_safe))

Exception Type: KeyError at /admin/core/snapshot/
Exception Value: '21005575-02D4-D4B5-4572-D2005CAF9866'

I assume this has something to do with something I archived? I took my original archive, dumped all the URLs to a text file, created a new collection in a new directory, and imported the URLs into it. It's just not clear to me how that could affect the rendering of the admin page.

I did try removing everything via pip and then reinstalling (including removing leftovers from the Python site-packages dir), but the issue persists.


@winteriscariot commented on GitHub (Jan 15, 2021):

A few things:

  1. The login page loads just fine
  2. Once I log in with a proper user, the 500 error triggers. Note: this only happens for /admin/core/snapshot/
  3. If I go to the /admin URI, it loads just fine and I'm logged in as my user.
  4. The public feed loads

@dohlin commented on GitHub (Jan 15, 2021):

I'm seeing this too, starting the moment my Chrome .html bookmarks file gets written to the database (on a fresh install). For me, only the /public URI seems to throw the 500 error; the admin section seems to work fine (from what I've tested). There's clearly something going on here.

EDIT: I even spun up a brand-new Ubuntu 20.04 server to test the setup from scratch: same issue. I tried an old HTML bookmarks backup file I had lying around from several months back: same issue. Everything works until I start the initial archive.


@pirate commented on GitHub (Jan 16, 2021):

Yup, definitely a valid bug, we'll look into it, thanks for reporting.

If you're able to narrow it down to a specific link that breaks that would help us a ton.
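Narrowing a large collection down to one link by hand is tedious; a binary search over the URL list needs roughly log2(n) reproductions. The sketch below assumes a hypothetical is_broken(urls) check (for example: create a fresh collection, add those URLs, load the page that returns 500, and report whether it errors), and it assumes the failure is caused by individual links, i.e. a subset fails if and only if it contains at least one bad URL.

```python
from typing import Callable, Sequence


def find_breaking_url(
    urls: Sequence[str], is_broken: Callable[[Sequence[str]], bool]
) -> str:
    """Binary-search for one URL that makes is_broken() return True.

    Assumes is_broken(subset) is True exactly when the subset contains at
    least one bad URL; is_broken itself is a hypothetical, user-supplied check.
    """
    candidates = list(urls)
    if not is_broken(candidates):
        raise ValueError("the full list does not reproduce the error")
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        # If the first half fails, a culprit is there; otherwise it must be
        # in the second half, because the whole candidate list fails.
        candidates = first_half if is_broken(first_half) else candidates[mid:]
    return candidates[0]
```

With 1,000 links this takes about 11 calls to is_broken instead of 1,000 one-at-a-time imports.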


@aspensmonster commented on GitHub (Jan 23, 2021):

I have the same issue as @dohlin, which I suspect may be different from the one @winteriscariot is experiencing. In my case, after enabling Django debug output, I get a KeyError exception:

Internal Server Error: /public/
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.9/dist-packages/django/core/handlers/base.py", line 179, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/local/lib/python3.9/dist-packages/django/views/generic/base.py", line 70, in view
    return self.dispatch(request, *args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/django/views/generic/base.py", line 98, in dispatch
    return handler(request, *args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/archivebox/core/views.py", line 117, in get
    response = super().get(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/django/views/generic/list.py", line 142, in get
    self.object_list = self.get_queryset()
  File "/usr/local/lib/python3.9/dist-packages/archivebox/core/views.py", line 112, in get_queryset
    snapshot.icons = snapshot_icons(snapshot)
  File "/usr/local/lib/python3.9/dist-packages/archivebox/index/html.py", line 164, in snapshot_icons
    return format_html(f'<span class="files-icons" style="font-size: 1.1em; opacity: 0.8">{output}<span>')
  File "/usr/local/lib/python3.9/dist-packages/django/utils/html.py", line 115, in format_html
    return mark_safe(format_string.format(*args_safe, **kwargs_safe))
KeyError: '1f5a0aab-2088-4ecc-84e1-6eaa4de7d6c3'
[23/Jan/2021 22:01:46] "GET /public/ HTTP/1.1" 500 104109

The Django error page that is shown has lots of helpful information. I've uploaded it here: https://share.riseup.net/#D58KDtAp6D-JqxO7-eJlMA (you'll probably need to download it and view it in a browser).

So far as I can tell, the problem is with a URL that has curly braces in it ({ and }). I know nothing about Django or its templating engine, but I'm assuming it uses those characters for template expansion, and complains about not having an actual variable called 1f5a0aab-2088-4ecc-84e1-6eaa4de7d6c3 to populate from.

The offending link itself:

http://learning.microsoft.com/manager/LearningPlanV2.aspx?resourceId=%7b1f5a0aab-2088-4ecc-84e1-6eaa4de7d6c3%7d&clang=en-US

And the braces showing up in the wget_path property of canon:

{'archive_org_path': 'https://web.archive.org/web/learning.microsoft.com/manager/LearningPlanV2.aspx?resourceId=%7b1f5a0aab-2088-4ecc-84e1-6eaa4de7d6c3%7d&clang=en-US',
 'dom_path': 'output.html',
 'favicon_path': 'favicon.ico',
 'git_path': 'git',
 'google_favicon_path': 'https://www.google.com/s2/favicons?domain=learning.microsoft.com',
 'index_path': 'index.html',
 'media_path': 'media',
 'mercury_path': 'mercury/content.html',
 'pdf_path': 'output.pdf',
 'readability_path': 'readability/content.html',
 'screenshot_path': 'screenshot.png',
 'singlefile_path': 'singlefile.html',
 'warc_path': 'warc',
 'wget_path': 'learning.microsoft.com/manager/LearningPlanV2.aspx@resourceId={1f5a0aab-2088-4ecc-84e1-6eaa4de7d6c3}&clang=en-US.html'}

And it does look like local var output, which gets pushed into the format_html function in archivebox/index/html.py, has raw curly braces in it:

'<a ' 'href="/archive/1577435751.7223/learning.microsoft.com/manager/LearningPlanV2.aspx@resourceId={1f5a0aab-2088-4ecc-84e1-6eaa4de7d6c3}&clang=en-US.html" ' 'class="exists-False" title="wget">🆆 </a><a '

Presumably, format_html eventually triggers template expansion and causes Django to barf.
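The traceback points at a slightly different mechanism than Django template expansion: format_html() ends in a plain str.format() call (format_string.format(*args_safe, **kwargs_safe)), and snapshot_icons() builds that format string with an f-string that already contains output. Any {...} in the archived path is therefore parsed as a replacement field, and since no argument with that name is passed, str.format raises KeyError with the text between the braces. The sketch below reproduces this in plain Python without Django; buggy_render and fixed_render are hypothetical stand-ins, not ArchiveBox code.

```python
# wget output path from the comment above; note the raw braces.
output = (
    '<a href="/archive/1577435751.7223/learning.microsoft.com/manager/'
    'LearningPlanV2.aspx@resourceId={1f5a0aab-2088-4ecc-84e1-6eaa4de7d6c3}'
    '&clang=en-US.html">wget</a>'
)


def buggy_render(output: str) -> str:
    # Same shape as snapshot_icons(): output is baked into the format
    # string first, so .format() then parses its braces as fields.
    format_string = f'<span class="files-icons">{output}</span>'
    return format_string.format()


def fixed_render(output: str) -> str:
    # Pass the HTML as an argument instead; str.format does not re-parse
    # argument values, so braces inside them are left alone.
    return '<span class="files-icons">{}</span>'.format(output)


try:
    buggy_render(output)
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: '1f5a0aab-2088-4ecc-84e1-6eaa4de7d6c3'

print(fixed_render(output))
```

In Django, the analogue of fixed_render would be format_html('<span class="files-icons">{}</span>', mark_safe(output)), assuming output is already assembled from escaped pieces (without mark_safe, format_html escapes its arguments). This is a sketch of the idea, not necessarily the patch that was merged upstream.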

Full disclosure, I'm in the process of migrating backups from early 2019 into version 0.5.3. Presently, all of my datadirs are "invalid", as it looks like the datadir index.json format has changed significantly.

That being said, I did start an empty archive in a new directory

archivebox init

enabled DEBUG

archivebox config --set DEBUG=True

added the single offending link to the archive

archivebox add http://learning.microsoft.com/manager/LearningPlanV2.aspx?resourceId=%7b1f5a0aab-2088-4ecc-84e1-6eaa4de7d6c3%7d&clang=en-U

and got the same HTTP 500 error and KeyError exception Django error output. So I suspect that my efforts of porting old backups to 0.5.3 are unrelated to this bug.
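The reproduction steps above can be scripted. The sketch below uses only the three commands from this comment (archivebox init, archivebox config --set DEBUG=True, archivebox add) and passes each as an argv list, so no shell parses the URL. When typing the command directly into a POSIX shell, an unquoted & ends the command and runs it in the background, so the URL should be wrapped in single quotes there. The helper names and the scratch directory are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def repro_commands(url: str) -> List[List[str]]:
    """The three reproduction commands from the comment, as argv lists."""
    return [
        ["archivebox", "init"],
        ["archivebox", "config", "--set", "DEBUG=True"],
        ["archivebox", "add", url],
    ]


def run_repro(data_dir: str, url: str, dry_run: bool = True) -> List[List[str]]:
    """Run the commands inside data_dir, or only return them if dry_run."""
    commands = repro_commands(url)
    if not dry_run:
        Path(data_dir).mkdir(parents=True, exist_ok=True)
        for argv in commands:
            # No shell is involved, so "&" in the query string is passed
            # through literally instead of backgrounding the command.
            subprocess.run(argv, cwd=data_dir, check=True)
    return commands
```

With dry_run=False and an empty scratch directory such as /tmp/ab-repro (hypothetical), the next step would be starting archivebox server in that directory and loading /public/.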

EDIT (2021-01-23T16:51:00-06:00): I've also included the index.json (path archive/1611442069.323929/index.json) from the test case here: https://share.riseup.net/#lALRoOqWLwelN-jW6A_zxw (again, you'll need to download the file). The curly braces are present in the output here too.
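Because the KeyError value is exactly the text between the braces, and wget turned %7b/%7d in the URL into literal braces in its output path, the offending snapshots can usually be found by scanning the per-snapshot index.json files for URLs containing raw or percent-encoded braces. The sketch below assumes each archive/<timestamp>/index.json is a JSON object with a top-level "url" key; that layout is an assumption here, not something confirmed in this thread.

```python
import json
import re
from pathlib import Path

# Raw braces, or their percent-encodings %7B / %7D in any case.
BRACE_RE = re.compile(r"[{}]|%7[bd]", re.IGNORECASE)


def find_braced_snapshots(data_dir):
    """Yield (index.json path, url) for snapshots whose URL contains braces."""
    for index_file in sorted(Path(data_dir, "archive").glob("*/index.json")):
        try:
            data = json.loads(index_file.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # skip unreadable or half-written index files
        url = data.get("url", "") if isinstance(data, dict) else ""
        if BRACE_RE.search(url):
            yield index_file, url


if __name__ == "__main__":
    # Run from inside the data directory (the folder containing archive/).
    for path, url in find_braced_snapshots("."):
        print(path, url)
```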

@aspensmonster commented on GitHub (Jan 23, 2021):

I have the same issue as @dohlin, which I suspect might be different from the issue @winteriscariot is experiencing. In my case, after enabling Django debug output, I get a KeyError exception:

```
Internal Server Error: /public/
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.9/dist-packages/django/core/handlers/base.py", line 179, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/local/lib/python3.9/dist-packages/django/views/generic/base.py", line 70, in view
    return self.dispatch(request, *args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/django/views/generic/base.py", line 98, in dispatch
    return handler(request, *args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/archivebox/core/views.py", line 117, in get
    response = super().get(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/django/views/generic/list.py", line 142, in get
    self.object_list = self.get_queryset()
  File "/usr/local/lib/python3.9/dist-packages/archivebox/core/views.py", line 112, in get_queryset
    snapshot.icons = snapshot_icons(snapshot)
  File "/usr/local/lib/python3.9/dist-packages/archivebox/index/html.py", line 164, in snapshot_icons
    return format_html(f'<span class="files-icons" style="font-size: 1.1em; opacity: 0.8">{output}<span>')
  File "/usr/local/lib/python3.9/dist-packages/django/utils/html.py", line 115, in format_html
    return mark_safe(format_string.format(*args_safe, **kwargs_safe))
KeyError: '1f5a0aab-2088-4ecc-84e1-6eaa4de7d6c3'
[23/Jan/2021 22:01:46] "GET /public/ HTTP/1.1" 500 104109
```

The Django error page that is shown has lots of helpful information. I've uploaded it [here](https://share.riseup.net/#D58KDtAp6D-JqxO7-eJlMA) (you'll probably need to download it and view it in a browser, though).

So far as I can tell, the problem is with a URL that has curly braces in it (`{` and `}`). I know nothing about Django or its templating engine, but I'm assuming it uses those characters for template expansion and complains about not having an actual variable called `1f5a0aab-2088-4ecc-84e1-6eaa4de7d6c3` to populate from.

The offending link itself:

http://learning.microsoft.com/manager/LearningPlanV2.aspx?resourceId=%7b1f5a0aab-2088-4ecc-84e1-6eaa4de7d6c3%7d&clang=en-US

And the braces showing up in the `wget_path` property of `canon`:

```
{'archive_org_path': 'https://web.archive.org/web/learning.microsoft.com/manager/LearningPlanV2.aspx?resourceId=%7b1f5a0aab-2088-4ecc-84e1-6eaa4de7d6c3%7d&clang=en-US',
 'dom_path': 'output.html',
 'favicon_path': 'favicon.ico',
 'git_path': 'git',
 'google_favicon_path': 'https://www.google.com/s2/favicons?domain=learning.microsoft.com',
 'index_path': 'index.html',
 'media_path': 'media',
 'mercury_path': 'mercury/content.html',
 'pdf_path': 'output.pdf',
 'readability_path': 'readability/content.html',
 'screenshot_path': 'screenshot.png',
 'singlefile_path': 'singlefile.html',
 'warc_path': 'warc',
 'wget_path': 'learning.microsoft.com/manager/LearningPlanV2.aspx@resourceId={1f5a0aab-2088-4ecc-84e1-6eaa4de7d6c3}&clang=en-US.html'}
```

And it does look like the local variable `output`, which gets passed into the `format_html` function in `archivebox/index/html.py`, has raw curly braces in it: `'<a ' 'href="/archive/1577435751.7223/learning.microsoft.com/manager/LearningPlanV2.aspx@resourceId={1f5a0aab-2088-4ecc-84e1-6eaa4de7d6c3}&clang=en-US.html" ' 'class="exists-False" title="wget">🆆 </a><a '`

Presumably, `format_html` eventually triggers template expansion and causes Django to barf.

Full disclosure, I'm in the process of migrating backups from early 2019 into version 0.5.3. Presently, all of my datadirs are "invalid", as it looks like the datadir index.json format has changed significantly. That being said, I *did* start an empty archive in a new directory (`archivebox init`), enabled `DEBUG` (`archivebox config --set DEBUG=True`), added the single offending link to the archive (`archivebox add http://learning.microsoft.com/manager/LearningPlanV2.aspx?resourceId=%7b1f5a0aab-2088-4ecc-84e1-6eaa4de7d6c3%7d&clang=en-U`), and got the same HTTP 500 error and KeyError exception in the Django error output. So I suspect that my efforts at porting old backups to 0.5.3 are unrelated to this bug.

EDIT (2021-01-23T16:51:00-06:00): I've also included the index.json (path `archive/1611442069.323929/index.json`) from the test case [here](https://share.riseup.net/#lALRoOqWLwelN-jW6A_zxw) (again, you'll need to download the file). The curly braces are present in the output here too.
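Judging from the last two frames of the traceback, the `KeyError` most likely comes from Python's built-in `str.format` (which Django's `format_html` calls on its format string) rather than from Django's template engine. Because `snapshot_icons` splices `output` into an f-string *before* handing the result to `format_html`, any `{...}` inside an archived path is then parsed as a replacement field, and `{1f5a0aab-...}` is looked up as a keyword argument that does not exist. The pure-Python sketch below reproduces that failure mode and shows one possible way around it; the function names are hypothetical, plain `str.format` stands in for `format_html`, and this is not necessarily how the fix in v0.5.4 was done.

```python
# Hypothetical helpers mimicking the pattern in the traceback. Plain
# str.format stands in for Django's format_html, which calls
# format_string.format(...) internally.


def render_icons_broken(output: str) -> str:
    # The f-string splices `output` into the format string itself, so any
    # "{...}" inside an archived path becomes a replacement field.
    format_string = f'<span class="files-icons">{output}</span>'
    return format_string.format()


def render_icons_fixed(output: str) -> str:
    # Pass the fragment as an argument instead: str.format does not
    # re-parse the contents of its arguments, so braces stay literal.
    return '<span class="files-icons">{}</span>'.format(output)


if __name__ == "__main__":
    output = ('<a href="/archive/1577435751.7223/learning.microsoft.com/manager/'
              'LearningPlanV2.aspx@resourceId={1f5a0aab-2088-4ecc-84e1-6eaa4de7d6c3}'
              '&clang=en-US.html" title="wget">wget</a>')
    try:
        render_icons_broken(output)
    except KeyError as err:
        print("broken:", repr(err))
    print("fixed:", render_icons_fixed(output))
```

In Django terms, the analogous change would be to call `format_html('<span ...>{}</span>', mark_safe(output))`, since `format_html` escapes its arguments with `conditional_escape` and `output` is already HTML. Treat that as a sketch: marking `output` safe is only correct if every part of it was escaped when it was built.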

@pirate commented on GitHub (Jan 23, 2021):

Supremely helpful, thank you @aggroskater. I suspect your diagnosis is correct with the curly braces. That template rendering code is being improved anyway in another branch I'm working on, so hopefully that will clear up some of these issues as well (`canon` is getting ripped out completely; it's always been a source of problems). If anyone wants to submit a quick patch for this, I'll approve and merge it in time for v0.5.4; otherwise, expect a few weeks until I have my next chunk of free time for ArchiveBox development.

If you're able to post one of the `./archive/<timestamp>/index.json` files that's being flagged as "invalid", I can help you get it into v0.5.3. Alternatively, you can do a 2-step migration through one of the v0.4 versions to get it into v0.5. Everything from v0.4 and up has a SQLite DB plus a rollback-safe migrations system to avoid upgrading pains in the future, so once you get to v0.4, upgrades should be easier from then on.


@pirate commented on GitHub (Feb 1, 2021):

v0.5.4 is released; please give it a try. Report back here if you have any further issues and I can reopen the ticket.


@berezovskyi commented on GitHub (Jun 6, 2021):

I just got a similar error with a URL https://link.foreignaffairs.com/click/60b4025ed373750fd780a8d9/aHR0cHM6Ly93d3cuZm9yZWlnbmFmZmFpcnMuY29tL2ZhX3VzZXIvc2ltcGxlX3JlZy9hdXRvbG9naW4_dG9rZW49YlVKelQzc0lDZ3FjWjB1bVhLQlVWdklxZHN1ajRFaEtWM3FYbE5TUnNtMXFFV3haM3loRmFTN05mWlc1RVRRSUlzOUxhQ1dvJTJCM1RQTnJaUE0lMkJaTlNSSTRNZUFycUZXVTNnNkxRdVJIN21zJTNEJmRlc3RpbmF0aW9uPS9ub2RlLzExMjc0NjcmdXRtX21lZGl1bT1wcm9tb19lbWFpbCZ1dG1fc291cmNlPWxvX2Zsb3dzJnV0bV9jYW1wYWlnbj1yZWdpc3RlcmVkX3VzZXJfd2VsY29tZSZ1dG1fdGVybT1lbWFpbF8xJnV0bV9jb250ZW50PTIwMjEwNTMw/60b4025ee8467f2b795d0d96B1f48eb65:

```
  File "/app/archivebox/index/schema.py", line 427, in canonical_outputs
    'wget_path': wget_output_path(self),
  File "/app/archivebox/util.py", line 114, in typechecked_function
    return func(*args, **kwargs)
  File "/app/archivebox/extractors/wget.py", line 170, in wget_output_path
    if search_dir.exists():
  File "/usr/local/lib/python3.9/pathlib.py", line 1414, in exists
    self.stat()
  File "/usr/local/lib/python3.9/pathlib.py", line 1222, in stat
    return self._accessor.stat(self)
OSError: [Errno 36] File name too long: '/data/archive/1622409932.315706/link.foreignaffairs.com/click/60b4025ed373750fd780a8d9/aHR0cHM6Ly93d3cuZm9yZWlnbmFmZmFpcnMuY29tL2ZhX3VzZXIvc2ltcGxlX3JlZy9hdXRvbG9naW4_dG9rZW49YlVKelQzc0lDZ3FjWjB1bVhLQlVWdklxZHN1ajRFaEtWM3FYbE5TUnNtMXFFV3haM3loRmFTN05mWlc1RVRRSUlzOUxhQ1dvJTJCM1RQTnJaUE0lMkJaTlNSSTRNZUFycUZXVTNnNkxRdVJIN21zJTNEJmRlc3RpbmF0aW9uPS9ub2RlLzExMjc0NjcmdXRtX21lZGl1bT1wcm9tb19lbWFpbCZ1dG1fc291cmNlPWxvX2Zsb3dzJnV0bV9jYW1wYWlnbj1yZWdpc3RlcmVkX3VzZXJfd2VsY29tZSZ1dG1fdGVybT1lbWFpbF8xJnV0bV9jb250ZW50PTIwMjEwNTMw/60b4025ee8467f2b795d0d96B1f48eb65'
```
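A likely reading of this `OSError`: errno 36 is `ENAMETOOLONG` on Linux, and the base64-looking segment of the wget output path is several hundred characters long, while ext4 and most other common Linux filesystems limit a single path component (`NAME_MAX`) to 255 bytes. The sketch below flags such components before any filesystem call is made. The helper names and the hard-coded 255-byte default are illustrative assumptions, not ArchiveBox code; `os.pathconf(dir, "PC_NAME_MAX")` reports the real limit for a given mount.

```python
import os
from pathlib import PurePosixPath
from typing import List

# Assumption: 255 bytes per path component, the NAME_MAX of ext4 and most
# other common Linux filesystems.
DEFAULT_NAME_MAX = 255


def overlong_components(path: str, name_max: int = DEFAULT_NAME_MAX) -> List[str]:
    """Return every component of `path` whose UTF-8 encoding exceeds `name_max` bytes."""
    return [
        part
        for part in PurePosixPath(path).parts
        # The limit is in bytes, so measure the encoded length, not len(str).
        if len(part.encode("utf-8")) > name_max
    ]


def name_max_for(directory: str) -> int:
    """Query the per-component limit of the filesystem holding `directory` (POSIX only)."""
    return os.pathconf(directory, "PC_NAME_MAX")
```

For example, `overlong_components(path, name_max_for("/data/archive"))` would return the offending segment for a path like the one in the traceback, while ordinary paths return an empty list.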

Here is how to fix the system without losing the index:

  1. Find the offending URL bits in the logs (I did not realise at first that the complete URL, minus the scheme, was already in the traceback).
  2. Copy the SQLite DB into a new file **and into a folder with write permissions** (for a Docker install, only the directory one level up will work; see [why](https://stackoverflow.com/questions/3319112/sqlite-error-attempt-to-write-a-readonly-database-during-insert)).
  3. Run `sqlite3 %filename%` and then the following query: `select url, added from core_snapshot order by added desc limit 10;`. You should see the full URL now.
  4. Run `docker exec -it -u archivebox archivebox archivebox remove 'https://link.foreignaffairs.com/click/60b4025ed373750fd780a8d9/aHR0cHM6Ly93d3cuZm9yZWlnbmFmZmFpcnMuY29tL2ZhX3VzZXIvc2ltcGxlX3JlZy9hdXRvbG9naW4_dG9rZW49YlVKelQzc0lDZ3FjWjB1bVhLQlVWdklxZHN1ajRFaEtWM3FYbE5TUnNtMXFFV3haM3loRmFTN05mWlc1RVRRSUlzOUxhQ1dvJTJCM1RQTnJaUE0lMkJaTlNSSTRNZUFycUZXVTNnNkxRdVJIN21zJTNEJmRlc3RpbmF0aW9uPS9ub2RlLzExMjc0NjcmdXRtX21lZGl1bT1wcm9tb19lbWFpbCZ1dG1fc291cmNlPWxvX2Zsb3dzJnV0bV9jYW1wYWlnbj1yZWdpc3RlcmVkX3VzZXJfd2VsY29tZSZ1dG1fdGVybT1lbWFpbF8xJnV0bV9jb250ZW50PTIwMjEwNTMw/60b4025ee8467f2b795d0d96B1f48eb65'`, replacing my URL with the one that is causing errors on your system.
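Steps 2 and 3 can also be scripted with Python's standard-library `sqlite3` module, which may help if the `sqlite3` CLI is not available where the data lives. A minimal sketch, assuming the index is the `index.sqlite3` file in the ArchiveBox data directory (an assumption; adjust the path) and reusing the `core_snapshot` table and `url`/`added` columns from the query in step 3. As in step 2, it works on a copy placed in a freshly created, writable temporary directory, so the live index is never opened; the function name is hypothetical.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path
from typing import List, Tuple


def newest_snapshots(db_path: str, limit: int = 10) -> List[Tuple[str, str]]:
    """Copy the SQLite index to a writable temp dir and return (url, added) for the newest snapshots."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Mirror step 2: query a copy inside a directory we can write to.
        db_copy = Path(tmp_dir) / "index_copy.sqlite3"
        shutil.copy2(db_path, db_copy)
        con = sqlite3.connect(str(db_copy))
        try:
            # Same query as step 3, with the limit bound as a parameter.
            return con.execute(
                "SELECT url, added FROM core_snapshot ORDER BY added DESC LIMIT ?",
                (limit,),
            ).fetchall()
        finally:
            # Close before the temporary directory is cleaned up.
            con.close()
```

Usage: `for url, added in newest_snapshots("/path/to/data/index.sqlite3"): print(added, url)`. The printed URL can then be passed to `archivebox remove` as in step 4.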