[GH-ISSUE #1373] Bug: UnicodeEncodeError: 'utf-8' codec can't encode character '\udcf6' in position 110372: surrogates not allowed when trying to render unprintable filesystem path in view #3860

Closed
opened 2026-03-15 00:42:36 +03:00 by kerem · 17 comments

Originally created by @Finkregh on GitHub (Mar 6, 2024).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1373

Describe the bug

Access to the web UI is impossible due to a Unicode encoding error: requests to /public/ return HTTP 500.

Steps to reproduce

I don't know exactly what changed or happened.

Screenshots or log output

Mar 06 19:18:07 archivebox archivebox[864]: "GET /public/ HTTP/1.1" 500 145
Mar 06 19:29:04 archivebox archivebox[864]: Internal Server Error: /public/
Mar 06 19:29:04 archivebox archivebox[864]: Traceback (most recent call last):
Mar 06 19:29:04 archivebox archivebox[864]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 47, in inner
Mar 06 19:29:04 archivebox archivebox[864]:     response = get_response(request)
Mar 06 19:29:04 archivebox archivebox[864]:                ^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:29:04 archivebox archivebox[864]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/base.py", line 204, in _get_resp>
Mar 06 19:29:04 archivebox archivebox[864]:     response = response.render()
Mar 06 19:29:04 archivebox archivebox[864]:                ^^^^^^^^^^^^^^^^^
Mar 06 19:29:04 archivebox archivebox[864]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/template/response.py", line 105, in render
Mar 06 19:29:04 archivebox archivebox[864]:     self.content = self.rendered_content
Mar 06 19:29:04 archivebox archivebox[864]:     ^^^^^^^^^^^^
Mar 06 19:29:04 archivebox archivebox[864]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/template/response.py", line 134, in content
Mar 06 19:29:04 archivebox archivebox[864]:     HttpResponse.content.fset(self, value)
Mar 06 19:29:04 archivebox archivebox[864]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 328, in content
Mar 06 19:29:04 archivebox archivebox[864]:     content = self.make_bytes(value)
Mar 06 19:29:04 archivebox archivebox[864]:               ^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:29:04 archivebox archivebox[864]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 241, in make_bytes
Mar 06 19:29:04 archivebox archivebox[864]:     return bytes(value.encode(self.charset))
Mar 06 19:29:04 archivebox archivebox[864]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:29:04 archivebox archivebox[864]: UnicodeEncodeError: 'utf-8' codec can't encode character '\udcf6' in position 110372: surrogates not allowed
Mar 06 19:29:04 archivebox archivebox[864]: "GET /public/ HTTP/1.1" 500 145
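
For context, the character '\udcf6' in the traceback is what Python produces when a file name contains the byte 0xF6 and is not valid UTF-8: on POSIX systems, undecodable file-name bytes are mapped to lone surrogates by the "surrogateescape" error handler, and a strict UTF-8 encode (as in Django's make_bytes) then rejects them. The following is a minimal sketch of that mechanism, not ArchiveBox's code and not its actual fix; the file name and the helper names encode_strict and safe_for_display are hypothetical, and it assumes the offending string came from a path in the archive directory.

```python
from typing import Optional

# Hypothetical file name: the Latin-1 bytes for "schön.html", which are not valid UTF-8.
raw_name = b"sch\xf6n.html"

# Decoding with "surrogateescape" (what os.fsdecode uses on POSIX) maps the
# undecodable byte 0xF6 to the lone surrogate U+DCF6 instead of raising.
decoded = raw_name.decode("utf-8", "surrogateescape")


def encode_strict(text: str) -> Optional[bytes]:
    """Encode as strict UTF-8; return None when the text contains surrogates."""
    try:
        return text.encode("utf-8")
    except UnicodeEncodeError:
        return None


def safe_for_display(text: str) -> str:
    """Return a copy of text that can always be encoded as UTF-8.

    Unencodable characters (such as lone surrogates) are replaced with "?".
    """
    return text.encode("utf-8", "replace").decode("utf-8")


if __name__ == "__main__":
    print(repr(decoded))               # contains '\udcf6'
    print(encode_strict(decoded))      # None: strict encoding fails
    print(safe_for_display(decoded))   # printable, lossy version of the name
```

Replacing the character is lossy, so a sketch like this only suits display; the original bytes can still be recovered from the decoded string with encode("utf-8", "surrogateescape") when the real path is needed.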

ArchiveBox version

0.7.2
ArchiveBox v0.7.2 BUILD_TIME=2024-03-06 19:08:32 1709752112
IN_DOCKER=False IN_QEMU=False ARCH=x86_64 OS=Linux PLATFORM=Linux-6.7.8-arch1-1-x86_64-with-glibc2.39 PYTHON=Cpython
FS_ATOMIC=True FS_REMOTE=True FS_USER=1000:1000 FS_PERMS=644
DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND=ripgrep LDAP=False

[i] Dependency versions:
 √  PYTHON_BINARY         v3.11.8         valid     /usr/bin/python3.11                                                         
 √  SQLITE_BINARY         v2.6.0          valid     /usr/lib/python3.11/sqlite3/dbapi2.py                                       
 √  DJANGO_BINARY         v3.1.14         valid     /home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/__init__.py
 √  ARCHIVEBOX_BINARY     v0.7.2          valid     /home/ol/.local/pipx/venvs/archivebox/bin/archivebox                        

 √  CURL_BINARY           v8.6.0          valid     /usr/bin/curl                                                               
 √  WGET_BINARY           v1.21.4         valid     /usr/bin/wget                                                               
 √  NODE_BINARY           v21.6.2         valid     /usr/bin/node                                                               
 √  SINGLEFILE_BINARY     v1.1.49         valid     ./node_modules/single-file-cli/single-file                                  
 √  READABILITY_BINARY    v0.0.11         valid     ./node_modules/readability-extractor/readability-extractor                  
 √  MERCURY_BINARY        v1.0.0          valid     ./node_modules/@postlight/parser/cli.js                                     
 √  GIT_BINARY            v2.44.0         valid     /usr/bin/git                                                                
 √  YOUTUBEDL_BINARY      v2023.12.30     valid     /usr/bin/yt-dlp                                                             
 √  CHROME_BINARY         v122.0.6261.111  valid     /usr/bin/chromium                                                           
 √  RIPGREP_BINARY        v14.1.0         valid     /usr/bin/rg                                                                 

[i] Source-code locations:
 √  PACKAGE_DIR           23 files        valid     /home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/archivebox
 √  TEMPLATES_DIR         3 files         valid     /home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/archivebox/templates
 -  CUSTOM_TEMPLATES_DIR  -               disabled  None                                                                        

[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled  None                                                                        
 -  COOKIES_FILE          -               disabled  None                                                                        

[i] Data locations:
 √  OUTPUT_DIR            8 files @       valid     /home/ol/data                                                               
 √  SOURCES_DIR           48 files        valid     ./sources                                                                   
 √  LOGS_DIR              1 files         valid     ./logs                                                                      
 √  ARCHIVE_DIR           1683 files      valid     ./archive                                                                   
 √  CONFIG_FILE           149.0 Bytes     valid     ./ArchiveBox.conf                                                           
 √  SQL_INDEX             27.1 MB         valid     ./index.sqlite3    

logs with DEBUG=True

Mar 06 19:39:25 archivebox archivebox[2001]: Traceback (most recent call last):
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 47, in inner
Mar 06 19:39:25 archivebox archivebox[2001]:     response = get_response(request)
Mar 06 19:39:25 archivebox archivebox[2001]:                ^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/base.py", line 204, in _get_response
Mar 06 19:39:25 archivebox archivebox[2001]:     response = response.render()
Mar 06 19:39:25 archivebox archivebox[2001]:                ^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/template/response.py", line 105, in render
Mar 06 19:39:25 archivebox archivebox[2001]:     self.content = self.rendered_content
Mar 06 19:39:25 archivebox archivebox[2001]:     ^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/template/response.py", line 134, in content
Mar 06 19:39:25 archivebox archivebox[2001]:     HttpResponse.content.fset(self, value)
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 328, in content
Mar 06 19:39:25 archivebox archivebox[2001]:     content = self.make_bytes(value)
Mar 06 19:39:25 archivebox archivebox[2001]:               ^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 241, in make_bytes
Mar 06 19:39:25 archivebox archivebox[2001]:     return bytes(value.encode(self.charset))
Mar 06 19:39:25 archivebox archivebox[2001]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]: UnicodeEncodeError: 'utf-8' codec can't encode character '\udcf6' in position 110372: surrogates not allowed
Mar 06 19:39:25 archivebox archivebox[2001]: During handling of the above exception, another exception occurred:
Mar 06 19:39:25 archivebox archivebox[2001]: Traceback (most recent call last):
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 47, in inner
Mar 06 19:39:25 archivebox archivebox[2001]:     response = get_response(request)
Mar 06 19:39:25 archivebox archivebox[2001]:                ^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/archivebox/core/middleware.py", line 32, in middleware
Mar 06 19:39:25 archivebox archivebox[2001]:     response = get_response(request)
Mar 06 19:39:25 archivebox archivebox[2001]:                ^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 49, in inner
Mar 06 19:39:25 archivebox archivebox[2001]:     response = response_for_exception(request, exc)
Mar 06 19:39:25 archivebox archivebox[2001]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 103, in response_for_exception
Mar 06 19:39:25 archivebox archivebox[2001]:     response = handle_uncaught_exception(request, get_resolver(get_urlconf()), sys.exc_info())
Mar 06 19:39:25 archivebox archivebox[2001]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 138, in handle_uncaught_exception
Mar 06 19:39:25 archivebox archivebox[2001]:     return debug.technical_500_response(request, *exc_info)
Mar 06 19:39:25 archivebox archivebox[2001]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/views/debug.py", line 53, in technical_500_response
Mar 06 19:39:25 archivebox archivebox[2001]:     return HttpResponse(html, status=status_code, content_type='text/html')
Mar 06 19:39:25 archivebox archivebox[2001]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 298, in __init__
Mar 06 19:39:25 archivebox archivebox[2001]:     self.content = content
Mar 06 19:39:25 archivebox archivebox[2001]:     ^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 328, in content
Mar 06 19:39:25 archivebox archivebox[2001]:     content = self.make_bytes(value)
Mar 06 19:39:25 archivebox archivebox[2001]:               ^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 241, in make_bytes
Mar 06 19:39:25 archivebox archivebox[2001]:     return bytes(value.encode(self.charset))
Mar 06 19:39:25 archivebox archivebox[2001]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]: UnicodeEncodeError: 'utf-8' codec can't encode character '\udcf6' in position 6662: surrogates not allowed
Mar 06 19:39:25 archivebox archivebox[2001]: During handling of the above exception, another exception occurred:
Mar 06 19:39:25 archivebox archivebox[2001]: Traceback (most recent call last):
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 47, in inner
Mar 06 19:39:25 archivebox archivebox[2001]:     response = get_response(request)
Mar 06 19:39:25 archivebox archivebox[2001]:                ^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/utils/deprecation.py", line 114, in __call__
Mar 06 19:39:25 archivebox archivebox[2001]:     response = response or self.get_response(request)
Mar 06 19:39:25 archivebox archivebox[2001]:                            ^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 49, in inner
Mar 06 19:39:25 archivebox archivebox[2001]:     response = response_for_exception(request, exc)
Mar 06 19:39:25 archivebox archivebox[2001]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 103, in response_for_exception
Mar 06 19:39:25 archivebox archivebox[2001]:     response = handle_uncaught_exception(request, get_resolver(get_urlconf()), sys.exc_info())
Mar 06 19:39:25 archivebox archivebox[2001]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 138, in handle_uncaught_exception
Mar 06 19:39:25 archivebox archivebox[2001]:     return debug.technical_500_response(request, *exc_info)
Mar 06 19:39:25 archivebox archivebox[2001]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/views/debug.py", line 53, in technical_500_response
Mar 06 19:39:25 archivebox archivebox[2001]:     return HttpResponse(html, status=status_code, content_type='text/html')
Mar 06 19:39:25 archivebox archivebox[2001]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 298, in __init__
Mar 06 19:39:25 archivebox archivebox[2001]:     self.content = content
Mar 06 19:39:25 archivebox archivebox[2001]:     ^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 328, in content
Mar 06 19:39:25 archivebox archivebox[2001]:     content = self.make_bytes(value)
Mar 06 19:39:25 archivebox archivebox[2001]:               ^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 241, in make_bytes
Mar 06 19:39:25 archivebox archivebox[2001]:     return bytes(value.encode(self.charset))
Mar 06 19:39:25 archivebox archivebox[2001]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]: UnicodeEncodeError: 'utf-8' codec can't encode character '\udcf6' in position 6658: surrogates not allowed
Mar 06 19:39:25 archivebox archivebox[2001]: During handling of the above exception, another exception occurred:
Mar 06 19:39:25 archivebox archivebox[2001]: Traceback (most recent call last):
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 47, in inner
Mar 06 19:39:25 archivebox archivebox[2001]:     response = get_response(request)
Mar 06 19:39:25 archivebox archivebox[2001]:                ^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/utils/deprecation.py", line 114, in __call__
Mar 06 19:39:25 archivebox archivebox[2001]:     response = response or self.get_response(request)
Mar 06 19:39:25 archivebox archivebox[2001]:                            ^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 49, in inner
Mar 06 19:39:25 archivebox archivebox[2001]:     response = response_for_exception(request, exc)
Mar 06 19:39:25 archivebox archivebox[2001]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 103, in response_for_exception
Mar 06 19:39:25 archivebox archivebox[2001]:     response = handle_uncaught_exception(request, get_resolver(get_urlconf()), sys.exc_info())
Mar 06 19:39:25 archivebox archivebox[2001]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 138, in handle_uncaught_exception
Mar 06 19:39:25 archivebox archivebox[2001]:     return debug.technical_500_response(request, *exc_info)
Mar 06 19:39:25 archivebox archivebox[2001]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/views/debug.py", line 53, in technical_500_response
Mar 06 19:39:25 archivebox archivebox[2001]:     return HttpResponse(html, status=status_code, content_type='text/html')
Mar 06 19:39:25 archivebox archivebox[2001]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 298, in __init__
Mar 06 19:39:25 archivebox archivebox[2001]:     self.content = content
Mar 06 19:39:25 archivebox archivebox[2001]:     ^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 328, in content
Mar 06 19:39:25 archivebox archivebox[2001]:     content = self.make_bytes(value)
Mar 06 19:39:25 archivebox archivebox[2001]:               ^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 241, in make_bytes
Mar 06 19:39:25 archivebox archivebox[2001]:     return bytes(value.encode(self.charset))
Mar 06 19:39:25 archivebox archivebox[2001]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]: UnicodeEncodeError: 'utf-8' codec can't encode character '\udcf6' in position 6658: surrogates not allowed
Mar 06 19:39:25 archivebox archivebox[2001]: During handling of the above exception, another exception occurred:
Mar 06 19:39:25 archivebox archivebox[2001]: Traceback (most recent call last):
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 47, in inner
Mar 06 19:39:25 archivebox archivebox[2001]:     response = get_response(request)
Mar 06 19:39:25 archivebox archivebox[2001]:                ^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/utils/deprecation.py", line 114, in __call__
Mar 06 19:39:25 archivebox archivebox[2001]:     response = response or self.get_response(request)
Mar 06 19:39:25 archivebox archivebox[2001]:                            ^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 49, in inner
Mar 06 19:39:25 archivebox archivebox[2001]:     response = response_for_exception(request, exc)
Mar 06 19:39:25 archivebox archivebox[2001]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 103, in response_for_exception
Mar 06 19:39:25 archivebox archivebox[2001]:     response = handle_uncaught_exception(request, get_resolver(get_urlconf()), sys.exc_info())
Mar 06 19:39:25 archivebox archivebox[2001]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 138, in handle_uncaught_exception
Mar 06 19:39:25 archivebox archivebox[2001]:     return debug.technical_500_response(request, *exc_info)
Mar 06 19:39:25 archivebox archivebox[2001]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/views/debug.py", line 53, in technical_500_response
Mar 06 19:39:25 archivebox archivebox[2001]:     return HttpResponse(html, status=status_code, content_type='text/html')
Mar 06 19:39:25 archivebox archivebox[2001]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 298, in __init__
Mar 06 19:39:25 archivebox archivebox[2001]:     self.content = content
Mar 06 19:39:25 archivebox archivebox[2001]:     ^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 328, in content
Mar 06 19:39:25 archivebox archivebox[2001]:     content = self.make_bytes(value)
Mar 06 19:39:25 archivebox archivebox[2001]:               ^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 241, in make_bytes
Mar 06 19:39:25 archivebox archivebox[2001]:     return bytes(value.encode(self.charset))
Mar 06 19:39:25 archivebox archivebox[2001]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]: UnicodeEncodeError: 'utf-8' codec can't encode character '\udcf6' in position 6658: surrogates not allowed
Mar 06 19:39:25 archivebox archivebox[2001]: During handling of the above exception, another exception occurred:
Mar 06 19:39:25 archivebox archivebox[2001]: Traceback (most recent call last):
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 47, in inner
Mar 06 19:39:25 archivebox archivebox[2001]:     response = get_response(request)
Mar 06 19:39:25 archivebox archivebox[2001]:                ^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/utils/deprecation.py", line 114, in __call__
Mar 06 19:39:25 archivebox archivebox[2001]:     response = response or self.get_response(request)
Mar 06 19:39:25 archivebox archivebox[2001]:                            ^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 49, in inner
Mar 06 19:39:25 archivebox archivebox[2001]:     response = response_for_exception(request, exc)
Mar 06 19:39:25 archivebox archivebox[2001]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 103, in response_for_exception
Mar 06 19:39:25 archivebox archivebox[2001]:     response = handle_uncaught_exception(request, get_resolver(get_urlconf()), sys.exc_info())
Mar 06 19:39:25 archivebox archivebox[2001]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 138, in handle_uncaught_exception
Mar 06 19:39:25 archivebox archivebox[2001]:     return debug.technical_500_response(request, *exc_info)
Mar 06 19:39:25 archivebox archivebox[2001]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/views/debug.py", line 53, in technical_500_response
Mar 06 19:39:25 archivebox archivebox[2001]:     return HttpResponse(html, status=status_code, content_type='text/html')
Mar 06 19:39:25 archivebox archivebox[2001]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 298, in __init__
Mar 06 19:39:25 archivebox archivebox[2001]:     self.content = content
Mar 06 19:39:25 archivebox archivebox[2001]:     ^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 328, in content
Mar 06 19:39:25 archivebox archivebox[2001]:     content = self.make_bytes(value)
Mar 06 19:39:25 archivebox archivebox[2001]:               ^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 241, in make_bytes
Mar 06 19:39:25 archivebox archivebox[2001]:     return bytes(value.encode(self.charset))
Mar 06 19:39:25 archivebox archivebox[2001]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]: UnicodeEncodeError: 'utf-8' codec can't encode character '\udcf6' in position 6658: surrogates not allowed
Mar 06 19:39:25 archivebox archivebox[2001]: During handling of the above exception, another exception occurred:
Mar 06 19:39:25 archivebox archivebox[2001]: Traceback (most recent call last):
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 47, in inner
Mar 06 19:39:25 archivebox archivebox[2001]:     response = get_response(request)
Mar 06 19:39:25 archivebox archivebox[2001]:                ^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/utils/deprecation.py", line 114, in __call__
Mar 06 19:39:25 archivebox archivebox[2001]:     response = response or self.get_response(request)
Mar 06 19:39:25 archivebox archivebox[2001]:                            ^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 49, in inner
Mar 06 19:39:25 archivebox archivebox[2001]:     response = response_for_exception(request, exc)
Mar 06 19:39:25 archivebox archivebox[2001]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 103, in response_for_exception
Mar 06 19:39:25 archivebox archivebox[2001]:     response = handle_uncaught_exception(request, get_resolver(get_urlconf()), sys.exc_info())
Mar 06 19:39:25 archivebox archivebox[2001]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 138, in handle_uncaught_exception
Mar 06 19:39:25 archivebox archivebox[2001]:     return debug.technical_500_response(request, *exc_info)
Mar 06 19:39:25 archivebox archivebox[2001]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/views/debug.py", line 53, in technical_500_response
Mar 06 19:39:25 archivebox archivebox[2001]:     return HttpResponse(html, status=status_code, content_type='text/html')
Mar 06 19:39:25 archivebox archivebox[2001]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 298, in __init__
Mar 06 19:39:25 archivebox archivebox[2001]:     self.content = content
Mar 06 19:39:25 archivebox archivebox[2001]:     ^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 328, in content
Mar 06 19:39:25 archivebox archivebox[2001]:     content = self.make_bytes(value)
Mar 06 19:39:25 archivebox archivebox[2001]:               ^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 241, in make_bytes
Mar 06 19:39:25 archivebox archivebox[2001]:     return bytes(value.encode(self.charset))
Mar 06 19:39:25 archivebox archivebox[2001]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]: UnicodeEncodeError: 'utf-8' codec can't encode character '\udcf6' in position 6658: surrogates not allowed
Mar 06 19:39:25 archivebox archivebox[2001]: During handling of the above exception, another exception occurred:
Mar 06 19:39:25 archivebox archivebox[2001]: Traceback (most recent call last):
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 47, in inner
Mar 06 19:39:25 archivebox archivebox[2001]:     response = get_response(request)
Mar 06 19:39:25 archivebox archivebox[2001]:                ^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/utils/deprecation.py", line 114, in __call__
Mar 06 19:39:25 archivebox archivebox[2001]:     response = response or self.get_response(request)
Mar 06 19:39:25 archivebox archivebox[2001]:                            ^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 49, in inner
Mar 06 19:39:25 archivebox archivebox[2001]:     response = response_for_exception(request, exc)
Mar 06 19:39:25 archivebox archivebox[2001]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 103, in response_for_exception
Mar 06 19:39:25 archivebox archivebox[2001]:     response = handle_uncaught_exception(request, get_resolver(get_urlconf()), sys.exc_info())
Mar 06 19:39:25 archivebox archivebox[2001]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 138, in handle_uncaught_exception
Mar 06 19:39:25 archivebox archivebox[2001]:     return debug.technical_500_response(request, *exc_info)
Mar 06 19:39:25 archivebox archivebox[2001]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/views/debug.py", line 53, in technical_500_response
Mar 06 19:39:25 archivebox archivebox[2001]:     return HttpResponse(html, status=status_code, content_type='text/html')
Mar 06 19:39:25 archivebox archivebox[2001]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 298, in __init__
Mar 06 19:39:25 archivebox archivebox[2001]:     self.content = content
Mar 06 19:39:25 archivebox archivebox[2001]:     ^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 328, in content
Mar 06 19:39:25 archivebox archivebox[2001]:     content = self.make_bytes(value)
Mar 06 19:39:25 archivebox archivebox[2001]:               ^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 241, in make_bytes
Mar 06 19:39:25 archivebox archivebox[2001]:     return bytes(value.encode(self.charset))
Mar 06 19:39:25 archivebox archivebox[2001]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]: UnicodeEncodeError: 'utf-8' codec can't encode character '\udcf6' in position 6658: surrogates not allowed
Mar 06 19:39:25 archivebox archivebox[2001]: During handling of the above exception, another exception occurred:
Mar 06 19:39:25 archivebox archivebox[2001]: Traceback (most recent call last):
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 47, in inner
Mar 06 19:39:25 archivebox archivebox[2001]:     response = get_response(request)
Mar 06 19:39:25 archivebox archivebox[2001]:                ^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/utils/deprecation.py", line 114, in __call__
Mar 06 19:39:25 archivebox archivebox[2001]:     response = response or self.get_response(request)
Mar 06 19:39:25 archivebox archivebox[2001]:                            ^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 49, in inner
Mar 06 19:39:25 archivebox archivebox[2001]:     response = response_for_exception(request, exc)
Mar 06 19:39:25 archivebox archivebox[2001]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 103, in response_for_exception
Mar 06 19:39:25 archivebox archivebox[2001]:     response = handle_uncaught_exception(request, get_resolver(get_urlconf()), sys.exc_info())
Mar 06 19:39:25 archivebox archivebox[2001]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 138, in handle_uncaught_exception
Mar 06 19:39:25 archivebox archivebox[2001]:     return debug.technical_500_response(request, *exc_info)
Mar 06 19:39:25 archivebox archivebox[2001]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/views/debug.py", line 53, in technical_500_response
Mar 06 19:39:25 archivebox archivebox[2001]:     return HttpResponse(html, status=status_code, content_type='text/html')
Mar 06 19:39:25 archivebox archivebox[2001]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 298, in __init__
Mar 06 19:39:25 archivebox archivebox[2001]:     self.content = content
Mar 06 19:39:25 archivebox archivebox[2001]:     ^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 328, in content
Mar 06 19:39:25 archivebox archivebox[2001]:     content = self.make_bytes(value)
Mar 06 19:39:25 archivebox archivebox[2001]:               ^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 241, in make_bytes
Mar 06 19:39:25 archivebox archivebox[2001]:     return bytes(value.encode(self.charset))
Mar 06 19:39:25 archivebox archivebox[2001]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]: UnicodeEncodeError: 'utf-8' codec can't encode character '\udcf6' in position 6658: surrogates not allowed
Mar 06 19:39:25 archivebox archivebox[2001]: During handling of the above exception, another exception occurred:
Mar 06 19:39:25 archivebox archivebox[2001]: Traceback (most recent call last):
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 47, in inner
Mar 06 19:39:25 archivebox archivebox[2001]:     response = get_response(request)
Mar 06 19:39:25 archivebox archivebox[2001]:                ^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/archivebox/core/middleware.py", line 25, in middleware
Mar 06 19:39:25 archivebox archivebox[2001]:     return get_response(request)
Mar 06 19:39:25 archivebox archivebox[2001]:            ^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 49, in inner
Mar 06 19:39:25 archivebox archivebox[2001]:     response = response_for_exception(request, exc)
Mar 06 19:39:25 archivebox archivebox[2001]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 103, in response_for_exception
Mar 06 19:39:25 archivebox archivebox[2001]:     response = handle_uncaught_exception(request, get_resolver(get_urlconf()), sys.exc_info())
Mar 06 19:39:25 archivebox archivebox[2001]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 138, in handle_uncaught_exception
Mar 06 19:39:25 archivebox archivebox[2001]:     return debug.technical_500_response(request, *exc_info)
Mar 06 19:39:25 archivebox archivebox[2001]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/views/debug.py", line 53, in technical_500_response
Mar 06 19:39:25 archivebox archivebox[2001]:     return HttpResponse(html, status=status_code, content_type='text/html')
Mar 06 19:39:25 archivebox archivebox[2001]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 298, in __init__
Mar 06 19:39:25 archivebox archivebox[2001]:     self.content = content
Mar 06 19:39:25 archivebox archivebox[2001]:     ^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 328, in content
Mar 06 19:39:25 archivebox archivebox[2001]:     content = self.make_bytes(value)
Mar 06 19:39:25 archivebox archivebox[2001]:               ^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 241, in make_bytes
Mar 06 19:39:25 archivebox archivebox[2001]:     return bytes(value.encode(self.charset))
Mar 06 19:39:25 archivebox archivebox[2001]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]: UnicodeEncodeError: 'utf-8' codec can't encode character '\udcf6' in position 6658: surrogates not allowed
Mar 06 19:39:25 archivebox archivebox[2001]: During handling of the above exception, another exception occurred:
Mar 06 19:39:25 archivebox archivebox[2001]: Traceback (most recent call last):
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/usr/lib/python3.11/wsgiref/handlers.py", line 137, in run
Mar 06 19:39:25 archivebox archivebox[2001]:     self.result = application(self.environ, self.start_response)
Mar 06 19:39:25 archivebox archivebox[2001]:                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/contrib/staticfiles/handlers.py", line 76, in __call__
Mar 06 19:39:25 archivebox archivebox[2001]:     return self.application(environ, start_response)
Mar 06 19:39:25 archivebox archivebox[2001]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/wsgi.py", line 133, in __call__
Mar 06 19:39:25 archivebox archivebox[2001]:     response = self.get_response(request)
Mar 06 19:39:25 archivebox archivebox[2001]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/base.py", line 130, in get_response
Mar 06 19:39:25 archivebox archivebox[2001]:     response = self._middleware_chain(request)
Mar 06 19:39:25 archivebox archivebox[2001]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 49, in inner
Mar 06 19:39:25 archivebox archivebox[2001]:     response = response_for_exception(request, exc)
Mar 06 19:39:25 archivebox archivebox[2001]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 103, in response_for_exception
Mar 06 19:39:25 archivebox archivebox[2001]:     response = handle_uncaught_exception(request, get_resolver(get_urlconf()), sys.exc_info())
Mar 06 19:39:25 archivebox archivebox[2001]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 138, in handle_uncaught_exception
Mar 06 19:39:25 archivebox archivebox[2001]:     return debug.technical_500_response(request, *exc_info)
Mar 06 19:39:25 archivebox archivebox[2001]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/views/debug.py", line 53, in technical_500_response
Mar 06 19:39:25 archivebox archivebox[2001]:     return HttpResponse(html, status=status_code, content_type='text/html')
Mar 06 19:39:25 archivebox archivebox[2001]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 298, in __init__
Mar 06 19:39:25 archivebox archivebox[2001]:     self.content = content
Mar 06 19:39:25 archivebox archivebox[2001]:     ^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 328, in content
Mar 06 19:39:25 archivebox archivebox[2001]:     content = self.make_bytes(value)
Mar 06 19:39:25 archivebox archivebox[2001]:               ^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]:   File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 241, in make_bytes
Mar 06 19:39:25 archivebox archivebox[2001]:     return bytes(value.encode(self.charset))
Mar 06 19:39:25 archivebox archivebox[2001]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 06 19:39:25 archivebox archivebox[2001]: UnicodeEncodeError: 'utf-8' codec can't encode character '\udcf6' in position 6658: surrogates not allowed
Mar 06 19:39:25 archivebox archivebox[2001]: "GET /public/ HTTP/1.1" 500 59
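
For context, here is a minimal sketch of how a character like '\udcf6' can end up in a rendered page. It assumes the cause is a filename on disk containing a byte that is not valid UTF-8, as the issue title suggests. The filename is hypothetical, and the last step is only one possible way to make such a name printable, not necessarily how ArchiveBox handles it.

```python
# Hypothetical filename stored on disk as Latin-1 bytes: 0xF6 is "ö" in
# Latin-1, but it is not a valid UTF-8 sequence on its own.
raw_name = b"sch\xf6n.html"

# On a UTF-8 locale, Python decodes filesystem names (e.g. via os.fsdecode)
# with the "surrogateescape" error handler, which maps the undecodable
# byte 0xF6 to the lone surrogate U+DCF6.
name = raw_name.decode("utf-8", errors="surrogateescape")

# A strict UTF-8 encode, like value.encode(self.charset) in the traceback,
# rejects lone surrogates.
try:
    name.encode("utf-8")
    error = None
except UnicodeEncodeError as exc:
    error = exc.reason

# One possible workaround (an assumption, not the project's fix): recover the
# original bytes, then decode again, replacing undecodable bytes with U+FFFD.
printable = name.encode("utf-8", errors="surrogateescape").decode(
    "utf-8", errors="replace"
)

print(repr(name), error, repr(printable))
```

If this is the cause, searching the data directory for filenames that are not valid UTF-8 should point to the entry that breaks the `/public/` view.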

archivebox[2001]: response = response or self.get_response(request) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 49, in inner Mar 06 19:39:25 archivebox archivebox[2001]: response = response_for_exception(request, exc) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 103, in response_for_exception Mar 06 19:39:25 archivebox archivebox[2001]: response = handle_uncaught_exception(request, get_resolver(get_urlconf()), sys.exc_info()) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 138, in handle_uncaught_exception Mar 06 19:39:25 archivebox archivebox[2001]: return debug.technical_500_response(request, *exc_info) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/views/debug.py", line 53, in technical_500_response Mar 06 19:39:25 archivebox archivebox[2001]: return HttpResponse(html, status=status_code, content_type='text/html') Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 298, in __init__ Mar 06 19:39:25 archivebox archivebox[2001]: self.content = content Mar 06 19:39:25 archivebox 
archivebox[2001]: ^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 328, in content Mar 06 19:39:25 archivebox archivebox[2001]: content = self.make_bytes(value) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 241, in make_bytes Mar 06 19:39:25 archivebox archivebox[2001]: return bytes(value.encode(self.charset)) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: UnicodeEncodeError: 'utf-8' codec can't encode character '\udcf6' in position 6658: surrogates not allowed Mar 06 19:39:25 archivebox archivebox[2001]: During handling of the above exception, another exception occurred: Mar 06 19:39:25 archivebox archivebox[2001]: Traceback (most recent call last): Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 47, in inner Mar 06 19:39:25 archivebox archivebox[2001]: response = get_response(request) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/utils/deprecation.py", line 114, in __call__ Mar 06 19:39:25 archivebox archivebox[2001]: response = response or self.get_response(request) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 49, in inner Mar 06 19:39:25 archivebox archivebox[2001]: response = response_for_exception(request, exc) Mar 06 19:39:25 archivebox archivebox[2001]: 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 103, in response_for_exception Mar 06 19:39:25 archivebox archivebox[2001]: response = handle_uncaught_exception(request, get_resolver(get_urlconf()), sys.exc_info()) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 138, in handle_uncaught_exception Mar 06 19:39:25 archivebox archivebox[2001]: return debug.technical_500_response(request, *exc_info) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/views/debug.py", line 53, in technical_500_response Mar 06 19:39:25 archivebox archivebox[2001]: return HttpResponse(html, status=status_code, content_type='text/html') Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 298, in __init__ Mar 06 19:39:25 archivebox archivebox[2001]: self.content = content Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 328, in content Mar 06 19:39:25 archivebox archivebox[2001]: content = self.make_bytes(value) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File 
"/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 241, in make_bytes Mar 06 19:39:25 archivebox archivebox[2001]: return bytes(value.encode(self.charset)) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: UnicodeEncodeError: 'utf-8' codec can't encode character '\udcf6' in position 6658: surrogates not allowed Mar 06 19:39:25 archivebox archivebox[2001]: During handling of the above exception, another exception occurred: Mar 06 19:39:25 archivebox archivebox[2001]: Traceback (most recent call last): Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 47, in inner Mar 06 19:39:25 archivebox archivebox[2001]: response = get_response(request) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/utils/deprecation.py", line 114, in __call__ Mar 06 19:39:25 archivebox archivebox[2001]: response = response or self.get_response(request) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 49, in inner Mar 06 19:39:25 archivebox archivebox[2001]: response = response_for_exception(request, exc) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 103, in response_for_exception Mar 06 19:39:25 archivebox archivebox[2001]: response = handle_uncaught_exception(request, get_resolver(get_urlconf()), sys.exc_info()) Mar 06 19:39:25 archivebox archivebox[2001]: 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 138, in handle_uncaught_exception Mar 06 19:39:25 archivebox archivebox[2001]: return debug.technical_500_response(request, *exc_info) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/views/debug.py", line 53, in technical_500_response Mar 06 19:39:25 archivebox archivebox[2001]: return HttpResponse(html, status=status_code, content_type='text/html') Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 298, in __init__ Mar 06 19:39:25 archivebox archivebox[2001]: self.content = content Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 328, in content Mar 06 19:39:25 archivebox archivebox[2001]: content = self.make_bytes(value) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 241, in make_bytes Mar 06 19:39:25 archivebox archivebox[2001]: return bytes(value.encode(self.charset)) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: UnicodeEncodeError: 'utf-8' codec can't encode character '\udcf6' in position 6658: surrogates not allowed Mar 06 19:39:25 archivebox archivebox[2001]: 
During handling of the above exception, another exception occurred: Mar 06 19:39:25 archivebox archivebox[2001]: Traceback (most recent call last): Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 47, in inner Mar 06 19:39:25 archivebox archivebox[2001]: response = get_response(request) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/utils/deprecation.py", line 114, in __call__ Mar 06 19:39:25 archivebox archivebox[2001]: response = response or self.get_response(request) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 49, in inner Mar 06 19:39:25 archivebox archivebox[2001]: response = response_for_exception(request, exc) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 103, in response_for_exception Mar 06 19:39:25 archivebox archivebox[2001]: response = handle_uncaught_exception(request, get_resolver(get_urlconf()), sys.exc_info()) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 138, in handle_uncaught_exception Mar 06 19:39:25 archivebox archivebox[2001]: return debug.technical_500_response(request, *exc_info) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 
19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/views/debug.py", line 53, in technical_500_response Mar 06 19:39:25 archivebox archivebox[2001]: return HttpResponse(html, status=status_code, content_type='text/html') Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 298, in __init__ Mar 06 19:39:25 archivebox archivebox[2001]: self.content = content Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 328, in content Mar 06 19:39:25 archivebox archivebox[2001]: content = self.make_bytes(value) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 241, in make_bytes Mar 06 19:39:25 archivebox archivebox[2001]: return bytes(value.encode(self.charset)) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: UnicodeEncodeError: 'utf-8' codec can't encode character '\udcf6' in position 6658: surrogates not allowed Mar 06 19:39:25 archivebox archivebox[2001]: During handling of the above exception, another exception occurred: Mar 06 19:39:25 archivebox archivebox[2001]: Traceback (most recent call last): Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 47, in inner Mar 06 19:39:25 archivebox archivebox[2001]: response = get_response(request) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^ Mar 06 
19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/archivebox/core/middleware.py", line 25, in middleware Mar 06 19:39:25 archivebox archivebox[2001]: return get_response(request) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 49, in inner Mar 06 19:39:25 archivebox archivebox[2001]: response = response_for_exception(request, exc) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 103, in response_for_exception Mar 06 19:39:25 archivebox archivebox[2001]: response = handle_uncaught_exception(request, get_resolver(get_urlconf()), sys.exc_info()) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 138, in handle_uncaught_exception Mar 06 19:39:25 archivebox archivebox[2001]: return debug.technical_500_response(request, *exc_info) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/views/debug.py", line 53, in technical_500_response Mar 06 19:39:25 archivebox archivebox[2001]: return HttpResponse(html, status=status_code, content_type='text/html') Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File 
"/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 298, in __init__ Mar 06 19:39:25 archivebox archivebox[2001]: self.content = content Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 328, in content Mar 06 19:39:25 archivebox archivebox[2001]: content = self.make_bytes(value) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 241, in make_bytes Mar 06 19:39:25 archivebox archivebox[2001]: return bytes(value.encode(self.charset)) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: UnicodeEncodeError: 'utf-8' codec can't encode character '\udcf6' in position 6658: surrogates not allowed Mar 06 19:39:25 archivebox archivebox[2001]: During handling of the above exception, another exception occurred: Mar 06 19:39:25 archivebox archivebox[2001]: Traceback (most recent call last): Mar 06 19:39:25 archivebox archivebox[2001]: File "/usr/lib/python3.11/wsgiref/handlers.py", line 137, in run Mar 06 19:39:25 archivebox archivebox[2001]: self.result = application(self.environ, self.start_response) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/contrib/staticfiles/handlers.py", line 76, in __call__ Mar 06 19:39:25 archivebox archivebox[2001]: return self.application(environ, start_response) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File 
"/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/wsgi.py", line 133, in __call__ Mar 06 19:39:25 archivebox archivebox[2001]: response = self.get_response(request) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/base.py", line 130, in get_response Mar 06 19:39:25 archivebox archivebox[2001]: response = self._middleware_chain(request) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 49, in inner Mar 06 19:39:25 archivebox archivebox[2001]: response = response_for_exception(request, exc) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 103, in response_for_exception Mar 06 19:39:25 archivebox archivebox[2001]: response = handle_uncaught_exception(request, get_resolver(get_urlconf()), sys.exc_info()) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 138, in handle_uncaught_exception Mar 06 19:39:25 archivebox archivebox[2001]: return debug.technical_500_response(request, *exc_info) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/views/debug.py", line 53, in technical_500_response Mar 06 19:39:25 
archivebox archivebox[2001]: return HttpResponse(html, status=status_code, content_type='text/html') Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 298, in __init__ Mar 06 19:39:25 archivebox archivebox[2001]: self.content = content Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 328, in content Mar 06 19:39:25 archivebox archivebox[2001]: content = self.make_bytes(value) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 241, in make_bytes Mar 06 19:39:25 archivebox archivebox[2001]: return bytes(value.encode(self.charset)) Mar 06 19:39:25 archivebox archivebox[2001]: ^^^^^^^^^^^^^^^^^^^^^^^^^^ Mar 06 19:39:25 archivebox archivebox[2001]: UnicodeEncodeError: 'utf-8' codec can't encode character '\udcf6' in position 6658: surrogates not allowed Mar 06 19:39:25 archivebox archivebox[2001]: "GET /public/ HTTP/1.1" 500 59 ```
kerem 2026-03-15 00:42:36 +03:00

@pirate commented on GitHub (Mar 6, 2024):

Looks like you archived a URL that contains unprintable UTF-8 bytes (possibly from a broken emoji/accented character/Cyrillic/Chinese/Arabic/etc.) and it ended up in a filesystem path, so it's failing when trying to render the path in the public view.

In the short term you can find/strip all special UTF-8 characters in filenames using this script I wrote: [`strip_bad_filename_characters.sh`](https://gist.github.com/pirate/e27ba40a267af62b5d8447f8892d73c6#file-strip_bad_filename_characters-sh) or a program like [`detox`](https://github.com/dharple/detox) (`apt install detox; man detox`).

In the long term ArchiveBox should fix this by force-normalizing all filenames to UTF-8 in normalization form D (NFD) on creation so this doesn't happen in the future.
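As a rough illustration of what normalizing filenames to form-D (Unicode NFD) on creation could look like, here is a minimal sketch. It is not ArchiveBox's actual code: `normalize_filename` is a hypothetical helper, and replacing invalid bytes with U+FFFD is an assumption about the desired behavior, not something stated in this thread.

```python
import unicodedata


def normalize_filename(raw_name: bytes) -> str:
    """Hypothetical helper: return a filename that can always be encoded as UTF-8."""
    # Bytes that are not valid UTF-8 become U+FFFD here instead of turning
    # into lone surrogates later when the path is decoded.
    text = raw_name.decode("utf-8", errors="replace")
    # NFD splits precomposed characters into base letter + combining mark.
    return unicodedata.normalize("NFD", text)


# A valid name is only decomposed; an invalid byte is replaced.
print(ascii(normalize_filename("Brühpulver.jpg".encode("utf-8"))))
print(ascii(normalize_filename(b"Br\xf6hpulver.jpg")))
```

Note that normalization alone does not remove invalid bytes; it is the lenient decode step in this sketch that guarantees the result can be rendered.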

@Finkregh commented on GitHub (Mar 7, 2024):

Additional thoughts on this:

the char in question is https://www.unicodepedia.com/unicode/low-surrogates/dcf6/trail-surrogate-dcf6/

If I look at the files/folders and grep through the list for non-ASCII characters:

[ol@archivebox ~]$ find data > files.txt
[ol@archivebox ~]$ grep --color='auto' -P -n '[^\x00-\x7F]' files.txt 
16979:data/archive/1505464569.0/media/GopherCon 2017: Fatih Arslan - Writing a Go Tool to Parse and Modify Struct Tags [T4AIQ4RHp-c].webp
16980:data/archive/1505464569.0/media/GopherCon 2017: Fatih Arslan - Writing a Go Tool to Parse and Modify Struct Tags [T4AIQ4RHp-c].webm
16981:data/archive/1505464569.0/media/GopherCon 2017: Fatih Arslan - Writing a Go Tool to Parse and Modify Struct Tags [T4AIQ4RHp-c].description
16982:data/archive/1505464569.0/media/GopherCon 2017: Fatih Arslan - Writing a Go Tool to Parse and Modify Struct Tags [T4AIQ4RHp-c].info.json
39923:data/archive/1400350483.0/twibbon.com/Support/fem-weltverschwörung-ev.html
40841:data/archive/1518425808.0/media/This is how the world’s most covetable cameras get made [hasselblad-camera-factory-tour].description
40842:data/archive/1518425808.0/media/This is how the world’s most covetable cameras get made [hasselblad-camera-factory-tour].info.json
46349:data/archive/1500494199.0/media/"Don't run this on any system you expect to be up" they said, but we did it anyway - Hypernode [banner-{banner_id}-{type}].description
46350:data/archive/1500494199.0/media/"Don't run this on any system you expect to be up" they said, but we did it anyway - Hypernode [banner-{banner_id}-{type}].jpg
46351:data/archive/1500494199.0/media/"Don't run this on any system you expect to be up" they said, but we did it anyway - Hypernode [banner-{banner_id}-{type}].info.json
53211:data/archive/1556604197.0/www.slidescarnival.com/wp-content/uploads/2022/07/Blue-and-Pink-Geometric-Biography-About-Me-Creative-Presentation-·-SlidesCarnival-400x225.png
62502:data/archive/1527889634.0/3.bp.blogspot.com/_KihkJmE-KGc/TLA3WRTDG5I/AAAAAAAAAzU/JxyxTomjkd8/s320/Brühpulver.jpg
62521:data/archive/1527889634.0/2.bp.blogspot.com/-alQ8oPZliZU/WrYhwd21x3I/AAAAAAAARYw/ui6C8QPnzX81o2ZKrrCTJsL7NoXt0_c2ACLcBGAs/w72-h72-p-k-no-nu/gulasch+mälzer.jpg
62591:data/archive/1527889634.0/blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj7JgYIInAKOeRrRm0QjTUZyNrV4GodcX9tNfkMk5mvvtNCRcto0jESJjqVJP2dcZ5zo2C9ydTQeDDKhWX5AW7v35Iw19Yh-547FLU45ZasSsLubAWUf6jTa6lm_lMPMCAPdgUVH0bPCtGGt8buVSRMyFhA7LQ8q6_muOM_v5MmSihCgRq9YFMPz65cOg/w72-h72-p-k-no-nu/möhendurcheinander.jpg
86166:data/archive/1534075988.0/media/You Have Control: Learning From Aviation -  Andrew Godwin - PyCon Israel 2018 [d0eo3FxKQNc].webp
86167:data/archive/1534075988.0/media/You Have Control: Learning From Aviation -  Andrew Godwin - PyCon Israel 2018 [d0eo3FxKQNc].webm
86168:data/archive/1534075988.0/media/You Have Control: Learning From Aviation -  Andrew Godwin - PyCon Israel 2018 [d0eo3FxKQNc].description
86169:data/archive/1534075988.0/media/You Have Control: Learning From Aviation -  Andrew Godwin - PyCon Israel 2018 [d0eo3FxKQNc].info.json
91908:data/archive/1508142870.0/assets-global.website-files.com/61027bb0bc31fc6cafefbc0c/627d31f972023bb238b8124d_Ресурс 4.png
91926:data/archive/1508142870.0/assets-global.website-files.com/61027bb0bc31fc6cafefbc0c/618d3dfd1c3709ebafd0eb93_сhecker.svg
100214:data/archive/1501095170.0/media/世界地図図法 [オーサグラフ世界地図] (16G141127) [IVuxMGxTyEg].webp
100215:data/archive/1501095170.0/media/世界地図図法 [オーサグラフ世界地図] (16G141127) [IVuxMGxTyEg].info.json
100216:data/archive/1501095170.0/media/世界地図図法 [オーサグラフ世界地図] (16G141127) [IVuxMGxTyEg].webm
100217:data/archive/1501095170.0/media/世界地図図法 [オーサグラフ世界地図] (16G141127) [IVuxMGxTyEg].description
105632:data/archive/1506327195.0/media/documenting architecture: wireshark, plantuml and a repl [J2RGAPGFfP8].webm
105633:data/archive/1506327195.0/media/documenting architecture: wireshark, plantuml and a repl [J2RGAPGFfP8].description
105634:data/archive/1506327195.0/media/documenting architecture: wireshark, plantuml and a repl [J2RGAPGFfP8].webp
105635:data/archive/1506327195.0/media/documenting architecture: wireshark, plantuml and a repl [J2RGAPGFfP8].info.json

I can't see that char.
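One likely reason the character does not show up: `\udcf6` is probably not stored in any filename as such. When Python decodes a path with the `surrogateescape` error handler (which it uses for filesystem names), each byte that is not valid UTF-8 is mapped to a lone surrogate in the range U+DC80 to U+DCFF, so `\udcf6` would stand in for the raw byte `0xF6`. The sketch below reproduces the error from the traceback; the filename in it is made up for illustration and is not taken from the archive.

```python
# Hypothetical name containing a Latin-1 encoded "ö" (byte 0xF6), which is invalid UTF-8.
raw = b"m\xf6hren.jpg"

# Decoding the way Python decodes filesystem paths keeps the bad byte as a lone surrogate.
name = raw.decode("utf-8", errors="surrogateescape")
print(ascii(name))

# A strict UTF-8 encode, as done when building an HTTP response, then fails.
try:
    name.encode("utf-8")
except UnicodeEncodeError as exc:
    print(exc.reason)

# Encoding with the same handler round-trips back to the original bytes.
print(name.encode("utf-8", errors="surrogateescape") == raw)
```

If that is what happened here, a search for valid non-ASCII characters may not match the offending name, because the byte in question is not part of any valid character.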

If I move the folders in question away, I still get the same issue:

mkdir broken-data
grep --color='auto' -P -n '[^\x00-\x7F]' files.txt | cut -d ":" -f2 | cut -d "/" -f 1-3 | sort -u | xargs mv -t broken-data

I'd rather not run detox, as it would rename all sorts of files and then the archive would be broken. The same goes for the script you linked.

Is there any way to narrow this down to where the actual file is? Perhaps even more debug output than DEBUG=True?
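One way to narrow this down without renaming anything is to walk the data directory as raw bytes and report only the names that are not valid UTF-8. This is a minimal sketch, assuming the offending byte really is in a file or directory name under `data/` (it could also be in stored metadata, which this would not find). `find_undecodable_paths` is a hypothetical helper, not part of ArchiveBox.

```python
import os


def find_undecodable_paths(root: str) -> list[bytes]:
    """Return raw paths under root whose file or directory name is not valid UTF-8."""
    bad = []
    # Passing a bytes root makes os.walk yield raw bytes names, untouched by decoding.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root)):
        for name in dirnames + filenames:
            try:
                name.decode("utf-8")
            except UnicodeDecodeError:
                bad.append(os.path.join(dirpath, name))
    return bad


if __name__ == "__main__":
    # Printing bytes shows invalid bytes as escapes such as \xf6.
    for path in find_undecodable_paths("data"):
        print(path)
```

The script only reads directory listings, so the archive stays untouched; any path it prints can then be inspected or renamed by hand.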


@Finkregh commented on GitHub (Mar 17, 2024):

Interestingly, I ran sqlite3 database.db '.dump' > foo.sql (besides some strace), which led to not having the issue anymore. I wonder what that did and whether something had gone wrong inside the sqlite file before.

I'd still be interested in getting to know how to debug this :)

edit: I also moved back all the archive data which I suspected of causing issues, and it still works.


@Finkregh commented on GitHub (Mar 17, 2024):

~~Aaand it's back... o_O?~~

I read your pretty nice upgrading documentation that explains what init does. So I ran it and everything works. Still guessing in the direction of some sqlite issue... And I tried to get django-debug-toolbar==3.2.4 to run but ran into an exception:

TypeError: CacheHandler.all() got an unexpected keyword argument 'initialized_only'

edit: restarted the server and the issue is back. Now running init again w/o restart, let's see.
edit2: still broken :|

@pirate commented on GitHub (Mar 21, 2024):

Good approach trying to narrow down the failing request with django-debug-toolbar, not sure why it failed, I'll take a look. You can also try disabling most of the panes that it uses as they're often individually buggy and not all panes are needed to track down a broken request: archivebox/core/settings.py:165 DEBUG_TOOLBAR_PANELS (you can comment out almost everything in there, I'd start by disabling 'debug_toolbar.panels.cache.CachePanel'). There are also middlewares that can be added to log requests specifically: https://github.com/Rhumbix/django-request-logging

We can also keep trying the more direct approach to find where the offending bytes are recorded on the filesystem or in sqlite, before spelunking through the ArchiveBox code, maybe something like:

# find non-ascii within db fields
sqlite3 index.sqlite3
> SELECT * FROM core_snapshot WHERE <column> GLOB ('*[^'||char(1,45,127)||']*');
> SELECT * FROM core_archiveresult WHERE <column> GLOB ('*[^'||char(1,45,127)||']*');

# or keep trying other ways to find \udcf6 within file contents / paths
grep -obarUP "\xdc\xf6" .

- https://www.unicodepedia.com/unicode/low-surrogates/dcf6/trail-surrogate-dcf6/
- https://charbase.com/dcf6-unicode-invalid-character
- https://unix.stackexchange.com/questions/474709/how-to-grep-for-unicode-in-a-bash-script
- https://sqlite-users.sqlite.narkive.com/rMCLvZ99/sqlite-finding-records-containing-non-ascii-characters
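
One caveat before searching for the byte pair DC F6: in Python, a lone surrogate such as '\udcf6' is typically what the surrogateescape error handler (PEP 383) produces when it decodes a raw byte that is not valid UTF-8, here 0xF6. If that is what happened, the name on disk contains a single F6 byte rather than DC F6. The sketch below is a minimal, hypothetical helper (`undecodable_paths` is not part of ArchiveBox) that walks a directory in bytes mode and reports names that are not valid UTF-8. It assumes a filesystem that permits such names, as most Linux filesystems do.

```python
import os


def undecodable_paths(root):
    """Yield (as bytes) every path under root whose name is not valid UTF-8."""
    # Walking with a bytes root makes os.walk return raw bytes names,
    # so nothing is decoded (or surrogate-escaped) behind our back.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root)):
        for name in dirnames + filenames:
            try:
                name.decode("utf-8")
            except UnicodeDecodeError:
                yield os.path.join(dirpath, name)


# How the surrogate arises: the raw byte 0xF6 is not valid UTF-8, and
# b"H\xf6he".decode("utf-8", "surrogateescape") gives 'H\udcf6he'.
```

Calling `list(undecodable_paths("data/archive"))` would return the offending paths as bytes, which can be shown safely with `repr()`. Printing the surrogate-escaped `str` form directly can raise the same UnicodeEncodeError as the web view.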

@Finkregh commented on GitHub (Mar 26, 2024):

Tried with request-logging:

GET /
{'HTTP_HOST': 'archivebox.local:8080', 'HTTP_USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:124.0) Gecko/20100101 Firefox/124.0', 'HTTP_ACCEPT': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8', 'HTTP_ACCEPT_LANGUAGE': 'en-US,en;q=0.5', 'HTTP_ACCEPT_ENCODING': 'gzip, deflate', 'HTTP_CONNECTION': 'keep-alive', 'HTTP_COOKIE': 'csrftoken=x; sessionid=y; GMT_OFFSET=60', 'HTTP_UPGRADE_INSECURE_REQUESTS': '1'}
b''
GET / - 302
"GET / HTTP/1.1" 302 0
Internal Server Error: /admin/core/snapshot/
Traceback (most recent call last):
  File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/core/handlers/base.py", line 204, in _get_response
    response = response.render()
               ^^^^^^^^^^^^^^^^^
  File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/template/response.py", line 105, in render
    self.content = self.rendered_content
    ^^^^^^^^^^^^
  File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/template/response.py", line 134, in content
    HttpResponse.content.fset(self, value)
  File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 328, in content
    content = self.make_bytes(value)
              ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/http/response.py", line 241, in make_bytes
    return bytes(value.encode(self.charset))
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcf6' in position 92793: surrogates not allowed
GET /admin/core/snapshot/
{'HTTP_HOST': 'archivebox.local:8080', 'HTTP_USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:124.0) Gecko/20100101 Firefox/124.0', 'HTTP_ACCEPT': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8', 'HTTP_ACCEPT_LANGUAGE': 'en-US,en;q=0.5', 'HTTP_ACCEPT_ENCODING': 'gzip, deflate', 'HTTP_CONNECTION': 'keep-alive', 'HTTP_COOKIE': 'csrftoken=x; sessionid=y; GMT_OFFSET=60', 'HTTP_UPGRADE_INSECURE_REQUESTS': '1'}
b''
GET /admin/core/snapshot/ - 500
"GET /admin/core/snapshot/ HTTP/1.1" 500 145

The SQLite glob for non-ASCII returns all sorts of stuff, but not that char.

I tried this and it returned nothing:

#!/bin/bash

# SQLite database file
DATABASE="index.sqlite3"

# Dump all table names
TABLES=$(sqlite3 "$DATABASE" ".tables")

# Loop through each table
for table in $TABLES; do
    #echo "Table: $table"

    # Dump all column names for the current table
    COLUMNS=$(sqlite3 "$DATABASE" "PRAGMA table_info($table);" | cut -d '|' -f 2)

    # Loop through each column
    for column in $COLUMNS; do
        #echo "Column: $column"

        # Run the query for the current table/column combination
        #echo "Results for $table.$column:"
        sqlite3 "$DATABASE" "SELECT * FROM $table WHERE $column LIKE '%' || X'DCF6' || '%';"
    done
done

Edit: the grep did find some files, i moved them away and nothing changed :(
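
A way to check the database that does not depend on guessing the exact byte sequence is to read every TEXT value as raw bytes and test whether it decodes as UTF-8. The following is only a sketch under assumptions: `find_invalid_utf8` is a hypothetical helper, it assumes ordinary rowid tables, and it interpolates table and column names taken from the database itself, so it should only be run against a trusted file.

```python
import sqlite3


def find_invalid_utf8(conn):
    """Return (table, column, rowid, raw_bytes) for TEXT values that are not valid UTF-8."""
    conn.text_factory = bytes  # hand TEXT values back undecoded
    hits = []
    tables = [row[0].decode("utf-8") for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
        columns = [row[1].decode("utf-8") for row in conn.execute(
            f'PRAGMA table_info("{table}")')]
        for column in columns:
            query = (f'SELECT rowid, "{column}" FROM "{table}" '
                     f"WHERE typeof(\"{column}\") = 'text'")
            for rowid, raw in conn.execute(query):
                try:
                    raw.decode("utf-8")
                except UnicodeDecodeError:
                    hits.append((table, column, rowid, raw))
    return hits
```

For example, `find_invalid_utf8(sqlite3.connect("index.sqlite3"))` would list any rows holding bytes such as a bare 0xF6. An empty result suggests the bad character is not stored in the database at all and is instead picked up from the filesystem or an index file at render time.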


@pirate commented on GitHub (Mar 26, 2024):

damn... ok. I guess I might have to fix it the harder way: changing the renderer to handle this.

Before we go debugging too much further can you help double check these super quick:

echo $LANG $LC_ALL $LC_CTYPE
# should be: LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 LC_CTYPE=en_US.UTF-8

Related issues:

- https://github.com/jazzband/django-debug-toolbar/issues/1601
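
The same settings can also be inspected from inside Python, for instance from `archivebox manage shell`. This is a small sketch, not ArchiveBox code; `encoding_report` is a hypothetical name. On Linux with a UTF-8 locale the filesystem error handler is typically `surrogateescape`, which is the mechanism that turns an undecodable filename byte into a lone surrogate like '\udcf6'.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python uses for file names and text I/O."""
    return {
        "filesystem_encoding": sys.getfilesystemencoding(),
        "filesystem_errors": sys.getfilesystemencodeerrors(),
        "preferred_encoding": locale.getpreferredencoding(False),
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
    }
```

If `filesystem_encoding` already reports `utf-8`, changing `LANG` or `LC_ALL` is unlikely to help, since the problem would then be the bytes in the filename rather than the locale.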

@Finkregh commented on GitHub (Mar 26, 2024):

[ol@archivebox data]$ echo $LANG $LC_ALL $LC_CTYPE
en_US.UTF-8
[ol@archivebox data]$ DEBUG=True DEBUG_TOOLBAR=True archivebox manage shell
[i] [2024-03-26 20:59:05] ArchiveBox v0.7.3: archivebox manage shell
    > /home/ol/data

Python 3.11.8 (main, Feb 12 2024, 14:50:05) [GCC 13.2.1 20230801]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.22.2 -- An enhanced Interactive Python. Type '?' for help.
# ArchiveBox Imports
from archivebox.core.models import Snapshot, ArchiveResult, Tag, User
from archivebox.cli import *
    help
    version
    init
    config
    setup
    add
    remove
    update
    list
    status
    shell
    manage
    server
    oneshot
    schedule

[i] Welcome to the ArchiveBox Shell!
    https://github.com/ArchiveBox/ArchiveBox/wiki/Usage#Shell-Usage

    Hint: Example use:
        print(Snapshot.objects.filter(is_archived=True).count())
        Snapshot.objects.get(url="https://example.com").as_json()
        add("https://example.com/some/new/url")

In [1]: import os

In [2]: os.environ
Out[2]:
environ{'DEBUG_TOOLBAR': 'True',
        'DEBUG': 'True',
        'SHELL': '/bin/bash',
        'PWD': '/home/ol/data',
        'LOGNAME': 'ol',
        'XDG_SESSION_TYPE': 'tty',
        'MOTD_SHOWN': 'pam',
        'HOME': '/home/ol',
        'LANG': 'en_US.UTF-8',
        'SSH_CONNECTION': 'xxx 22',
        'XDG_SESSION_CLASS': 'user',
        'TERM': 'xterm-256color',
        'USER': 'ol',
        'SHLVL': '1',
        'XDG_SESSION_ID': '7',
        'XDG_RUNTIME_DIR': '/run/user/1000',
        'SSH_CLIENT': 'x 56510 22',
        'DEBUGINFOD_URLS': 'https://debuginfod.archlinux.org ',
        'PATH': '/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl:/home/ol/.local/bin:/home/ol/.local/bin',
        'DBUS_SESSION_BUS_ADDRESS': 'unix:path=/run/user/1000/bus',
        'MAIL': '/var/spool/mail/ol',
        'SSH_TTY': '/dev/pts/3',
        'OLDPWD': '/home/ol',
        '_': '/usr/local/bin/archivebox',
        'TZ': 'UTC',
        'PYTHONSTARTUP': '/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/archivebox/core/welcome_message.py',
        'OUTPUT_DIR': '/home/ol/data',
        'DJANGO_SETTINGS_MODULE': 'core.settings'}

@Finkregh commented on GitHub (Mar 26, 2024):

FYI the debug toolbar:

[ol@archivebox data]$ DJANGO_SETTINGS_MODULE=archivebox.core.settings DEBUG=True DEBUG_TOOLBAR=True archivebox server --nothreading '[::]:8080'
[i] [2024-03-26 21:00:54] ArchiveBox v0.7.3: archivebox server --nothreading [::]:8080
    > /home/ol/data

Traceback (most recent call last):
  File "/usr/local/bin/archivebox", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/archivebox/cli/__init__.py", line 140, in main
    run_subcommand(
  File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/archivebox/cli/__init__.py", line 74, in run_subcommand
    setup_django(in_memory_db=subcommand in fake_db, check_db=cmd_requires_db and not init_pending)
  File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/archivebox/config.py", line 1420, in setup_django
    with open(settings.ERROR_LOG, "a", encoding='utf-8') as f:
              ^^^^^^^^^^^^^^^^^^
  File "/home/ol/.local/pipx/venvs/archivebox/lib/python3.11/site-packages/django/conf/__init__.py", line 83, in __getattr__
    val = getattr(self._wrapped, name)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Settings' object has no attribute 'ERROR_LOG'

with only debug_toolbar.panels.request.RequestPanel in the DEBUG_TOOLBAR_PANELS


@Finkregh commented on GitHub (Mar 26, 2024):

$ LC_ALL=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 archivebox server --nothreading '[::]:8080'

leads to the same issue as before


@Finkregh commented on GitHub (Mar 26, 2024):

I moved the whole debug block in settings.py to the bottom of the file and added ERROR_LOG="/tmp/err.log". Now the server starts and throws the same 500 error as before, w/o any debug toolbar :D


@Finkregh commented on GitHub (Apr 5, 2024):

Anything else I could try?

What if I tried something like moving one directory in data/ away, testing, and then retrying after moving the next one:

touch still_broken
for dir in data/*; do
	if [ -e still_broken ] ; then
		mv "$dir" /tmp/
		start server
		curl --fail  && rm still_broken
		kill server
	fi
done

Would that work? Should I do anything else in that loop?


@Finkregh commented on GitHub (May 6, 2024):

I'm now running this after moving all directories from archive to broken:

set -euo pipefail
for dir in ../broken/* ; do
        echo $dir
        mv "$dir" . #../_kaputt_/
        pushd ..
        archivebox server --nothreading '[::]:8080' & _pid=$!
        popd
        _exit=0
        sleep 2
        curl -L --fail -o /dev/null -s http://127.0.0.0:8080/public/ || _exit=$?
        if [[ $_exit -eq 7  ]]; then sleep 1 ;
                curl -L --fail -o /dev/null -s http://127.0.0.0:8080/public/ || _exit=$?
        fi
        if [[ $_exit -ne 0 ]]; then echo "$dir broken" ; exit 1 ; fi
        kill $_pid
done

This is one folder I identified:

    "canonical": {
        "archive_org_path": "https://web.archive.org/web/standards.webmasterpro.de/index.html?article=zentriertes+Layout%2C+100%25+H%F6he",
        "dom_path": "output.html",
        "favicon_path": "favicon.ico",
        "git_path": "git/",
        "google_favicon_path": "https://www.google.com/s2/favicons?domain=standards.webmasterpro.de",
        "headers_path": "headers.json",
        "htmltotext_path": "htmltotext.txt",
        "index_path": "index.html",
        "media_path": "media/",
        "mercury_path": "mercury/content.html",
        "pdf_path": "output.pdf",
        "readability_path": "readability/content.html",
        "screenshot_path": "screenshot.png",
        "singlefile_path": "singlefile.html",
        "warc_path": "warc/",
        "wget_path": "standards.webmasterpro.de/index.html@article=zentriertes+Layout,+100%+H\udcf6he.html"
    },

So a grep for literally udcf6 would have shown the issue from the beginning...

I'll leave the script running and update again if I come across another issue.
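
A plain-text grep for `udcf6` also matches legitimate surrogate pairs if it is generalized to other code units, since JSON writers often escape emoji as pairs like `\ud83d\ude00`. A sketch that avoids this parses each file and looks for lone surrogates in the decoded strings. It assumes each snapshot folder holds an `index.json` like the excerpt above; `lone_surrogates` and `scan_archive` are hypothetical helpers, not ArchiveBox functions.

```python
import json
from pathlib import Path


def lone_surrogates(value, path=""):
    """Yield (json_path, string) for every string that contains a lone surrogate."""
    if isinstance(value, str):
        # json.loads merges valid surrogate pairs into one code point,
        # so anything left in U+D800..U+DFFF is a lone surrogate.
        if any(0xD800 <= ord(ch) <= 0xDFFF for ch in value):
            yield path, value
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from lone_surrogates(item, f"{path}.{key}")
    elif isinstance(value, list):
        for i, item in enumerate(value):
            yield from lone_surrogates(item, f"{path}[{i}]")


def scan_archive(archive_dir):
    """Yield (index_file, json_path, string) for each snapshot index with a bad string."""
    for index in sorted(Path(archive_dir).glob("*/index.json")):
        data = json.loads(index.read_text(encoding="utf-8"))
        for json_path, text in lone_surrogates(data):
            yield index, json_path, text
```

When reporting results, `ascii(text)` is a safe way to display the string, because printing it directly to a UTF-8 terminal can fail with the same "surrogates not allowed" error.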


@pirate commented on GitHub (May 7, 2024):

Argh, it was wget path detection all along! That part of the codebase causes so many nasty surprises, see https://github.com/ArchiveBox/ArchiveBox/issues/549

Working around and reverse-engineering wget's absurdly complicated mapping of URLs to filepaths is consistently one of the most troublesome, labor-intensive parts of running this entire project.

~~I think I'll abandon trying to support unicode in filepaths entirely and just change wget to use --restrict-file-names=ascii --content-disposition, and also stop trying to auto-detect wget's output location like this; it's caused so many hard-to-debug headaches like this one.~~ (It turns out that, counterintuitively, windows is more restrictive than ascii, and that I already tried this in the past and reverted it: https://github.com/ArchiveBox/ArchiveBox/issues/210#issuecomment-481943709)

@pirate commented on GitHub (May 7, 2024):

Should be tentatively fixed in https://github.com/ArchiveBox/ArchiveBox/pull/1424

I added workaround logic in wget_output_path() to fall back to the parent dir if the html path contains unprintable unicode.
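
The idea of that fallback can be illustrated roughly as follows. This is not the code from the pull request, only a sketch of the technique under the assumption that "unprintable" means "cannot be encoded as UTF-8"; `is_utf8_encodable` and `printable_or_parent` are hypothetical names.

```python
from pathlib import PurePosixPath


def is_utf8_encodable(text: str) -> bool:
    """Return True if text can be encoded as UTF-8 (no lone surrogates)."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


def printable_or_parent(path: str) -> str:
    """Return path, or its nearest ancestor that can be rendered safely."""
    # Climb one directory at a time until the remaining path is encodable;
    # this ends at "." or "/" at the latest, both of which encode fine.
    while not is_utf8_encodable(path):
        path = str(PurePosixPath(path).parent)
    return path
```

With the path from this thread, `standards.webmasterpro.de/index.html@article=zentriertes+Layout,+100%+H\udcf6he.html` would fall back to `standards.webmasterpro.de`, so the view links to the folder instead of failing to render.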


@clb92 commented on GitHub (Jul 29, 2024):

Something very similar just happened to me again:

[+] Starting ArchiveBox webserver...
    > Logging errors to ./logs/errors.log
Performing system checks...

System check identified no issues (0 silenced).
July 29, 2024 - 13:49:51
Django version 3.1.14, using settings 'core.settings'
Starting development server at http://0.0.0.0:8000/
Quit the server with CONTROL-C.
Internal Server Error: /admin/core/snapshot/
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/core/handlers/base.py", line 204, in _get_response
    response = response.render()
               ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/template/response.py", line 105, in render
    self.content = self.rendered_content
    ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/template/response.py", line 134, in content
    HttpResponse.content.fset(self, value)
  File "/usr/local/lib/python3.11/site-packages/django/http/response.py", line 328, in content
    content = self.make_bytes(value)
              ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/http/response.py", line 241, in make_bytes
    return bytes(value.encode(self.charset))
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc3' in position 34032: surrogates not allowed
"GET /admin/core/snapshot/ HTTP/1.1" 500 145

After removing snapshots via the CLI for half an hour, I tracked it down to a Google search URL.

<!-- gh-comment-id:2256015460 --> @clb92 commented on GitHub (Jul 29, 2024): Something very similar just happened to me again: ``` [+] Starting ArchiveBox webserver... > Logging errors to ./logs/errors.log Performing system checks... System check identified no issues (0 silenced). July 29, 2024 - 13:49:51 Django version 3.1.14, using settings 'core.settings' Starting development server at http://0.0.0.0:8000/ Quit the server with CONTROL-C. Internal Server Error: /admin/core/snapshot/ Traceback (most recent call last): File "/usr/local/lib/python3.11/site-packages/django/core/handlers/exception.py", line 47, in inner response = get_response(request) ^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/django/core/handlers/base.py", line 204, in _get_response response = response.render() ^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/django/template/response.py", line 105, in render self.content = self.rendered_content ^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/django/template/response.py", line 134, in content HttpResponse.content.fset(self, value) File "/usr/local/lib/python3.11/site-packages/django/http/response.py", line 328, in content content = self.make_bytes(value) ^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/django/http/response.py", line 241, in make_bytes return bytes(value.encode(self.charset)) ^^^^^^^^^^^^^^^^^^^^^^^^^^ UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc3' in position 34032: surrogates not allowed "GET /admin/core/snapshot/ HTTP/1.1" 500 145 ``` After removing snapshots via the CLI for half an hour, I tracked it down to a Google search URL.

@dot-mike commented on GitHub (Aug 3, 2025):

Hit this issue as well. Apparently ArchiveBox does not like URLs with special characters in them.

2025-08-03T01:07:00.750877769Z "GET /admin/core/snapshot/ HTTP/1.1" 302 0
2025-08-03T01:07:01.143568761Z "GET /admin/login/?next=/admin/core/snapshot/ HTTP/1.1" 200 11747
2025-08-03T01:07:09.602288773Z "POST /admin/login/?next=/admin/core/snapshot/ HTTP/1.1" 302 0
2025-08-03T01:07:11.301881283Z Internal Server Error: /admin/core/snapshot/
2025-08-03T01:07:11.302329793Z Traceback (most recent call last):
2025-08-03T01:07:11.302386468Z   File "/usr/local/lib/python3.11/site-packages/django/core/handlers/exception.py", line 47, in inner
2025-08-03T01:07:11.302454051Z     response = get_response(request)
2025-08-03T01:07:11.302499345Z                ^^^^^^^^^^^^^^^^^^^^^
2025-08-03T01:07:11.302548846Z   File "/usr/local/lib/python3.11/site-packages/django/core/handlers/base.py", line 204, in _get_response
2025-08-03T01:07:11.302604044Z     response = response.render()
2025-08-03T01:07:11.302646619Z                ^^^^^^^^^^^^^^^^^
2025-08-03T01:07:11.302684901Z   File "/usr/local/lib/python3.11/site-packages/django/template/response.py", line 105, in render
2025-08-03T01:07:11.302734280Z     self.content = self.rendered_content
2025-08-03T01:07:11.302824999Z     ^^^^^^^^^^^^
2025-08-03T01:07:11.302865787Z   File "/usr/local/lib/python3.11/site-packages/django/template/response.py", line 134, in content
2025-08-03T01:07:11.302915235Z     HttpResponse.content.fset(self, value)
2025-08-03T01:07:11.302960831Z   File "/usr/local/lib/python3.11/site-packages/django/http/response.py", line 328, in content
2025-08-03T01:07:11.303011213Z     content = self.make_bytes(value)
2025-08-03T01:07:11.303057142Z               ^^^^^^^^^^^^^^^^^^^^^^
2025-08-03T01:07:11.303095523Z   File "/usr/local/lib/python3.11/site-packages/django/http/response.py", line 241, in make_bytes
2025-08-03T01:07:11.303143589Z     return bytes(value.encode(self.charset))
2025-08-03T01:07:11.303188943Z                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-08-03T01:07:11.303227599Z UnicodeEncodeError: 'utf-8' codec can't encode character '\udce5' in position 11796: surrogates not allowed
2025-08-03T01:07:11.303302511Z "GET /admin/core/snapshot/ HTTP/1.1" 500 145

File logs:

$ user@server:/volume2/docker/webarchive/data/archivebox$ grep -ir udce5 .
./archive/1754163168.625424/index.json:        "dom_path": "www.domain.com/div/faderv\udce5r.txt",
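
The `\udce5` in that path looks like the footprint of Python's `surrogateescape` error handler, which Python uses when decoding filesystem paths. The sketch below reproduces the failure in isolation. It assumes the original file name was `fadervår.txt` encoded as Latin-1, which is a guess; how ArchiveBox actually obtained the string is not shown in this thread.

```python
# Assumed original bytes: "fadervår.txt" in Latin-1, where "å" is the single byte 0xE5.
raw = b"faderv\xe5r.txt"

# 0xE5 is not valid UTF-8 here, so surrogateescape maps it to the lone surrogate U+DCE5.
name = raw.decode("utf-8", errors="surrogateescape")
assert name == "faderv\udce5r.txt"

# A strict UTF-8 encode (what value.encode(self.charset) does in the traceback) rejects it.
try:
    name.encode("utf-8")
    error_reason = None
except UnicodeEncodeError as exc:
    error_reason = exc.reason

# Encoding with surrogateescape restores the raw bytes, which can then be
# decoded with the encoding they were really written in (Latin-1 is assumed).
recovered = name.encode("utf-8", errors="surrogateescape").decode("latin-1")

# For display only, backslashreplace produces text that is always encodable.
safe = name.encode("utf-8", errors="backslashreplace").decode("utf-8")
```

Here `error_reason` ends up as `surrogates not allowed`, matching the log above, and `safe` could be rendered in a template without raising, at the cost of showing the escape sequence instead of the real character.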