[GH-ISSUE #1292] Exception ConnectionResetError: [Errno 104] Connection reset is thrown when user navigates away while page is still loading #3816

Closed
opened 2026-03-15 00:33:28 +03:00 by kerem · 4 comments

Originally created by @mamema on GitHub (Dec 18, 2023).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1292

Describe the bug

docker log:
"GET /admin/core/archiveresult/?q=erika HTTP/1.1" 200 16014


Exception occurred during processing of request from ('192.168.0.2', 54854)

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/socketserver.py", line 691, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/local/lib/python3.11/socketserver.py", line 361, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/local/lib/python3.11/socketserver.py", line 755, in __init__
    self.handle()
  File "/usr/local/lib/python3.11/site-packages/django/core/servers/basehttp.py", line 174, in handle
    self.handle_one_request()
  File "/usr/local/lib/python3.11/site-packages/django/core/servers/basehttp.py", line 182, in handle_one_request
    self.raw_requestline = self.rfile.readline(65537)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/socket.py", line 706, in readinto
    return self._sock.recv_into(b)
           ^^^^^^^^^^^^^^^^^^^^^^^
ConnectionResetError: [Errno 104] Connection rese

Steps to reproduce

Use the Firefox or Chrome extension, configured with a blocklist so that EVERY web page visited should be archived. Nothing gets archived.

ArchiveBox version

0.7.1
ArchiveBox v0.7.1+editable BUILD_TIME=2023-12-18 06:57:51 1702882671
IN_DOCKER=True IN_QEMU=False ARCH=x86_64 OS=Linux PLATFORM=Linux-5.16.0-0.bpo.4-amd64-x86_64-with-glibc2.36 PYTHON=Cpython
FS_ATOMIC=True FS_REMOTE=True FS_USER=0:0 FS_PERMS=644
DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND=ripgrep LDAP=False

[i] Dependency versions:
 √  PYTHON_BINARY         v3.11.7         valid     /usr/local/bin/python3.11                                                   
 √  SQLITE_BINARY         v2.6.0          valid     /usr/local/lib/python3.11/sqlite3/dbapi2.py                                 
 √  DJANGO_BINARY         v3.1.14         valid     /usr/local/lib/python3.11/site-packages/django/__init__.py                  
 √  ARCHIVEBOX_BINARY     v0.7.1          valid     /usr/local/bin/archivebox                                                   

 √  CURL_BINARY           v8.4.0          valid     /usr/bin/curl                                                               
 √  WGET_BINARY           v1.21.3         valid     /usr/bin/wget                                                               
 √  NODE_BINARY           v21.4.0         valid     /usr/bin/node                                                               
 √  SINGLEFILE_BINARY     v1.1.18         valid     /app/node_modules/single-file-cli/single-file                               
 √  READABILITY_BINARY    v0.0.9          valid     /app/node_modules/readability-extractor/readability-extractor               
 √  MERCURY_BINARY        v1.0.0          valid     /app/node_modules/@postlight/parser/cli.js                                  
 √  GIT_BINARY            v2.39.2         valid     /usr/bin/git                                                                
 √  YOUTUBEDL_BINARY      v2023.11.16     valid     /usr/local/bin/yt-dlp                                                       
 √  CHROME_BINARY         v120.0.6099.28  valid     /usr/bin/chromium-browser                                                   
 √  RIPGREP_BINARY        v13.0.0         valid     /usr/bin/rg                                                                 

[i] Source-code locations:
 √  PACKAGE_DIR           24 files        valid     /app/archivebox                                                             
 √  TEMPLATES_DIR         4 files         valid     /app/archivebox/templates                                                   
 -  CUSTOM_TEMPLATES_DIR  -               disabled  None                                                                        

[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled  None                                                                        
 -  COOKIES_FILE          -               disabled  None                                                                        

[i] Data locations:
 √  OUTPUT_DIR            5 files @       valid     /data                                                                       
 √  SOURCES_DIR           1 files         valid     ./sources                                                                   
 √  LOGS_DIR              1 files         valid     ./logs                                                                      
 √  ARCHIVE_DIR           2 files         valid     ./archive                                                                   
 √  CONFIG_FILE           150.0 Bytes     valid     ./ArchiveBox.conf                                                           
 √  SQL_INDEX             648.0 KB        valid     ./index.sqlite3                                                             

@pirate commented on GitHub (Dec 19, 2023):

The exception you posted is a common, harmless one that occurs when someone navigates away from a page while it's still loading; it doesn't indicate any failure of the archiving process. (In this case, it shows a user searching for erika in the admin UI Logs page, /admin/core/archiveresult/?q=erika.)

Have you verified that the extension is not submitting URLs independently of this error message?

If it's indeed broken, do you mind posting more of your ./data/logs/error.log file and/or the output of docker compose up from around the time the extension is trying to submit URLs?
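
As a rough illustration of why this kind of exception can be treated as noise, here is a minimal Python sketch that separates "the client went away" errors from everything else. The helper name `is_client_disconnect` and the tuple `CLIENT_DISCONNECT_ERRORS` are hypothetical and are not part of ArchiveBox or Django; only the built-in exception classes and the `errno`/`os` calls are real.

```python
import errno
import os

# Built-in exception types that usually mean the remote side dropped the
# connection, rather than that the server itself failed.
CLIENT_DISCONNECT_ERRORS = (ConnectionResetError, BrokenPipeError, ConnectionAbortedError)


def is_client_disconnect(exc: BaseException) -> bool:
    """Hypothetical helper: True if exc looks like a dropped client connection."""
    return isinstance(exc, CLIENT_DISCONNECT_ERRORS)


# Rebuild the same kind of error as in the traceback (ECONNRESET is errno 104 on Linux).
reset = ConnectionResetError(errno.ECONNRESET, os.strerror(errno.ECONNRESET))

print(is_client_disconnect(reset))                    # a dropped connection
print(is_client_disconnect(ValueError("bad input")))  # a genuine application error
```

A check like this could be used when scanning logs to decide which tracebacks deserve attention; whether that is appropriate depends on the deployment.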


@mamema commented on GitHub (Dec 19, 2023):

If I use the dev version, I get it working.
But it is very sensitive to wrong regex filters. It's easy to make mistakes there.
The error log does not report anything, even when debugging is enabled.


@pirate commented on GitHub (Dec 19, 2023):

Yeah, you can test your regex filters separately before adding them like so:

>>> import re
>>> URL_DENYLIST = r'^http(s)?:\/\/(.+\.)?(youtube\.com)|(amazon\.com)\/.*$'  # replace this with your regex to test
>>> URL_DENYLIST_PTN = re.compile(URL_DENYLIST, re.IGNORECASE | re.UNICODE | re.MULTILINE)

>>> bool(URL_DENYLIST_PTN.search('https://test.youtube.com/example.php?abc=123'))  # replace this with the URL to test
True   # this URL would not be archived because it matches the exclusion pattern

From the docs: https://github.com/ArchiveBox/ArchiveBox/wiki/Configuration#url_denylist
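
One easy mistake with patterns like the one above is alternation precedence: a top-level `|` splits the whole regex in two, so the `^http...` anchor applies only to the first half. The sketch below compares the pattern as written with a grouped variant. The grouped pattern, the `is_denied` helper, and the sample URLs are illustrative assumptions, not taken from the ArchiveBox docs.

```python
import re

FLAGS = re.IGNORECASE | re.UNICODE | re.MULTILINE

# Pattern as written above. The top-level "|" makes it mean:
#   "^http(s)?://(.+\.)?(youtube\.com)"   OR   "(amazon\.com)/.*$"
UNGROUPED = re.compile(r'^http(s)?:\/\/(.+\.)?(youtube\.com)|(amazon\.com)\/.*$', FLAGS)

# Illustrative variant (an assumption): both domains share one group, so the
# scheme and host anchoring applies to each of them.
GROUPED = re.compile(r'^http(s)?:\/\/(.+\.)?(youtube\.com|amazon\.com)\/.*$', FLAGS)


def is_denied(pattern: re.Pattern, url: str) -> bool:
    """Hypothetical helper: True if the URL matches the exclusion pattern."""
    return bool(pattern.search(url))


urls = [
    'https://test.youtube.com/example.php?abc=123',
    'https://www.amazon.com/dp/123',
    'https://example.org/amazon.com/page',  # amazon.com appears only in the path
]

for url in urls:
    print(url, is_denied(UNGROUPED, url), is_denied(GROUPED, url))
```

Both patterns exclude the first two URLs. Only the ungrouped one excludes the third, because its second alternative matches `amazon.com/` anywhere in the string.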


@mamema commented on GitHub (Dec 20, 2023):

I'm using the regex101 page with Python selected as the regex mode. It's a GUI, for the lazy ones like me.
