[GH-ISSUE #1713] Bug: ERR_HTTP2_PROTOCOL_ERROR on some websites #2536

Open
opened 2026-03-01 17:59:41 +03:00 by kerem · 4 comments

Originally created by @jessienab on GitHub (Nov 25, 2025).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1713

Originally assigned to: @pirate on GitHub.

Provide a screenshot and describe the bug

With 0.7.3, some websites produce an `ERR_HTTP2_PROTOCOL_ERROR` page as the locally saved result for every archiving method except `mercury`.

I'm not sure if this is related to an older Chrome version, or some other issue...

Steps to reproduce

1. Start ArchiveBox 0.7.3
2. Archive the following link with all methods:

https://www.cbc.ca/news/politics/canada-agriculture-seeds-rights-1.7610106


3. Only the `mercury` result contains actual page content; every other method saves the HTTP error.

Note: the page loads fine in a desktop browser, whether the same Chrome version, a newer Chrome, or Firefox.
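To confirm which method outputs actually captured the error page, rather than opening each one by hand, the snapshot folder can be scanned for the error string. This is a minimal sketch, assuming snapshots live under `data/archive/<timestamp>/` and that the error code appears verbatim in the saved HTML/text files; compressed outputs such as PDFs and PNG screenshots will not match. `find_error_outputs` is a hypothetical helper, not part of ArchiveBox.

```python
from pathlib import Path

# Assumption: Chromium's network error page text, including this code,
# ends up verbatim in uncompressed outputs (HTML, plain text).
ERROR_MARKER = b"ERR_HTTP2_PROTOCOL_ERROR"


def find_error_outputs(snapshot_dir: str, marker: bytes = ERROR_MARKER) -> list[str]:
    """Return paths (relative to snapshot_dir) of files that contain marker."""
    root = Path(snapshot_dir)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            data = path.read_bytes()
        except OSError:
            continue  # unreadable file: skip it instead of aborting the scan
        if marker in data:
            hits.append(path.relative_to(root).as_posix())
    return hits
```

For example, calling `find_error_outputs("data/archive/<timestamp>")` with a real snapshot timestamp would list the affected output files, which makes it easy to compare against the one method (`mercury`) that reportedly works.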

ArchiveBox Version

0.7.3
ArchiveBox v0.7.3 COMMIT_HASH=069aabc BUILD_TIME=2024-12-15 09:54:03 1734256443
IN_DOCKER=True IN_QEMU=False ARCH=x86_64 OS=Linux PLATFORM=Linux-6.12.57-1-lts-x86_64-with-glibc2.36 PYTHON=Cpython
FS_ATOMIC=True FS_REMOTE=True FS_USER=0:0 FS_PERMS=644
DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND=sonic LDAP=False

[i] Dependency versions:
 √  PYTHON_BINARY         v3.11.11        valid     /usr/local/bin/python3.11                                                   
 √  SQLITE_BINARY         v2.6.0          valid     /usr/local/lib/python3.11/sqlite3/dbapi2.py                                 
 √  DJANGO_BINARY         v3.1.14         valid     /usr/local/lib/python3.11/site-packages/django/__init__.py                  
 √  ARCHIVEBOX_BINARY     v0.7.3          valid     /usr/local/bin/archivebox                                                   

 √  CURL_BINARY           v8.10.1         valid     /usr/bin/curl                                                               
 √  WGET_BINARY           v1.21.3         valid     /usr/bin/wget                                                               
 √  NODE_BINARY           v20.18.1        valid     /usr/bin/node                                                               
 √  SINGLEFILE_BINARY     v1.1.54         valid     /app/node_modules/single-file-cli/single-file                               
 √  READABILITY_BINARY    v0.0.11         valid     /app/node_modules/readability-extractor/readability-extractor               
 √  MERCURY_BINARY        v1.0.0          valid     /app/node_modules/@postlight/parser/cli.js                                  
 √  GIT_BINARY            v2.39.5         valid     /usr/bin/git                                                                
 √  YOUTUBEDL_BINARY      v2024.12.13     valid     /usr/local/bin/yt-dlp                                                       
 √  CHROME_BINARY         v131.0.6778.33  valid     /usr/bin/chromium-browser                                                   
 √  RIPGREP_BINARY        v13.0.0         valid     /usr/bin/rg                                                                 

[i] Source-code locations:
 √  PACKAGE_DIR           24 files        valid     /app/archivebox                                                             
 √  TEMPLATES_DIR         3 files         valid     /app/archivebox/templates                                                   
 -  CUSTOM_TEMPLATES_DIR  -               disabled  None                                                                        

[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled  None                                                                        
 -  COOKIES_FILE          -               disabled  None                                                                        

[i] Data locations:
 √  OUTPUT_DIR            6 files @       valid     /data                                                                       
 √  SOURCES_DIR           2935 files      valid     ./sources                                                                   
 √  LOGS_DIR              1 files         valid     ./logs                                                                      
 √  ARCHIVE_DIR           14511 files     valid     ./archive                                                                   
 √  CONFIG_FILE           81.0 Bytes      valid     ./ArchiveBox.conf                                                           
 √  SQL_INDEX             149.2 MB        valid     ./index.sqlite3

How did you install the version of ArchiveBox you are using?

Docker (or Podman/LXC/K8s/TrueNAS/Proxmox/etc)

What operating system are you running on?

Linux (Ubuntu/Debian/Arch/Alpine/etc.)

What type of drive are you using to store your ArchiveBox data?

  • All of data/ is on a local NVMe drive

Docker Compose Configuration

version: '2.4'   # '3.9' or greater also works

services:
    archivebox:
        image: archivebox/archivebox:0.7.3
#        command: server --quick-init 0.0.0.0:8000
        ports:
            - 127.0.0.1:8000:8000
        environment:
            - ALLOWED_HOSTS=*                   
            - MEDIA_MAX_SIZE=2048m
            - SEARCH_BACKEND_ENGINE=sonic     
            - SEARCH_BACKEND_HOST_NAME=sonic
            - SEARCH_BACKEND_PASSWORD='~'


#        dns:                                  
#            - pihole
        volumes:
            - archivebox/data:/data
        deploy:
          resources:
            limits:
              cpus: "2"
              memory: 2000M

    sonic:
       image: archivebox/sonic:latest
       expose:
           - 1491
       environment:
           - SEARCH_BACKEND_PASSWORD='~'
       volumes:
           - archivebox/sonic.cfg:/etc/sonic.cfg:ro
           - archivebox/sonic:/var/lib/sonic/store

    pihole:
      image: pihole/pihole:latest
      ports:
        - 127.0.0.1:8098:80       # uncomment to access the admin HTTP interface on http://localhost:8098
      environment:
        WEBPASSWORD: ''
      volumes:
        - archivebox/pihole:/etc/pihole
        - archivebox/dnsmasq:/etc/dnsmasq.d

ArchiveBox Configuration

[SERVER_CONFIG]
SECRET_KEY = ~~~

@pirate commented on GitHub (Feb 8, 2026):

Did you ever figure out the cause / did this keep happening a lot?

I have never seen this error myself but I keep an eye out for it.


@jessienab commented on GitHub (Feb 21, 2026):

> Did you ever figure out the cause / did this keep happening a lot?
>
> I have never seen this error myself but I keep an eye out for it.

Hey, yes, I still run into this on rare occasions. I can't explain why it happens, but it doesn't affect all webpages. Sadly, I don't have any NEW example sites to share at the moment, but if it pops up again I'll add a new comment here with the link (I sort of forgot about this issue 🫣).

I can still reproduce this with the site referenced in the OP:

https://www.cbc.ca/news/politics/canada-agriculture-seeds-rights-1.7610106

[Screenshot attached in the original GitHub comment]

@jessienab commented on GitHub (Feb 24, 2026):

Small bump, another site I get this with:

https://www.costco.ca/dove-sensitive-skin-soap-bar-16-x-106-g.product.100650937.html

In general, all costco.ca links currently give me HTTP2 errors; I don't even get `mercury` output for costco.ca links.

https://www.costcobusinesscentre.ca/ however works fine.


@pirate commented on GitHub (Feb 24, 2026):

It's likely a stale chrome version in the older 0.7.3 docker image if I had to guess.

If you're able to run it locally / on bare metal without Docker, you can try the latest `dev` branch, which has a ton of other improvements (don't upgrade your main collection though, it's not stable yet).
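One way to check the stale-Chrome theory is to read the `CHROME_BINARY` row from the `archivebox version` output, which in this report shows `v131.0.6778.33`. This is a minimal sketch, assuming the row format shown in the report; `parse_dependency_versions`, `chrome_is_older_than`, and the `min_major` threshold are hypothetical, and no specific Chrome version is known to fix this error.

```python
import re

# Matches dependency rows like the ones in `archivebox version` output, e.g.
#  √  CHROME_BINARY         v131.0.6778.33  valid     /usr/bin/chromium-browser
_DEP_ROW = re.compile(r"\b([A-Z_]+_BINARY)\s+v(\d+(?:\.\d+)*)")


def parse_dependency_versions(version_output: str) -> dict[str, tuple[int, ...]]:
    """Map each *_BINARY name to its version as a tuple of ints."""
    versions = {}
    for line in version_output.splitlines():
        match = _DEP_ROW.search(line)
        if match:
            versions[match.group(1)] = tuple(int(part) for part in match.group(2).split("."))
    return versions


def chrome_is_older_than(version_output: str, min_major: int) -> bool:
    """True if the reported Chrome major version is below min_major (caller-chosen)."""
    chrome = parse_dependency_versions(version_output).get("CHROME_BINARY")
    if chrome is None:
        raise ValueError("no CHROME_BINARY row found in version output")
    return chrome[0] < min_major
```

For instance, with the version output from this report and `min_major=140`, `chrome_is_older_than` returns `True`, since the bundled Chromium is major version 131. This only shows how old the bundled browser is; it does not prove the old version causes the HTTP2 error.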
