[GH-ISSUE #1028] Is CSRF cookie blocking CURL and archivebox-exporter plugin access? #645

Closed
opened 2026-03-01 14:45:15 +03:00 by kerem · 2 comments

Originally created by @erwin on GitHub (Sep 17, 2022).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1028

Thanks for stopping by and taking the time to read this!

My goal is to get the browser plugin and remote CLI access working. ArchiveBox itself works great!

I'm running ArchiveBox in docker.

Output of `archivebox version`:

```
archivebox version
ArchiveBox v0.6.2
Cpython Linux Linux-5.15.0-47-generic-x86_64-with-glibc2.28 x86_64
IN_DOCKER=True DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND_ENGINE=ripgrep

[i] Dependency versions:
 √  ARCHIVEBOX_BINARY     v0.6.2          valid     /usr/local/bin/archivebox
 √  PYTHON_BINARY         v3.9.5          valid     /usr/local/bin/python3.9
 √  DJANGO_BINARY         v3.1.3          valid     /usr/local/lib/python3.9/site-packages/django/bin/django-admin.py
 √  CURL_BINARY           v7.64.0         valid     /usr/bin/curl
 √  WGET_BINARY           v1.20.1         valid     /usr/bin/wget
 √  NODE_BINARY           v15.14.0        valid     /usr/bin/node
 √  SINGLEFILE_BINARY     v0.3.16         valid     /node/node_modules/single-file/cli/single-file
 √  READABILITY_BINARY    v0.0.2          valid     /node/node_modules/readability-extractor/readability-extractor
 √  MERCURY_BINARY        v1.0.0          valid     /node/node_modules/@postlight/mercury-parser/cli.js
 √  GIT_BINARY            v2.20.1         valid     /usr/bin/git
 √  YOUTUBEDL_BINARY      v2021.04.26     valid     /usr/local/bin/youtube-dl
 √  CHROME_BINARY         v90.0.4430.93   valid     /usr/bin/chromium
 √  RIPGREP_BINARY        v0.10.0         valid     /usr/bin/rg

[i] Source-code locations:
 √  PACKAGE_DIR           23 files        valid     /app/archivebox
 √  TEMPLATES_DIR         3 files         valid     /app/archivebox/templates
 -  CUSTOM_TEMPLATES_DIR  -               disabled

[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled
 -  COOKIES_FILE          -               disabled

[i] Data locations:
 √  OUTPUT_DIR            5 files         valid     /data
 √  SOURCES_DIR           10 files        valid     ./sources
 √  LOGS_DIR              1 files         valid     ./logs
 √  ARCHIVE_DIR           246 files       valid     ./archive
 √  CONFIG_FILE           104.0 Bytes     valid     ./ArchiveBox.conf
 √  SQL_INDEX             2.6 MB          valid     ./index.sqlite3
```

I've tried setting:

```
archivebox config --set PUBLIC_ADD_VIEW=True
```

But the archivebox-exporter plugin is unable to add any pages to ArchiveBox, and when I try to add pages via `curl` from the command line with:

```
curl -X POST \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  --data-urlencode "url=https://archivebox.io/" \
  --data-urlencode "DEBUG=True" \
  https://my-local-https-host/add
```

I get back the following page content:

```
<div id="summary">
  <h1>Forbidden <span>(403)</span></h1>
  <p>CSRF verification failed. Request aborted.</p>

  <p>You are seeing this message because this site requires a CSRF cookie when submitting forms. This cookie is required for security reasons, to ensure that your browser is not being hijacked by third parties.</p>
  <p>If you have configured your browser to disable cookies, please re-enable them, at least for this site, or for “same-origin” requests.</p>

</div>

<div id="explanation">
  <p><small>More information is available with DEBUG=True.</small></p>
</div>
```

I thought that the `/add` page was exempt from CSRF checks according to:
https://github.com/ArchiveBox/ArchiveBox/pull/777

Note that I tried to set `DEBUG=True` in the HTTP request, but that doesn't change anything. I presume it's part of the Django/ArchiveBox config, but I have no idea where it's actually supposed to be set.

Any other thoughts on how to reconfigure ArchiveBox so that I can use the CLI remotely via curl, and so that the archivebox-exporter browser plugin can connect and add pages?


@pirate commented on GitHub (Jun 13, 2023):

I'm sorry I didn't respond to this earlier! I haven't encountered this issue myself with the extension, are you still experiencing it with the latest archivebox/archivebox:dev image?

I suspect you may need to submit cookies with the ArchiveBox `/add` POST requests to save URLs; bare curl requests may not work because of Django's default CSRF protections. The archivebox-exporter extension and your browser should normally handle this, though. The extension is working in my browser and on a few other test machines, so I'm inclined to believe it's an edge case.
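
A minimal sketch of what "submitting cookies" could look like with plain curl, assuming Django's standard CSRF flow: GET the form to receive a `csrftoken` cookie, then POST it back together with a matching `csrfmiddlewaretoken` form field and a same-origin `Referer` header (Django's CSRF check requires a Referer on HTTPS requests). The host name, the `/add/` path with a trailing slash, and posting only the `url` field are assumptions; the real add form may require additional fields, and the GET only sets the cookie if the page actually renders a CSRF token (e.g. with `PUBLIC_ADD_VIEW=True` or a logged-in session).

```shell
#!/usr/bin/env bash
# Sketch: submit a URL to ArchiveBox's add form with a CSRF cookie + token.
# Assumptions (hypothetical, adjust to your setup): BASE_URL, the /add/ path,
# and that only the "url" form field is needed.
BASE_URL="${BASE_URL:-https://my-local-https-host}"

# Print the csrftoken value from a curl (Netscape-format) cookie jar.
# Fields are tab-separated: domain, flag, path, secure, expiry, name, value.
csrf_from_jar() {
  awk -F'\t' '$6 == "csrftoken" { print $7 }' "$1" | tail -n 1
}

# POST one URL to the add form, reusing the cookie Django sets on GET.
add_url() {
  local url="$1" jar token
  jar="$(mktemp)"
  # 1. GET the form so Django sets the csrftoken cookie (-c writes the jar).
  curl -sS -c "$jar" -o /dev/null "$BASE_URL/add/"
  token="$(csrf_from_jar "$jar")"
  if [ -z "$token" ]; then
    echo "no csrftoken cookie received from $BASE_URL/add/" >&2
    rm -f "$jar"
    return 1
  fi
  # 2. POST with the cookie (-b), the matching form token, and a Referer.
  #    --data-urlencode implies a form-encoded POST, so -X POST is not needed.
  curl -sS -b "$jar" \
    -H "Referer: $BASE_URL/add/" \
    --data-urlencode "csrfmiddlewaretoken=$token" \
    --data-urlencode "url=$url" \
    "$BASE_URL/add/"
  local status=$?
  rm -f "$jar"
  return "$status"
}
```

Usage: `BASE_URL=https://my-local-https-host add_url "https://archivebox.io/"`. Keeping the cookie jar in a temporary file means each call starts from a fresh session; if the instance requires login, you would need to reuse a jar that already holds a valid session cookie.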

Comment back if you're still having issues though, I'm happy to re-open this ticket and investigate further.


@erwin commented on GitHub (Jun 15, 2023):

FYI, after switching to the `:dev` image it's working fine for me now.

```
image: ${DOCKER_IMAGE:-archivebox/archivebox:dev}
```

Note: do not put "/add" on the end of your configured hostname; the plugin appends it automatically.
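
To illustrate the note above: if the extension builds its request URL by appending `/add` to whatever base URL you configure (the exact path it appends is an assumption here), then a base URL that already ends in `/add` would produce `.../add/add`. The hypothetical `normalize_base_url` helper below is a minimal sketch that strips trailing slashes and one trailing `/add`, leaving a value that is safe to paste into the extension's settings.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ArchiveBox or the extension): turn a
# configured server URL into a bare base URL with no trailing "/" or "/add".
normalize_base_url() {
  local u="$1"
  # Drop any trailing slashes.
  while [ "${u%/}" != "$u" ]; do u="${u%/}"; done
  # Drop one trailing "/add", since the extension appends it itself.
  u="${u%/add}"
  # Drop any slashes exposed by removing "/add" (e.g. "host//add").
  while [ "${u%/}" != "$u" ]; do u="${u%/}"; done
  printf '%s\n' "$u"
}
```

For example, `normalize_base_url "https://my-local-https-host/add/"` prints `https://my-local-https-host`, while a URL that merely contains "add" elsewhere (such as `/address`) is left untouched because only an exact trailing `/add` segment is removed.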

