[GH-ISSUE #1674] Feature Request: Set auth / session details when archiving #1000

Closed
opened 2026-03-01 14:47:50 +03:00 by kerem · 1 comment

Originally created by @aes on GitHub (Apr 11, 2025).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1674

Originally assigned to: @pirate on GitHub.

What type of suggestion are you making?

Modification of existing behavior

What is the problem that your feature request solves?

When I'm logged in to a service, I would like to submit an archiving request from one of the browser extensions with my logged-in cookies and headers, so that the page I am actually looking at gets archived rather than the login page.

What is your proposed solution?

  1. Add additional fields to the job model, to record cookies and headers.
  2. Pass these cookies and headers through to the extractors, so that they fetch the page as me, within my session.
  3. Ideally these fields would be scrubbed once the job finishes, but a warning that they may leak would be sufficient.
  4. Update the browser extensions (and other clients) to extract this data directly from their environment.
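Steps 1 to 3 could look roughly like the Python sketch below. It is a minimal illustration under stated assumptions, not ArchiveBox code: `Cookie`, `ArchiveJob`, `job_session`, and `curl_args` are hypothetical names. It assumes extractors can consume a per-job cookie file in the Netscape cookies.txt format that curl, wget, and yt-dlp read (the same kind of file the existing `COOKIES_FILE` setting points at), and that extra headers are passed to curl with its `--header` option.

```python
# Hypothetical sketch only: none of these names exist in ArchiveBox.
import os
import tempfile
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Cookie:
    domain: str
    name: str
    value: str
    path: str = "/"
    secure: bool = True
    expires: int = 0  # 0 means a session cookie
    include_subdomains: bool = False


@dataclass
class ArchiveJob:
    # Step 1: extra fields on the job record for session data.
    url: str
    cookies: list[Cookie] = field(default_factory=list)
    headers: dict[str, str] = field(default_factory=dict)

    def write_cookies_txt(self, directory: str) -> str:
        """Write cookies as a tab-separated Netscape cookies.txt file."""
        # mkstemp creates the file readable/writable by the owner only.
        fd, path = tempfile.mkstemp(prefix="cookies-", suffix=".txt", dir=directory)
        with os.fdopen(fd, "w") as f:
            f.write("# Netscape HTTP Cookie File\n")
            for c in self.cookies:
                f.write("\t".join([
                    c.domain,
                    "TRUE" if c.include_subdomains else "FALSE",
                    c.path,
                    "TRUE" if c.secure else "FALSE",
                    str(c.expires),
                    c.name,
                    c.value,
                ]) + "\n")
        return path

    def scrub(self) -> None:
        # Step 3: drop the secrets from the job record.
        self.cookies.clear()
        self.headers.clear()


@contextmanager
def job_session(job: ArchiveJob, workdir: str) -> Iterator[str]:
    """Yield a temporary cookies.txt path; always delete it and scrub the job."""
    path = job.write_cookies_txt(workdir)
    try:
        yield path
    finally:
        os.remove(path)
        job.scrub()


def curl_args(job: ArchiveJob, cookies_txt: str) -> list[str]:
    # Step 2: one possible extractor invocation that reuses the session.
    args = ["curl", "--silent", "--location", "--cookie", cookies_txt]
    for name, value in job.headers.items():
        args += ["--header", f"{name}: {value}"]
    return args + [job.url]
```

Usage would be `with job_session(job, workdir) as cookies_txt:` around the extractor calls. The `finally` block deletes the cookie file and clears the job's fields even if an extractor fails, which is one way to get the scrubbing asked for in step 3.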

What hacks or alternative solutions have you tried to solve the problem?

Archiving with another tool (not ideal).

Share the entire output of the archivebox version command for the current version you are using.

0.7.3
ArchiveBox v0.7.3 COMMIT_HASH=069aabc BUILD_TIME=2024-12-15 09:54:03 1734256443
IN_DOCKER=True IN_QEMU=False ARCH=x86_64 OS=Linux PLATFORM=Linux-6.12.20-amd64-x86_64-with-glibc2.36 PYTHON=Cpython
FS_ATOMIC=True FS_REMOTE=True FS_USER=0:0 FS_PERMS=644
DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND=ripgrep LDAP=False

[i] Dependency versions:
 √  PYTHON_BINARY         v3.11.11        valid     /usr/local/bin/python3.11
 √  SQLITE_BINARY         v2.6.0          valid     /usr/local/lib/python3.11/sqlite3/dbapi2.py
 √  DJANGO_BINARY         v3.1.14         valid     /usr/local/lib/python3.11/site-packages/django/__init__.py
 √  ARCHIVEBOX_BINARY     v0.7.3          valid     /usr/local/bin/archivebox

 √  CURL_BINARY           v8.10.1         valid     /usr/bin/curl
 √  WGET_BINARY           v1.21.3         valid     /usr/bin/wget
 √  NODE_BINARY           v20.18.1        valid     /usr/bin/node
 √  SINGLEFILE_BINARY     v1.1.54         valid     /app/node_modules/single-file-cli/single-file
 √  READABILITY_BINARY    v0.0.11         valid     /app/node_modules/readability-extractor/readability-extractor
 √  MERCURY_BINARY        v1.0.0          valid     /app/node_modules/@postlight/parser/cli.js
 √  GIT_BINARY            v2.39.5         valid     /usr/bin/git
 √  YOUTUBEDL_BINARY      v2024.12.13     valid     /usr/local/bin/yt-dlp
 √  CHROME_BINARY         v131.0.6778.33  valid     /usr/bin/chromium-browser
 √  RIPGREP_BINARY        v13.0.0         valid     /usr/bin/rg

[i] Source-code locations:
 √  PACKAGE_DIR           24 files        valid     /app/archivebox
 √  TEMPLATES_DIR         3 files         valid     /app/archivebox/templates
 -  CUSTOM_TEMPLATES_DIR  -               disabled  None

[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled  None
 -  COOKIES_FILE          -               disabled  None

[i] Data locations:
 √  OUTPUT_DIR            5 files @       valid     /data
 √  SOURCES_DIR           360 files       valid     ./sources
 √  LOGS_DIR              1 files         valid     ./logs
 √  ARCHIVE_DIR           1780 files      valid     ./archive
 √  CONFIG_FILE           102.0 Bytes     valid     ./ArchiveBox.conf
 √  SQL_INDEX             7.9 MB          valid     ./index.sqlite3

How badly do you want this new feature?

  • [ ] It's an urgent deal-breaker, I can't live without it
  • [ ] It's important to add it in the near-mid term future
  • [x] It would be nice to have eventually
  • [ ] I'm willing to start a PR to develop this myself
  • [ ] I have donated money to go towards fixing this issue

Mini Survey

  • [x] I like ArchiveBox so far / would recommend it to a friend
  • [ ] I've had a lot of difficulty getting ArchiveBox set up
  • [ ] I would pay $10/mo for a hosted version of ArchiveBox if it had this feature
kerem closed this issue 2026-03-01 14:47:50 +03:00
@pirate commented on GitHub (Apr 15, 2025): https://github.com/ArchiveBox/ArchiveBox/wiki/Chromium-Install#setting-up-a-chromium-user-profile