mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 17:16:00 +03:00
[GH-ISSUE #1638] Add option to share cookies in /api/v1/cli/[add | update | schedule] calls #2491
Originally created by @datoslabs on GitHub (Jan 20, 2025).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1638
Originally assigned to: @pirate on GitHub.
What type of suggestion are you making?
Modification of existing behavior
What is the problem that your feature request solves?
Hi,
Would it be possible to add an optional "cookies": ["string"] field to the request body JSON of the /api/v1/cli/[add | update | schedule] API calls and allow applicable extractors to load and use the provided cookies? The cookies would be one-time-use only and should not be cached after extraction. This would allow ArchiveBox to extract pages using "dynamic" cookies provided by the requestor, and would allow the Chrome extension to send the current/active tab's cookies when requesting to archive the current page.
Please note: while browser-enabled AI/LLM agents like https://github.com/browser-use/browser-use and https://github.com/unclecode/crawl4ai can automatically recognize cookie-consent or newsletter-signup popups/overlays and dismiss them, I feel that sharing cookies and allowing ArchiveBox extractors to dynamically load cookies from different users can support more use cases when ArchiveBox is shared among groups of users.
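To make the request concrete, here is a minimal Python sketch of what a client call to the add endpoint might look like with the proposed field. The "cookies" key is the addition requested in this issue and is not an existing API field. The helper name build_add_request, the "urls" body key, the X-ArchiveBox-API-Key header, the localhost URL, and the cookie string format are all assumptions made for illustration. The request is only built, not sent.

```python
import json
import urllib.request


def build_add_request(base_url, api_key, urls, cookies=None):
    """Build (but do not send) a POST request for /api/v1/cli/add.

    Hypothetical helper: `cookies` maps to the field proposed in this
    issue, which the API does not currently accept.
    """
    body = {"urls": list(urls)}  # assumed name of the existing URL list field
    if cookies:
        # Proposed field: a list of cookie strings for one-time use.
        body["cookies"] = list(cookies)
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/cli/add",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-ArchiveBox-API-Key": api_key,  # assumed auth header name
        },
        method="POST",
    )


# Example values below are placeholders, not real credentials or cookies.
req = build_add_request(
    "http://localhost:8000",
    "example-key",
    ["https://example.com/article"],
    cookies=["consent=accepted; Domain=example.com; Path=/"],
)
print(req.full_url)
print(req.data.decode("utf-8"))
```

Leaving the field optional means a request without cookies keeps the same body as today, so existing clients would be unaffected under this sketch.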
What is your proposed solution?
I have tried to set up cookies on my ArchiveBox Docker container over VNC as documented in the wiki; however, keeping cookies up to date for new websites to bypass consent or newsletter-subscription popups is laborious. If an optional "cookies": ["string"] field is added to the request body JSON of the /api/v1/cli/[add | update | schedule] API calls, requestors (including the ArchiveBox Chrome extension) would have the option to submit their current session cookies as part of the request, for one-time use by the applicable extractors.
What hacks or alternative solutions have you tried to solve the problem?
Besides using noVNC to update ArchiveBox's Chrome user profile, I recently began experimenting with using/modifying https://github.com/browser-use/browser-use and https://github.com/unclecode/crawl4ai, which use browser-enabled AI/LLM agents, to perform extraction outside of ArchiveBox for select pages.
Share the entire output of the archivebox version command for the current version you are using.
How badly do you want this new feature?
Mini Survey