mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 17:16:00 +03:00
[GH-ISSUE #1445] Chrome Browser Profile / Cookies not applying to SingleFile in v0.7.2? #2373
Originally created by @JitteryDoodle on GitHub (Jun 3, 2024).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1445
Originally assigned to: @pirate on GitHub.
Hello,
I've been able to get cookies working for the Chrome > PDF, Screenshot, Headers, Chrome > HTML, and Readability views, but SingleFile is not signed in. This applies to multiple websites. What could the issue be, and how can I troubleshoot it? I see nothing wrong in the logs; even the SingleFile-related log lines show my user_data_dir path.
Here's my version info (confusingly on my part, I named the chromium mount "chrome_profile", but it is actually the full chromium folder with the Default folder inside):
Willing to share more if needed.
Thanks!
@pirate commented on GitHub (Jun 4, 2024):
So I just double checked, in 0.7.x chrome profile use with singlefile should work automatically... the code is there and it's working for me on some sites. Unfortunately many larger (big tech) sites detect headless browsers to block bots and log you out automatically.
Can you share some of the domains where you're getting logged out and I can try to investigate further to see if it's bot-blocking or a bug in ArchiveBox/singlefile?
Can you also try running single-file with your chrome profile manually outside Docker to remove ArchiveBox and environment issues as potential factors:
This will also open the browser window (instead of running it headless) ^ so you can see if there is any obvious reason you're getting logged out visually.
You can also try using cookies.txt (which it looks like you already have) and tweaking your `CHROME_USER_AGENT` to see if that helps:
To use cookies with singlefile on v0.7.x you'd do:
On v0.8.0+ (coming soon) it'll work automatically without needing that ^ if you have `COOKIES_FILE` set up.
More info:
@iluvatyr commented on GitHub (Jun 5, 2024):
I tested with my docker archivebox and created the chrome profile within docker using the novnc container.
Whenever I start the chromium-browser and check the login on some page via novnc, it actually works and I'm logged into my profile. When I don't do it interactively and instead run the archive command for a singlefile page, it is as if there was no chrome profile...
Environment variables are all set.
EDIT: So does anyone get how to get it to work?
@JitteryDoodle commented on GitHub (Jun 20, 2024):
My experience mirrors @iluvatyr - everything seems to be working, except for the archivebox singlefile.
@JitteryDoodle commented on GitHub (Jun 20, 2024):
I also just tried with the 0.8.0 dev version from March, and the same thing happens: SingleFile isn't signed in, but everything else appears to have my cookies.
@iluvatyr commented on GitHub (Jul 4, 2024):
so any fix upcoming?
@rumisle commented on GitHub (Jul 4, 2024):
This doesn't seem to be an ArchiveBox problem. From what I remember, Chromium just won't load profiles when it's launched headless. I can replicate this on my Mac, launching Chromium with a data directory, headless on/off. Not sure why it's designed this way or how to work around it.
@pirate commented on GitHub (Jul 7, 2024):
Chromium headless=new can load profiles, it just takes a specific combination of flags. I'll dig into this more post honeymoon!
@mstarodub commented on GitHub (Nov 13, 2024):
I'm experiencing the polar opposite of this (at least with the 0.8rc). Singlefile seems to be logged in, but everything else isn't.
I've verified that the profile is set up correctly by running with `CHROME_HEADLESS=False`: I opened chromium with `--user-data-dir=...` prior to running `archivebox add`, and it opened the archived site in the same session.
However, when running archivebox with the chromium instance closed, it gets logged out. Even weirder is that a systemwide extension I manually disabled in the dedicated archivebox profile gets re-enabled?! So I have reason to believe it somehow resets the profile completely.
I've been debugging this for hours and at this point the only thing I haven't done yet is reading through chromium / archivebox source code. Would really appreciate some help as the archival matter at hand is quite urgent
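For anyone checking their own setup, here is a minimal sketch of a sanity check on the mounted profile folder. It assumes the usual Chromium layout described earlier in this thread (a user data dir with a `Default` profile folder inside) and that the cookie database sits at either `Default/Network/Cookies` (newer Chromium) or `Default/Cookies` (older). The function name `inspect_user_data_dir` is made up for illustration and is not part of ArchiveBox.

```python
from pathlib import Path


def inspect_user_data_dir(user_data_dir, profile="Default"):
    """Report whether a Chromium user data dir looks like a full profile.

    Hypothetical helper: it only checks for files on disk and does not
    validate that the cookies inside are still accepted by any site.
    """
    root = Path(user_data_dir)
    profile_dir = root / profile
    # Newer Chromium keeps cookies under Network/, older builds directly
    # in the profile folder, so check both locations.
    candidates = [profile_dir / "Network" / "Cookies", profile_dir / "Cookies"]
    cookies_db = next((p for p in candidates if p.is_file()), None)
    return {
        "exists": root.is_dir(),
        "has_profile": profile_dir.is_dir(),
        "cookies_db": str(cookies_db) if cookies_db else None,
    }
```

If `has_profile` is false, the path passed as the user data dir is probably the profile folder itself (or an empty mount) rather than its parent.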
@lamons commented on GitHub (Dec 23, 2024):
I am having a similar issue here: in novnc everything seems to work fine, but every archive method fails (stuck on the login page). I tried setting `CHROME_HEADLESS=False` with novnc on and can actually see the chrome window pop up with the page successfully logged in, but the archive still shows a login page. I also tried running single-file (with `--browser-headless=true`) inside the container and it got the login page as well. With `--browser-headless=false` the chrome window popped up for a second and then crashed for some unknown reason, but in that second I could see it was still the login page, so single-file probably failed to load the profile.
@TooManyStacks commented on GitHub (Jan 24, 2025):
I am seeing the same with `image: archivebox/archivebox:latest`. I tried adding a cookies file, but that didn't work. I checked with cat in the container, and the data was there.
Then I tried to pass my chromium profile with the cookies accepted on the site I am testing on (ad.nl, a newspaper). Now it finally gets the page title, but the cookie pop-up still blocks the whole page. Tried adding a cookie accept extension, no dice.
I also tried some other sites, and all of them claim I never accepted cookies.
@yosofbadr commented on GitHub (Jun 8, 2025):
I am having the same issue, not sure if it is a misconfiguration on my part, if it is not supported, or if the website I am trying to archive is indeed logging me out.
Edit: Did some testing and it looks like the site detects chrome-driver and automatically logs out the account. A bit of a shame, are there any workarounds for this?
Edit2: I tested this in a small PoC, it was the site automatically logging me out, however I was able to get around this by injecting cookies after the site loaded then using SingleFile to download the page.
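The PoC itself is not shown in this thread. As a rough sketch of the first half of that workaround, the snippet below loads a Netscape-format cookies.txt and turns it into plain dicts of the shape that browser automation tools such as Playwright accept for cookie injection (`name`, `value`, `domain`, `path`, `secure`, `expires`). The function name `cookies_txt_to_dicts` is an assumption for illustration; how the cookies are then injected and how SingleFile is invoked depends on the driver used.

```python
import http.cookiejar


def cookies_txt_to_dicts(path):
    """Read a Netscape cookies.txt file into a list of cookie dicts.

    Hypothetical helper, not ArchiveBox code. Session and expired cookies
    are kept, since the site decides whether they are still valid.
    """
    jar = http.cookiejar.MozillaCookieJar(path)
    jar.load(ignore_discard=True, ignore_expires=True)
    cookies = []
    for c in jar:
        cookie = {
            "name": c.name,
            "value": c.value or "",
            "domain": c.domain,
            "path": c.path,
            "secure": c.secure,
        }
        # Session cookies have no expiry; leave the key out in that case.
        if c.expires is not None:
            cookie["expires"] = c.expires
        cookies.append(cookie)
    return cookies
```

The resulting list could then be handed to the automation tool after the page has loaded, followed by a reload, before saving the page.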
@TooManyStacks commented on GitHub (Jun 13, 2025):
What I notice in other projects, like Playwright, is that cookies do not transfer easily from system to system, and that headless and headful modes behave differently with cookies.
Maybe the same is true for some sites here as well?
@pirate commented on GitHub (Jun 26, 2025):
Yes, that is unfortunately true @TooManyStacks: many sites tie the session to a browser fingerprint and will invalidate it if they detect a different browser trying to re-use cookies. Adding other headers like the USER_AGENT can help, but it's not perfect. That's why it's better to use a full `CHROME_USER_DATA_DIR` with exactly the same browser binary used to set it up (instead of only cookies), and always run in headful mode.
@TooManyStacks commented on GitHub (Jun 26, 2025):
I tried with the chrome dir, but not headful mode. Will do that as a test later.
@pirate commented on GitHub (Dec 29, 2025):
@claude can you fix this on dev, make sure singlefile uses the same SINGLEFILE_CHROME_ARGS (x-fallback: CHROME_ARGS) as the chrome plugin so that it gets the user data dir and other launch options.
@claude[bot] commented on GitHub (Dec 29, 2025):
Claude finished @pirate's task in 3m 45s.
Fixed SingleFile Chrome arguments ✓
Task: Make SingleFile use the same Chrome arguments (SINGLEFILE_CHROME_ARGS with fallback to CHROME_ARGS) as other Chrome-based plugins so that user data directory and cookies work correctly.
Changes Made
- Added `SINGLEFILE_CHROME_ARGS` config option in `archivebox/plugins/singlefile/config.json`:
  - falls back to `CHROME_ARGS`
  - supports options such as `--user-data-dir`
- Updated SingleFile extractor in `archivebox/plugins/singlefile/on_Snapshot__50_singlefile.py`:
  - reads `SINGLEFILE_CHROME_ARGS` (with fallback to `CHROME_ARGS`)
  - passes the arguments via the `--browser-args` parameter
How It Works
When you set `CHROME_USER_DATA_DIR` or add Chrome arguments via `CHROME_ARGS`, SingleFile will now:
1. Read the `SINGLEFILE_CHROME_ARGS` environment variable
2. Fall back to `CHROME_ARGS` if not set
3. Pass the arguments to single-file via `--browser-args`
This ensures SingleFile respects the same Chrome configuration (user data directory, cookies, etc.) as other Chrome-based extractors like PDF and Screenshot.
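The actual diff is not included in this thread. The fallback described above can be sketched roughly as follows, assuming that both variables hold a JSON array of strings and that single-file's `--browser-args` option takes a JSON array. The helper names `resolve_chrome_args` and `build_singlefile_cmd` are hypothetical and are not taken from the ArchiveBox source.

```python
import json
import os


def resolve_chrome_args(env=None):
    """Return Chrome launch args, preferring SINGLEFILE_CHROME_ARGS.

    Assumption: both variables contain a JSON array such as
    '["--user-data-dir=/data/profile"]'.
    """
    env = os.environ if env is None else env
    raw = env.get("SINGLEFILE_CHROME_ARGS") or env.get("CHROME_ARGS") or "[]"
    args = json.loads(raw)
    if not isinstance(args, list):
        raise ValueError("expected a JSON array of Chrome arguments")
    return [str(a) for a in args]


def build_singlefile_cmd(url, output_path, env=None, binary="single-file"):
    """Build a single-file command line that forwards the Chrome args."""
    cmd = [binary]
    chrome_args = resolve_chrome_args(env)
    if chrome_args:
        # Forward the args as one JSON array string.
        cmd.append("--browser-args=" + json.dumps(chrome_args))
    cmd += [url, output_path]
    return cmd
```

With only `CHROME_ARGS` set, the profile flag is forwarded unchanged; setting `SINGLEFILE_CHROME_ARGS` overrides it for SingleFile alone.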
Example Usage
Branch: claude/issue-1445-20251229-2236