mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-26 01:26:00 +03:00
[GH-ISSUE #204] Archive Method: CHROME_USER_DATA_DIR is being ignored, authenticated site archiving fails #139
Originally created by @jamelait on GitHub (Apr 1, 2019).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/204
Hi,
There is a website that requires me to be logged in that I want to archive.
I have logged in using chromium-browser (launched from bash), and I made sure to set these environment variables (with the export command):
CHROME_BINARY is set to the value returned by the command "which chromium-browser"
CHROME_USER_DATA_DIR is set to the value that I saw in "chrome://version"
I then do
The command runs fine, but it looks like my credentials are not used to access the website (the archived HTML shows a message saying that I should be logged in).
What could be the issue?
chromium-browser --version : Chromium 73.0.3683.75 Built on Ubuntu , running on Ubuntu 18.04
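To reproduce a setup like this outside ArchiveBox, a minimal Python sketch can build a one-off Chromium command from the same CHROME_BINARY and CHROME_USER_DATA_DIR values. The helper name `build_chrome_cmd` is hypothetical, and the exact flags ArchiveBox itself passes to Chromium may differ; `--headless`, `--user-data-dir` and `--dump-dom` are standard Chromium switches. Running the result once headless and once with a visible window shows which profile actually gets loaded.

```python
import os
import shutil


def build_chrome_cmd(url, headless=True, binary=None, user_data_dir=None):
    """Build argv for a one-off Chromium run (hypothetical helper).

    Falls back to the CHROME_BINARY / CHROME_USER_DATA_DIR environment
    variables, and for the binary finally to `chromium-browser` on PATH.
    """
    binary = (
        binary
        or os.environ.get("CHROME_BINARY")
        or shutil.which("chromium-browser")
    )
    if not binary:
        raise FileNotFoundError("no Chromium binary configured or on PATH")
    user_data_dir = user_data_dir or os.environ.get("CHROME_USER_DATA_DIR")

    cmd = [binary]
    if headless:
        cmd.append("--headless")
    if user_data_dir:
        # Should be the user data root, not a profile folder inside it.
        cmd.append(f"--user-data-dir={user_data_dir}")
    if headless:
        # --dump-dom prints the rendered DOM to stdout; headless mode only.
        cmd.append("--dump-dom")
    cmd.append(url)
    return cmd


# Usage: print the command, then run it yourself (e.g. subprocess.run(cmd)),
# once as-is and once with headless=False to watch the browser window.
cmd = build_chrome_cmd(
    "https://members.website.com/my-account/",
    binary="chromium-browser",
    user_data_dir=os.path.expanduser("~/.config/chromium"),
)
print(" ".join(cmd))
```

In headless mode the page DOM lands on stdout, so a login prompt in that output is a quick sign that the wrong profile was used.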
@pirate commented on GitHub (Apr 2, 2019):
Huh, strange. Try setting `CHROME_HEADLESS=False` so you can watch the browser UI as it archives; it may reveal some problem. You can also try running the chrome command manually like this (try it both with and without `--headless`, and replace that `user-data-dir` path with your correct one if different):

@jamelait commented on GitHub (Apr 3, 2019):
Setting `CHROME_HEADLESS=False` did reveal the problem: it was using the wrong user data directory. I was setting the env variable like this:
`CHROME_USER_DATA_DIR=/home/jamel/.config/chromium/Default`
But it seems that with that configuration, Chromium creates a new profile directory inside it, so the profile it actually uses becomes `/home/jamel/.config/chromium/Default/Default`. So of course I wasn't logged in.
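The underlying mix-up is that `--user-data-dir` expects the parent folder (here `/home/jamel/.config/chromium`), while the "Profile Path" shown in `chrome://version` points at a profile folder inside it, such as `Default`. Below is a minimal sketch that maps one to the other, assuming Chromium's usual profile folder names (`Default`, `Profile 1`, ...) and assuming a `Local State` file sits at the root of a real user data directory. Both function names are hypothetical.

```python
import re
from pathlib import Path

# Assumption: Chromium names profile folders "Default", "Profile 1", ...
PROFILE_DIR_RE = re.compile(r"^(Default|Profile \d+)$")


def user_data_dir_from_profile_path(profile_path):
    """Map the "Profile Path" from chrome://version to the directory that
    --user-data-dir (and so CHROME_USER_DATA_DIR) expects."""
    path = Path(profile_path).expanduser()
    if PROFILE_DIR_RE.match(path.name):
        return path.parent  # strip the profile folder
    return path  # already looks like a user data root


def looks_like_user_data_dir(path):
    """Heuristic: Chromium keeps a "Local State" file at the root of a
    user data directory, not inside individual profile folders."""
    return (Path(path).expanduser() / "Local State").is_file()


profile_path = "/home/jamel/.config/chromium/Default"
user_data_dir = user_data_dir_from_profile_path(profile_path)
print(user_data_dir, looks_like_user_data_dir(user_data_dir))
```

With the path from this thread, the sketch yields `/home/jamel/.config/chromium`, which is the value CHROME_USER_DATA_DIR would need.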
Problem solved!
I did notice a weird thing: it seems that `/ArchiveBox/output/archive/.../members.website.com/my-account/index.html` was not fetched with a logged-in profile, but `/ArchiveBox/output/archive/.../output.html` was.

@pirate commented on GitHub (Apr 3, 2019):
Ah yes, this problem is common; I’ve made the same mistake of using the default directory before too.
The index.html output is generated using Wget, not Chrome, so to make that one logged in you have to pass a `COOKIES_FILE=...` parameter.
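Wget reads cookies in the Netscape `cookies.txt` format (its `--load-cookies` option), and ArchiveBox is assumed here to hand COOKIES_FILE to Wget that way. As a minimal sketch, Python's standard `http.cookiejar.MozillaCookieJar` can write such a file; the helper names, the cookie name `sessionid` and its value are hypothetical placeholders, and in practice the real session cookie would be copied from the logged-in browser.

```python
import time
from http.cookiejar import Cookie, MozillaCookieJar


def make_cookie(domain, name, value, path="/", secure=True, ttl=30 * 24 * 3600):
    """Build a stdlib Cookie for `domain` (hypothetical helper)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path=path, path_specified=True,
        secure=secure, expires=int(time.time()) + ttl,
        discard=False, comment=None, comment_url=None, rest={},
    )


def write_cookies_file(filename, cookies):
    """Write cookies in the Netscape cookies.txt format that wget's
    --load-cookies option reads."""
    jar = MozillaCookieJar(filename)
    for cookie in cookies:
        jar.set_cookie(cookie)
    jar.save(ignore_discard=True, ignore_expires=True)


# Usage: placeholder name and value; paste the real session cookie instead.
write_cookies_file(
    "cookies.txt",
    [make_cookie(".members.website.com", "sessionid", "PASTE-VALUE-HERE")],
)
```

The resulting file's absolute path would then go into `COOKIES_FILE=...`. Exporting cookies with a browser extension that writes the same format is an equally valid route; the point is only that Wget needs its own copy of the login cookies, since it never touches the Chromium profile.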