mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 17:16:00 +03:00
[GH-ISSUE #952] Bug: CHROME_USER_DATA_DIR not working for login #591
Originally created by @ga-it on GitHub (Mar 21, 2022).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/952
Describe the bug
Hi
I wish to use ArchiveBox to save content from subscriptions behind paywalls.
I have exported cookies in Netscape format (used by wget via COOKIES_FILE, as I understand it) and shared my Chromium profile path with ArchiveBox (CHROME_USER_DATA_DIR).
Neither has resulted in a logged-in session, as the screenshots and denied access show.
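You can rule out a malformed export before blaming ArchiveBox. Load the file with Python's standard-library `http.cookiejar.MozillaCookieJar`, which reads the same Netscape cookies.txt format. This is a minimal sketch. `cookies_for_domain` is a hypothetical helper, not part of ArchiveBox. It only shows that the file parses and holds cookies for the site. It does not show that the login session behind those cookies is still valid.

```python
from __future__ import annotations

from http.cookiejar import MozillaCookieJar


def cookies_for_domain(cookies_path: str, domain: str) -> list[str]:
    """Hypothetical helper: names of cookies stored for `domain` or one of its subdomains."""
    jar = MozillaCookieJar(cookies_path)
    # Raises http.cookiejar.LoadError if the "# Netscape HTTP Cookie File" header is
    # missing or a line is malformed. Keep session and expired cookies too, so we
    # see everything the export actually contains.
    jar.load(ignore_discard=True, ignore_expires=True)
    wanted = domain.lstrip(".")
    names = []
    for cookie in jar:
        host = cookie.domain.lstrip(".")
        # Match the domain itself or any subdomain of it
        if host == wanted or host.endswith("." + wanted):
            names.append(cookie.name)
    return names
```

An empty list for the paywalled site means the export never captured the login cookies. In that case, no ArchiveBox setting will help.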
The Chromium user data folder has been particularly problematic: ArchiveBox rejected the path provided (no "Default" profile found, although one was there) before suddenly accepting it.
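The "no Default profile" message suggests that ArchiveBox expects CHROME_USER_DATA_DIR to be the user data folder that contains a `Default/` subfolder. Passing the `Default/` folder itself would then fail. The sketch below is a small pre-flight check along those lines. Run it inside the container against the path ArchiveBox will actually see. `check_user_data_dir` is a hypothetical helper and does not reproduce ArchiveBox's own validation.

```python
from __future__ import annotations

import os
from pathlib import Path


def check_user_data_dir(path: str) -> list[str]:
    """Hypothetical helper: problems found with a Chromium user data dir (empty = looks OK)."""
    root = Path(path).expanduser()
    if not root.is_dir():
        return [f"{root} does not exist or is not a directory"]
    problems = []
    # The browser must be able to list and read the folder (os.getuid is Unix-only)
    if not os.access(root, os.R_OK | os.X_OK):
        problems.append(f"{root} is not readable by uid {os.getuid()}")
    if root.name == "Default":
        # A common mix-up: pointing at the profile instead of its parent
        problems.append(f"{root} is a profile, not a user data dir; try {root.parent}")
    elif not (root / "Default").is_dir():
        problems.append(f"no Default profile inside {root}")
    return problems
```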
To make sure I had a usable cookie, I created a VNC session on the server, browsed to the site in Chromium, and logged in. I then ran ArchiveBox with the path to the User Data folder.
Great project - but the login feature is critical for me to archive content behind paywalls.
Regards
Marc
Steps to reproduce
Screenshots or log output
ArchiveBox version
ArchiveBox Dev docker image running on Debian Testing
@pirate commented on GitHub (Mar 22, 2022):
Are you mounting the Chrome profile inside of Docker? Keep in mind that the Chrome version inside Docker differs from the one outside, and you must create the profile with the exact same browser binary. E.g. you can't use a Chrome profile generated outside Docker if you're using Chromium inside Docker as ArchiveBox's CHROME_BINARY.
Also, please post the full output of archivebox version and your docker-compose.yml. Don't redact them, or I can't help. The ticket instructions are there for a reason.
Try setting CHROME_HEADLESS=False and make sure the browser GUI that appears loads the correct profile when running archivebox add.
What path are you using for the user data dir? Can you post a screenshot of that dir so we can make sure it's the right one?
https://github.com/ArchiveBox/ArchiveBox/wiki/Configuration#chrome_binary
https://github.com/ArchiveBox/ArchiveBox/wiki/Configuration#chrome_headless
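To act on the "exact same browser binary" advice, compare the full four-part version strings, not just the major version. This is a minimal sketch. It assumes you paste in the output of `chromium --version` from the host and the CHROME_BINARY line from `archivebox --version` in the container. Both helper names are hypothetical.

```python
from __future__ import annotations

import re

# Chromium/Chrome versions are four dot-separated integers, e.g. 90.0.4430.93
VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)\.(\d+)")


def parse_chrome_version(output: str) -> tuple[int, ...] | None:
    """Hypothetical helper: first four-part version number found in `output`."""
    match = VERSION_RE.search(output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def same_build(host_output: str, container_output: str) -> bool:
    """True only if both outputs contain the identical four-part version."""
    host = parse_chrome_version(host_output)
    container = parse_chrome_version(container_output)
    return host is not None and host == container
```

The check is deliberately strict. Later in this thread, even 90.0.4430.0 on the host versus 90.0.4430.212 in the container was enough to break the profile. The thread also suggests that a version match is necessary but not always sufficient, since file permissions mattered as well.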
@terxw commented on GitHub (Mar 24, 2022):
I had the same problem. It's probably an upstream problem with headless Chromium or, as pirate says, incompatible Chrome versions.
I got it working after replicating my setup outside Docker with google-chrome as CHROME_BINARY. After that, the profile with cookies and the logged-in session worked.
@mwnoo commented on GitHub (Mar 26, 2022):
I'm using docker-compose to run ArchiveBox (v0.6.2) with chromium 90.0.4430.93 running inside the container. Outside docker I'm running chromium 99.0.4844.82 on Ubuntu 20.04.
How can I use the same chromium binary to create the profile?
On Ubuntu 20.04 I cannot install chromium version 90.0.4430.93:
sudo apt install chromium-browser=90.0.4430.93
E: Version '90.0.4430.93' for 'chromium-browser' was not found
Do I need to mount the folder containing chromium 99.0.4844.82 on Ubuntu 20.04 (/snap/bin/chromium) inside the docker container? Or can I update chromium inside the docker container?
If I add /snap/bin:/snap/bin as a volume in the docker-compose.yml, set CHROME_BINARY=/snap/bin/chromium and run docker-compose run archivebox --version, I get an error saying:
! CHROME_BINARY: /snap/bin/chromium (unable to detect version)
@mwnoo commented on GitHub (Mar 27, 2022):
UPDATE
I was able to install chromium 90.0.4430.93 on MX Linux (same version as in the ArchiveBox docker image)
On the host I visited a few sites and accepted the cookies.
Then I copied the profile folder to a folder that is mounted in the docker image:
cp -r ~/.config/chromium/ chromium
When I check the configuration, the CHROME_USER_DATA_DIR seems valid:
$ sudo docker-compose run archivebox --version
Using sqlitebrowser, I can see that the Cookie database contains the data. Also, the hashes are the same:
However, when I archive the same sites using ArchiveBox, the cookie banners are still shown in the output (pdf, screenshot, etc.).
Any ideas what is going wrong here?
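Identical hashes confirm that the mounted copy is byte-for-byte the same file. A row count confirms that cookies are stored for the site. Below is a sketch of both checks using only the standard library. `sha256_of` and `cookie_hosts` are hypothetical helpers. The query assumes that Chromium's `Cookies` SQLite file has a `cookies` table with a `host_key` column.

Neither check shows that the container's Chromium can actually use the cookies. On Linux, Chromium keeps cookie values encrypted in the `encrypted_value` column. It is also safer to run this against a copy, or while Chromium is closed.

```python
from __future__ import annotations

import hashlib
import sqlite3
from contextlib import closing
from pathlib import Path


def sha256_of(path: str) -> str:
    """Hypothetical helper: SHA-256 of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cookie_hosts(cookies_db: str) -> dict[str, int]:
    """Hypothetical helper: number of cookies per host_key in a Chromium Cookies file."""
    # Open read-only (SQLite URI mode=ro) so the check itself cannot modify the profile
    uri = Path(cookies_db).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT host_key, COUNT(*) FROM cookies GROUP BY host_key"
        ).fetchall()
    return dict(rows)
```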
docker-compose.yml
I mount the profile folder as read-only (ro); otherwise, ArchiveBox clears the contents of the Cookie database.
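A read-only mount protects the original profile. Another option, not tested in this thread, is to point CHROME_USER_DATA_DIR at a throwaway copy. That way, whatever the browser changes during archiving never touches the profile you log in with.

This is a minimal sketch. `disposable_profile_copy` is a hypothetical helper. The lock-file names are the ones Chromium creates in a user data dir on Linux while a browser has it open.

```python
import shutil
import tempfile
from pathlib import Path

# Lock files Chromium keeps in a user data dir while a browser is using it (Linux).
# A copied lock can make Chromium treat the copy as in use by another process.
LOCK_FILES = ("SingletonLock", "SingletonCookie", "SingletonSocket")


def disposable_profile_copy(user_data_dir: str) -> str:
    """Hypothetical helper: copy a user data dir to a fresh temp dir and return its path."""
    dest = Path(tempfile.mkdtemp(prefix="chrome-profile-")) / "user-data"
    shutil.copytree(
        user_data_dir,
        dest,
        symlinks=True,  # copy symlinks as links instead of following them
        ignore=shutil.ignore_patterns(*LOCK_FILES),
    )
    return str(dest)
```

The copy still has to be made from a profile created by the same Chromium build that will use it.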
ArchiveBox.conf
@szenrom commented on GitHub (Apr 6, 2022):
Hi!
I'm not sure if I should create a new issue. My problem is somewhat similar to OP's (ArchiveBox doesn't seem to properly use the Google Chrome user directory, as seen in OP's update), but I don't use Docker; I installed it with pip. Please let me know if I should repost it as a new issue.
My setup (details below):
- pyenv virtualenv (Python 3.10.4)
- nvm (node 17.8.0)
- wget installed with MacPorts
My issues:
- archivebox add it doesn't seem to use the browser user profile (websites are shown in their logged-out version and all cookie banners are up).
- CHROME_USER_DATA_DIR and CHROME_BINARY problem became worse: after making sure I'm logged in on the website, closing Chrome and running ArchiveBox, I get the same output as before, but I'm also logged out of the website when I restart Chrome.
- CHROME_HEADLESS=False setting opens the proper user profile (I verified it by seeing the add-ons), but with the cookies already lost.
- single-file as it doesn't show Chrome.
- dev branch version, but it had the same issues plus a few more (there was an issue with checking the version of installed Node modules).
Details of the setup:
Output of command to check versions after manual installation
Output of archivebox --version
Contents of ArchiveBox.conf
@ga-it commented on GitHub (Apr 9, 2022):
Thanks @pirate
I have now successfully got ArchiveBox to use my Chromium profile.
Resolution:
ArchiveBox.conf
[SERVER_CONFIG]
SECRET_KEY = XX
[ARCHIVE_METHOD_OPTIONS]
CHROME_USER_DATA_DIR = /data/chromium
[ARCHIVE_METHOD_TOGGLES]
SAVE_ARCHIVE_DOT_ORG = False
docker-compose.yml
services:
  archivebox:
    XX
    ports:
      XX
    environment:
      XX
    volumes:
      XX
      - /XX/.config/chromium:/data/chromium
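This setup works because CHROME_USER_DATA_DIR in ArchiveBox.conf names the container-side path (/data/chromium), and the volume entry maps the host profile onto exactly that path. Below is a hypothetical consistency check for that relationship, using Python's `configparser` on the config shown above. It naively splits volume entries on ':' (host:container[:mode]), so Windows host paths with drive letters would need extra handling. `user_data_dir_is_mounted` is not an ArchiveBox function.

```python
from __future__ import annotations

import configparser


def user_data_dir_is_mounted(conf_text: str, volumes: list[str]) -> bool:
    """Hypothetical check: CHROME_USER_DATA_DIR equals or sits inside a volume's container path."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names upper-case as written in ArchiveBox.conf
    parser.read_string(conf_text)
    user_data_dir = parser.get(
        "ARCHIVE_METHOD_OPTIONS", "CHROME_USER_DATA_DIR", fallback=None
    )
    if not user_data_dir:
        return False
    user_data_dir = user_data_dir.rstrip("/")
    for spec in volumes:
        parts = spec.split(":")  # host:container[:mode]
        if len(parts) < 2:
            continue
        target = parts[1].rstrip("/")
        if user_data_dir == target or user_data_dir.startswith(target + "/"):
            return True
    return False
```

Usage: pass the ArchiveBox.conf text and the list under `volumes:`. A False result means the browser inside the container is looking at a path that no mount provides.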
@ga-it commented on GitHub (Apr 10, 2022):
I have since found keeping the Chromium version in the Docker image and on the host in sync to be a nightmare.
If they are not exactly in sync, the profiles become incompatible.
The Chromium version in the current dev image is 90.0.4430.212. When I download this version via https://chromium.cypress.io/, it actually downloads 90.0.4430.0. As a result, the host browser can no longer open the profile after the Docker image has used it.
To prevent this, I used the following workaround:
I found each step to be crucial, especially the permissions.
Now the profile is generated and used by the same Chrome instance on the Docker host and in the container.
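Because permissions turned out to matter, one quick check is whether anything in the profile is owned by a different uid than the one the container's browser runs as. This is a minimal sketch. `paths_not_owned_by` is a hypothetical helper, and you supply the numeric uid yourself. It only compares owners, not permission bits.

```python
import os


def paths_not_owned_by(root: str, uid: int) -> list[str]:
    """Hypothetical helper: root and everything under it whose owner uid differs from `uid`."""
    bad = []
    # lstat so symlinks (e.g. Chromium's lock files) are checked themselves, not their targets
    if os.lstat(root).st_uid != uid:
        bad.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_uid != uid:
                bad.append(path)
    return bad
```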
@pirate commented on GitHub (Apr 11, 2022):
Yeah, those steps sound right; unfortunately, that is the status quo right now. There's not an easy way around it to make profiles compatible across versions. The next release will update Chrome to the latest version, which may make things slightly easier.
@pirate commented on GitHub (Apr 12, 2022):
I've added the instructions from your steps @ga-it to the wiki for future reference: https://github.com/ArchiveBox/ArchiveBox/wiki/Chromium-Install#setting-up-a-chromium-user-profile
If possible, can you provide the steps you used to install and set up the VNC server connected to Chrome? Thanks @ga-it!
@OlegShevtsov1 commented on GitHub (Jun 17, 2023):
@pirate
I'm reproducing the steps at https://github.com/ArchiveBox/ArchiveBox/wiki/Chromium-Install#setting-up-a-chromium-user-profile
√ CHROME_BINARY v90.0.4430.93 valid /usr/bin/chromium
I noticed one difference: one step should be added, run as root:
This is needed because the data directory has the permissions of user systemd-coredump after running init:
docker-compose run archivebox init --setup
And then your steps.
Unfortunately, it still does not have access to the protected page.
OS: Ubuntu 22.04.
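A name like systemd-coredump in a host `ls -l` listing is simply the host's name for the numeric uid that wrote the files. With Docker's default settings (no user-namespace remapping), files written by a container user appear on the host under whichever host account has the same uid. The sketch below prints the numeric owner next to the host's name for it. `describe_owner` is hypothetical, and `pwd` is Unix-only.

```python
import os
import pwd


def describe_owner(path: str) -> str:
    """Hypothetical helper: '<path>: uid <n> -> <host user name>'."""
    uid = os.stat(path).st_uid
    try:
        name = pwd.getpwuid(uid).pw_name
    except KeyError:
        # The uid may exist only inside the container; the host has no name for it
        name = "(no user with this uid on this host)"
    return f"{path}: uid {uid} -> {name}"
```

Comparing that uid with the uid the container's browser runs as shows whether a chown step like the one above is needed.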