mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 09:06:02 +03:00
[GH-ISSUE #1127] Question: Difficulty archiving site behind OAuth2 authentication using Chrome user data directory #707
Originally created by @Michael-Z-Freeman on GitHub (Mar 23, 2023).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1127
Hi,
I need to archive a site behind OAuth2 authentication: https://learningspace.falmouth.ac.uk/course/view.php?id=6885
I have followed the guides in great detail using a local copy of Google Chrome (I'm not using Docker) on macOS. I also tried Chromium. ArchiveBox shows the correct setup with the correct Chrome user data directory. However, the link above always shows the login page.
I suspect that Chrome/Chromium is always revealing itself as a bot/headless browser, and that OAuth2 and whatever other security measures the site uses detect this and treat the connection as a new, logged-out client even when the login credentials are valid and available.
I would really like to get to the bottom of this.
@pirate commented on GitHub (Mar 31, 2023):
can you try running
chromium --headless=new ... user data dir args ... --screenshot 'https://learningspace.falmouth.ac.uk/course/view.php?id=6885'
directly in CLI, the latest chromium release has a new headless mode that supposedly hides from bot detection better.
@pirate commented on GitHub (Jun 13, 2023):
closing for now but comment back if you still need help
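For anyone retrying the check suggested above, here is a minimal shell sketch of that command. The thread leaves the user data dir arguments elided, so the `--user-data-dir` value, the `CHROME_BINARY` and `CHROME_USER_DATA_DIR` variable names, and the `headless_screenshot` helper are all assumptions for illustration, not something confirmed in this issue. By default it only prints the command, and runs it when `RUN=1` is set.

```shell
#!/bin/sh
# Sketch only: screenshot a page with Chromium's new headless mode while
# reusing an existing, already logged-in browser profile.
# ASSUMPTIONS (placeholders, not from the thread): binary name and profile path.
CHROME_BINARY="${CHROME_BINARY:-chromium}"
CHROME_USER_DATA_DIR="${CHROME_USER_DATA_DIR:-$HOME/chrome-profile}"

# Hypothetical helper: builds the command for one URL.
# Prints the command unless RUN=1, in which case it executes it.
headless_screenshot() {
    url="$1"
    set -- "$CHROME_BINARY" --headless=new \
        "--user-data-dir=$CHROME_USER_DATA_DIR" \
        --screenshot "$url"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

headless_screenshot 'https://learningspace.falmouth.ac.uk/course/view.php?id=6885'
```

If the resulting screenshot still shows the login page, the profile's session is probably not being picked up (or is being rejected); if it shows the course page, the problem is more likely on the ArchiveBox configuration side.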