[GH-ISSUE #1127] Question: Difficulty archiving site behind OAuth2 authentication using Chrome user data directory #707

Closed
opened 2026-03-01 14:45:40 +03:00 by kerem · 2 comments
Owner

Originally created by @Michael-Z-Freeman on GitHub (Mar 23, 2023).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1127

Hi,

I need to archive a site behind OAuth2 authentication: https://learningspace.falmouth.ac.uk/course/view.php?id=6885

I have followed the guides in great detail, using a local copy of Google Chrome (I'm not using Docker) on macOS. I also tried Chromium. ArchiveBox reports the correct setup, including the correct Chrome user data directory. However, the link above always shows the login page.

I suspect that Chrome/Chromium always reveals itself as a bot or headless browser, and that OAuth2 (and goodness knows what other security measures on that site) detects this and treats the connection as a new, logged-out client, even though the login credentials are valid and available.

Would really like to get to the bottom of this.

kerem 2026-03-01 14:45:40 +03:00
Author
Owner

@pirate commented on GitHub (Mar 31, 2023):

Can you try running `chromium --headless=new ... user data dir args ... --screenshot 'https://learningspace.falmouth.ac.uk/course/view.php?id=6885'` directly in the CLI? The latest Chromium release has a new headless mode that supposedly hides from bot detection better.
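
For anyone wanting to script that test, here is a minimal sketch of how the suggested command line could be assembled. The helper name `build_headless_command`, the output filename, and the macOS profile path are assumptions for illustration and do not come from this thread. The elided "user data dir args" are filled in only with `--user-data-dir`, which may not be everything your setup needs.

```python
import shlex
import shutil
from pathlib import Path


def build_headless_command(binary, user_data_dir, url, screenshot_path="screenshot.png"):
    """Return the argv list for a one-off headless screenshot of `url`.

    Hypothetical helper: it only assembles arguments, it does not launch anything.
    """
    return [
        str(binary),
        "--headless=new",                    # new headless mode suggested in the comment above
        f"--user-data-dir={user_data_dir}",  # reuse the profile that holds the login session
        f"--screenshot={screenshot_path}",   # file the browser should write the PNG to
        url,
    ]


if __name__ == "__main__":
    # Assumed values: adjust the binary name and profile path for your machine.
    binary = shutil.which("chromium") or "chromium"
    profile = Path.home() / "Library" / "Application Support" / "Chromium"
    cmd = build_headless_command(
        binary,
        profile,
        "https://learningspace.falmouth.ac.uk/course/view.php?id=6885",
    )
    # Dry run: print a shell-quoted command to copy into a terminal.
    # To launch it from Python instead, pass `cmd` to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

If the resulting screenshot still shows the login page, that would suggest the problem lies with the browser session rather than with ArchiveBox itself. It is probably worth closing any other browser window using the same profile first, since a profile directory is normally locked to one running process.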

Author
Owner

@pirate commented on GitHub (Jun 13, 2023):

closing for now but comment back if you still need help
