[GH-ISSUE #344] Cloudflare Captchas #225

Closed
opened 2026-03-02 11:47:46 +03:00 by kerem · 7 comments
Owner

Originally created by @MRDGH2821 on GitHub (Aug 5, 2024).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/344

I have a few URLs that use Cloudflare captchas.
As a result, the bookmark preview is filled with the captcha page, and the AI tags relate to security instead of to the actual content of the link.
![image](https://github.com/user-attachments/assets/86cdf0d9-0d76-4aa4-91ee-529e3f0675e7)

Links using Cloudflare's Captcha system:
https://www.random.org/
https://tineye.com/

kerem closed this issue 2026-03-02 11:47:46 +03:00

@kamtschatka commented on GitHub (Aug 5, 2024):

This is definitely not an issue Hoarder will tackle. The whole point of this captcha is to prevent scraping. There is even a project that tries to get around it, https://github.com/FlareSolverr/FlareSolverr, and its maintainers are constantly struggling to keep it working (it is currently broken, as far as I know).


@huyz commented on GitHub (Aug 7, 2024):

Wouldn't https://github.com/hoarder-app/hoarder/issues/172 tackle this issue?


@MohamedBassem commented on GitHub (Aug 8, 2024):

Yeah, "crawling" a captcha-protected website will most likely never work. The solution would be to capture the HTML directly from the extension, as suggested in #172.


@MohamedBassem commented on GitHub (Aug 18, 2024):

I'll close this in favor of #172


@makcorner commented on GitHub (Jul 11, 2025):

Is #172 a work in progress?

I'm wondering whether another solution could be, for the web browser version only, to let the user manually pass Cloudflare's (and possibly any other) human verification. For example, a pop-up window could open when human verification is detected, allowing manual intervention. Alternatively, it could be triggered on request only, e.g. right-click -> "perform manual download" for existing Karakeep entries that are affected by the Cloudflare proxy.
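To make the "when human verification is detected" step more concrete, here is a minimal Java sketch (Java to match the code elsewhere in this thread) of a heuristic that flags crawled HTML as a Cloudflare challenge page. The class name `ChallengePageDetector` and the marker strings (the "Just a moment..." interstitial title, the `/cdn-cgi/challenge-platform/` script path and `cf-browser-verification`) are assumptions based on commonly seen Cloudflare challenge pages, not a documented Cloudflare contract, and this is not how Karakeep's crawler actually works.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical heuristic detector; the markers below are assumptions, not a documented API
public class ChallengePageDetector {

    // Assumed markers, stored lower-case because the HTML is lower-cased before matching
    private static final List<String> MARKERS = List.of(
            "<title>just a moment...</title>",
            "/cdn-cgi/challenge-platform/",
            "cf-browser-verification"
    );

    // Returns true if the HTML contains any of the assumed challenge markers
    public static boolean looksLikeChallenge(String html) {
        if (html == null || html.isEmpty()) {
            return false;
        }
        String lower = html.toLowerCase(Locale.ROOT);
        for (String marker : MARKERS) {
            if (lower.contains(marker)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String challenge = "<html><head><title>Just a moment...</title></head>"
                + "<body><script src=\"/cdn-cgi/challenge-platform/x.js\"></script></body></html>";
        String article = "<html><head><title>Example article</title></head><body><p>Hello</p></body></html>";
        System.out.println(looksLikeChallenge(challenge)); // true
        System.out.println(looksLikeChallenge(article));   // false
    }
}
```

A crawler could use a positive result to skip AI tagging and flag the bookmark for manual re-capture (for example through the extension, as proposed in #172). Substring markers like these are brittle, though: they can go stale when Cloudflare changes its pages, and they can misfire on articles that merely quote them.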


@kamtschatka commented on GitHub (Jul 12, 2025):

I actually managed to avoid issues with Cloudflare protection simply by hiding navigator.webdriver from the page (an init script makes it return undefined).
I'm not sure whether any of the other settings help, but I have never had any issues with Cloudflare.

My Java code, if anyone is interested (see createNewPage for the vital part):

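    import com.microsoft.playwright.*;
    import java.util.List;

    // Wrapper class added so the snippet compiles; the class name is illustrative
    public class PlaywrightSetup {

        // Launch a visible (headed) Chromium that waits 50 ms between actions
        public static Browser createPlaywrightBrowser(Playwright playwright) {
            return playwright.chromium().launch(
                    new BrowserType.LaunchOptions()
                            .setHeadless(false)
                            .setSlowMo(50)
            );
        }

        // Context with a desktop viewport, US locale/timezone, geolocation permission and a Chrome user agent
        public static BrowserContext createPlaywrightContext(Browser browser) {
            return browser.newContext(
                    new Browser.NewContextOptions()
                            .setViewportSize(1920, 1080)
                            .setLocale("en-US")
                            .setTimezoneId("America/New_York")
                            .setPermissions(List.of("geolocation"))
                            .setUserAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36")
            );
        }

        // The vital part: before any page script runs, make navigator.webdriver read as undefined
        public static Page createNewPage(BrowserContext browserContext) {
            Page page = browserContext.newPage();
            page.addInitScript("Object.defineProperty(navigator, 'webdriver', { get: () => undefined })");
            return page;
        }
    }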

@oguime commented on GitHub (Oct 28, 2025):

> My java code, if anyone is interested (see createNewPage on the vital part)

Hi, I'm new to Karakeep. Would you mind explaining how to use this method?

Thanks!

Reference
starred/karakeep#225