Mirror of https://github.com/karakeep-app/karakeep.git (synced 2026-04-25 07:56:05 +03:00)
Open · opened 2026-03-02 11:48:13 +03:00 by kerem · 63 comments
Originally created by @TurbulenceDeterministe on GitHub (Sep 23, 2024).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/414
When I use Hoarder on a YouTube link, the crawler gets stuck on the cookie banner. Any idea how to solve this?
@CrypticC3s4r commented on GitHub (Oct 5, 2024):
@Dwelled2593 can you provide some example links?
@pix commented on GitHub (Oct 13, 2024):
It's a Cookies / GDPR notice:
From: https://www.youtube.com/watch?v=E-5b1iGNraM (probably an EU thing)
@vhsdream commented on GitHub (Nov 18, 2024):
Can confirm this also happens with other types of sites that have notices a human has to click, for instance this article from the New York Times.
As a slightly humorous (and kind of irritating) aside, when I tried to ask my local LLM (Llama3.2) to summarize the article, this was its response:
Edit: forgot to show what that news link appears as:
@hedger commented on GitHub (Dec 12, 2024):
As someone with a massive amount of YT links in bookmarks, I'd love to see this fixed.
ArchiveBox seems to handle it correctly when being fed direct YT links.
@ctschach commented on GitHub (Jan 6, 2025):
Okay, I'm running into the same issue.
One way to get around this would be the ability to provide a cookie file which is used when crawling sites. With this in place, YT should know that the cookies are already accepted. You can use a Chrome extension called "Get cookies.txt locally" to get the required cookies. This would probably be helpful for other sites too.
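ctschach's idea would require the crawler to load such a file and inject its cookies before navigation. As a rough sketch (Karakeep has no such option built in; the helper below is purely illustrative), a Netscape-format cookies.txt export can be parsed into the dict shape that Playwright's `context.add_cookies` accepts:

```python
def parse_cookies_txt(text):
    """Parse a Netscape-format cookies.txt export (e.g. from the
    "Get cookies.txt locally" extension) into Playwright-style cookie
    dicts. Illustrative sketch only, not Karakeep code."""
    cookies = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip comments and blank lines
        parts = line.split("\t")
        if len(parts) != 7:
            continue  # skip malformed lines
        domain, _flag, path, secure, expires, name, value = parts
        cookies.append({
            "name": name,
            "value": value,
            "domain": domain,
            "path": path,
            "expires": int(expires),
            "secure": secure.upper() == "TRUE",
        })
    return cookies
```

A crawler would then call something like `context.add_cookies(parse_cookies_txt(open("cookies.txt").read()))` before opening the page, so YouTube sees the consent cookie as already set.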
@cakonopka commented on GitHub (Jan 20, 2025):
Got the same issue with Instagram Posts or X Posts
@fcorvelo commented on GitHub (Jan 22, 2025):
I don't know if this is the best solution, but I managed to get this to work using the "i-dont-care-about-cookies" Chrome extension. This extension automatically accepts and hides all the cookie banners.
The way I made this work was to make the following changes in the Chrome container:
- Added the `--headless=new` flag to allow loading extensions in headless mode.
- Added the `--load-extension=/i-dont-care-about-cookies` flag with the path to the previously mounted folder to tell Chrome from where to load the extension.

And that's it! This way, when Hoarder calls Chrome it will load the extension and the extension will accept all the cookie banners, making them disappear.
One caveat (and this is why I say that maybe this is not the best solution) is that enabling the `--headless=new` flag creates the following error on the Hoarder web container, which keeps appearing in the logs every 5 seconds. I didn't see any issues using Hoarder with this error, but perhaps there is something that I'm not aware of:

`[Crawler] Failed to connect to the browser instance, will retry in 5 secs: TypeError: Failed to fetch browser webSocket URL from http://172.18.0.3:9222/json/version: fetch failed`

So, I hope that this can help while there is no official way of doing it 😃
@kassyss commented on GitHub (Jan 23, 2025):
@fcorvelo you made my day!
I was eager for a workaround until this issue is resolved, and you brought it. Who knows, maybe the final resolution will be based on a Chromium addon.
Thank you very much
@Deathproof76 commented on GitHub (Jan 24, 2025):
Without trying the workaround, I don't seem to have the same problem for the link previews. Also located in the EU.
For visibility (because of the giant screenshots) and maybe for debugging, here's part of my .env:
Another thing I can think of, that I do differently than the standard compose.yaml, is that I exposed the chrome port directly and connected the chrome instance via `BROWSER_WEB_URL: http://192.168.0.208:9222` (the LAN IP of the server, directly in the compose.yaml for the hoarder container). But of course:
@Deathproof76 commented on GitHub (Jan 24, 2025):
@fcorvelo If you have a minute:
I actually tried the workaround, but am most likely not savvy enough to get the instructions right and need some guidance.
I mounted the folder as `cookies`, so the folder with the unpacked extension files is in `/mnt/Dockerspace/hoarder/cookies`, at its root looking just like the first screenshot. I can see the mounted folder, including its contents, in the chrome container at its root `/cookies`,
which leads to
Otherwise the preview seemed to work
and the archive looked like this:
edit:
logs of the chrome container:
logs of the hoarder container upon hoarding:
@Deathproof76 commented on GitHub (Jan 24, 2025):
For other sites I get the cached content and the archive, but yeah, video download and full page screenshot seem to be broken. So I've definitely done something wrong with the chrome container.
@fcorvelo commented on GitHub (Jan 24, 2025):
@Deathproof76 I just checked again on my end and I can confirm that I see the same behavior as you do. So it appears that this workaround is not as good as I thought it would be.
For some sites like YouTube it does skip the banner and picks up the correct title and image for the card list, as you can see below. And this was what I was personally more interested in achieving. But as you mention, the content itself and the screenshots seem to be broken.
Before:
After:
@Deathproof76 commented on GitHub (Jan 24, 2025):
@fcorvelo If it's mostly about the previews on the main hoarder page, which are working for me ... Maybe it could be due to an env setting difference?
this is my full compose:
and heres the .env, I removed some privacy related stuff:
🤷‍♂️ With this exact config I simply do
and it looks like this
@fcorvelo commented on GitHub (Jan 25, 2025):
@Deathproof76 That's very strange. I tried your environment variables (except the ones related to Inference because I'm not using that) and I still see the YouTube cookie banner.
Here is my compose stack (I'm using Portainer):
stack.env:
@Deathproof76 commented on GitHub (Jan 25, 2025):
@fcorvelo *currently on the phone. So just as an experiment. I use this exact url https://youtu.be/_7VXPS7q00Y?si=Nr8fnq37dQWAsHgv hoard it and it works on my end. Nice preview in app, archived full page screenshot got cookies etc.
for comparison the log of my hoarder container, would most likely be helpful to know where it differs on your end, sorry for the screenshot:
@Deathproof76 commented on GitHub (Jan 25, 2025):
In the app again for reference
@fcorvelo commented on GitHub (Jan 25, 2025):
@Deathproof76 I just found out that there are different behaviors depending on the YouTube link. If I use a link to a video, it works. But if I use a link to a channel, it doesn't. Can you try the channel of that video you sent? Like this: https://www.youtube.com/@VivaLaDirtLeague
@Deathproof76 commented on GitHub (Jan 25, 2025):
@fcorvelo well, and there it is 😅 at least we're finally on the same page and know how to reproduce this
I'd guesstimate that it has something to do with the crawler trying to find a nice banner: it settles on the YouTube .svg logo and chokes on it.
And due to that we get to see the cookie screenshot as a backup, which is the "primary" screenshotable layer. Or something like that.
Maybe a workaround could be to adjust the crawler logic to specifically ignore .svg files and/or look for the biggest Hoarder-supported image format.
See #141 and https://github.com/hoarder-app/hoarder/issues/128
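Deathproof76's suggestion above, sketched: when collecting candidate preview images, skip SVGs and prefer the largest raster candidate. The function name and candidate fields below are hypothetical; Karakeep's real banner selection lives in its crawler/metascraper code:

```python
def pick_banner(candidates):
    """Pick a preview image from candidate dicts like
    {"url": ..., "width": ..., "height": ...}, ignoring .svg files and
    preferring the largest raster image. Hypothetical helper, not
    Karakeep code."""
    rasters = [
        c for c in candidates
        # strip any query string before checking the extension
        if not c["url"].split("?")[0].lower().endswith(".svg")
    ]
    if not rasters:
        return None
    # largest image by pixel area wins
    return max(rasters, key=lambda c: c.get("width", 0) * c.get("height", 0))
```

With logic like this, the YouTube.svg logo would never be chosen, so the crawler would fall through to a real thumbnail instead of the cookie-banner screenshot.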
@Deathproof76 commented on GitHub (Jan 25, 2025):
Just spitballing on the side: maybe it'd be possible to incorporate the integrated puppeteer adblock and enhance the blocklist for cookie requests? Mmm, that'll most likely block the whole page; I'd have to read up on the possibilities.
Also, maybe beyond the DNS-block scope, but with something like uBlock Origin, for example, it's possible to cosmetically filter specific annoyances away, like the cookie stuff. Maybe it'd be possible to create filter templates for known offenders like YouTube via puppeteer in the hoarder container itself.
Another possibility would be to find out why the hoarder container can't connect (or has problems connecting) to a chromium instance with a loaded extension, and solve that.
@pcasalinho commented on GitHub (Jan 26, 2025):
Don't know why this only happens with some YouTube videos:
This link starts playing the video and then shows the cookie disclaimer: https://www.youtube.com/watch?v=CbIASgzUIUU
This video does not play until the cookie disclaimer is accepted: https://www.youtube.com/shorts/t67qpQFMDUw
You can view this in a private mode browser without cookies
In the first youtube link, hoarder does everything well.
On the second link, it "hoards" the disclaimer.
@ctschach commented on GitHub (Feb 3, 2025):
Do we have any progress or updates on this topic?
@gercollo commented on GitHub (Mar 13, 2025):
+1 on this. For me, it's happening with YouTube Shorts and recently started also with links from Google Maps.
@kafmees commented on GitHub (Mar 17, 2025):
It also uses the cookie consent page for tagging. So every link is tagged as Cookies, Privacy policy, consent, advertising. I'll try the workaround, as this makes the app worthless atm.
@terribium commented on GitHub (Mar 17, 2025):
+1 for me too.
Tested these 2 URLs to YouTube and both worked in incognito windows with the "I still don't care about cookies" extension in Vivaldi (Chrome).
URLS:
https://www.youtube.com/watch?v=CbIASgzUIUU
https://www.youtube.com/shorts/t67qpQFMDUw
Extension: https://chromewebstore.google.com/detail/i-still-dont-care-about-c/edibdbjcniadpccecjdfdjjppcpchdlm
Is this something we should and could incorporate in Hoarder?
@mobiledude commented on GitHub (Mar 22, 2025):
Is there in general a way to bypass cookies, not only YouTube-related? I am hoarding a lot from this site, but its cookie consent message makes it useless. https://www.totaaltv.nl/nieuws/kpn-slaat-handen-ineen-met-netflix-en-stelt-ook-nietklanten-in-staat-hiervan-te-profiteren/
@dennisvanderpool commented on GitHub (Apr 20, 2025):
Is it possible to take over the browser and just log in to some sites? Then you can let it work for almost all sites, even ones where you really have to log in.
Or, as an alternative, expose a folder that cookies can be copied into? Would it then be possible to copy cookies from my desktop browser into that folder?
@fspv commented on GitHub (Apr 27, 2025):
I came up with this custom docker image, with adblock and "I still don't care about cookies" loaded.
Works well for me, and it also exposes the chrome UI on port 7900 where you can see an actual browser. This is basically the way it is done in the selenium docker image, for example. I just added a couple more tricks to make the extensions load.
@kafmees commented on GitHub (May 3, 2025):
Disclaimer: I'm a total noob, don't know how to code, and I use LLMs for these kinds of things. So I did not write any of this, but please don't call me a "vibe coder", as I'm working hard to learn and do more and more stuff myself.
As I don't want to use a custom docker image, I've created a solution for tagging youtube shorts which works for me for the time being.
Specifically, I set up a webhook in Karakeep (triggered on created events), which is handled by a small Python Flask service running in Docker. When a new bookmark is added:
Because I added a rule in Karakeep to automatically archive bookmarks with that tag, the incorrectly-tagged link gets archived right away.
I'm using this workaround until there's a more structural solution in place. Two things that would make this cleaner:
I know this isn't ideal, but it works for now, and maybe it helps someone else facing the same issue.
You can find the full stand alone script here: https://gist.github.com/kafmees/eb5f6705b29ca80d34e1fbd1817d4ab7
@ballerbude commented on GitHub (May 7, 2025):
@fspv How do I use that in the context of Karakeep? Could you provide your full docker-compose file? Thanks in advance.
@fspv commented on GitHub (May 11, 2025):
@ballerbude just replace the "chrome" service in your docker compose with this one
@Deathproof76 commented on GitHub (May 12, 2025):
@fspv I can't seem to get this running. Tried building outside of the stack and also as a drop-in for the standard chrome, made sure every container is in the same network. Also tried removing the proxy line. Tried opening ports and made sure the docker network resolves the IP correctly.
The VNC instance via 7900 is accessible from within the LAN (browser in a browser). But the 9222 remote debugging port can only be curled from within the built chrome container itself. Otherwise it's just "can't connect to browser" from karakeep on repeat.
Maybe a permissions issue? Some more guidance would be greatly appreciated.
@jncmney commented on GitHub (May 28, 2025):
The same thing happens on other sites. I tried this page: https://comicvine.gamespot.com/revival/4050-50379/characters/ and cannot get Karakeep to skip/ignore the consent form.
@ballerbude commented on GitHub (May 29, 2025):
Man, this is very frustrating, and it makes this otherwise great tool somewhat obsolete if you live in the EU and visit EU sites. Readeck skips these consent banners and can scrape the text from the mentioned pages. So technically, it's very much possible.
@teddyfresco commented on GitHub (Jun 7, 2025):
I've tried adapting every suggestion in this issue and nothing worked, maybe because I am a useless newbie. But it seems Readeck is perfectly able to deal with it in some way, as this blog post suggests:
https://cyb.org.uk/2025/06/05/self-hosted-bookmarking.html?ref=selfh.st
I'd very much like to use Karakeep, I consider it far superior, if not for this problem, which is a big hindrance...
@MohamedBassem commented on GitHub (Jun 7, 2025):
Folks, I hear your pain. I'll make this the main feature of the 0.26 release. I'm almost done with the 0.25 (should be out this weekend hopefully).
@pdc1 commented on GitHub (Jun 7, 2025):
Along the same lines I was wondering if it would be possible to allow custom adblock filter lists? I have a series of lists I use with Brave, and I'd like to match that on the crawler.
@CrazyWolf13 commented on GitHub (Jul 4, 2025):
Hi
Just checking in, I'm one of the maintainers of community-scripts. I'd love this feature too, especially for TikTok reels, though that will possibly get even harder, as they currently serve a captcha:
@Tandem1000 commented on GitHub (Jul 23, 2025):
I updated to 0.26 yesterday. Unfortunately, the ‘GDPR notice’ problem still persists (here: youtube.com).
@fspv commented on GitHub (Jul 26, 2025):
Hey, I replied in this thread earlier with a docker compose file for the chrome image. I had some time recently to package it properly. You can find the result in this repo https://github.com/fspv/crawl-browser
Here is the minimum docker compose definition for the chrome target, which should work.
There was a mention earlier in this thread of the bug that port 9222 is not available remotely. I have managed to fix it. Feel free to try it out and let me know whether it works.
It comes with https://github.com/uBlockOrigin/uBOL-home and https://github.com/OhMyGuus/I-Still-Dont-Care-About-Cookies extensions by default, so should block most of the banners and cookies notices. You can add other extensions as well (see the readme and examples in the repo)
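The compose snippet itself didn't survive the mirror, but based on the image name used later in this thread, a minimal drop-in `chrome` service along the lines fspv describes might look like the fragment below (check the crawl-browser README for the authoritative version):

```yaml
services:
  chrome:
    image: nuhotetotniksvoboden/crawl-browser:latest
    restart: unless-stopped
    ports:
      - "7900:7900"   # noVNC UI, to watch what the browser actually does
    # Karakeep then points at it via BROWSER_WEB_URL: http://chrome:9222
```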
@miraculix95 commented on GitHub (Jul 28, 2025):
Hi, thanks for the effort. Unfortunately it was not working for me.
What I did:
Results:
This is a major issue because these days 80% of websites have a cookie banner.
With the issue the software is unusable.
@fspv commented on GitHub (Jul 28, 2025):
Hmm, result 2 is interesting. I wouldn't expect that to be the case. I'll try it myself. Maybe karakeep just doesn't wait long enough for the extensions to do the job. Any chance you can connect to your container via port 7900, go to vnc.html, and see what's actually going on when karakeep tries to fetch something?
@miraculix95 commented on GitHub (Jul 28, 2025):
Unfortunately I'm semi-literate when it comes to development.
this is my current docker-compose.yml (caddy is started with another yml):
Please let me know what I should change.
```yaml
name: karakeep
services:
  web:
    image: ghcr.io/karakeep-app/karakeep:${KARAKEEP_VERSION:-release}
    restart: unless-stopped
    container_name: karakeep-web
    volumes:
      # By default, the data is stored in a docker volume called "data".
      # If you want to mount a custom directory, change the volume mapping to:
      # - /path/to/your/directory:/data
      - data:/data
    # ports:
    #   - 3000:3000
    expose:
      - "3000"
    env_file:
      - .env
    environment:
      MEILI_ADDR: http://meilisearch:7700
      BROWSER_WEB_URL: http://chrome:9222
      # OPENAI_API_KEY: ...
  chrome:
    # alternative image to circumvent cookie banners, suggested in
    # https://github.com/karakeep-app/karakeep/issues/414
    # (https://github.com/fspv/crawl-browser)
    image: nuhotetotniksvoboden/crawl-browser:latest
    # image: gcr.io/zenika-hub/alpine-chrome:123
    restart: unless-stopped
    container_name: karakeep-chrome
    command:
      - --no-sandbox
      # - --disable-gpu
      # - --disable-dev-shm-usage
      # - --remote-debugging-address=0.0.0.0
      # - --remote-debugging-port=9222
      # - --hide-scrollbars
    networks:
      - web
  meilisearch:
    image: getmeili/meilisearch:v1.13.3
    restart: unless-stopped
    env_file:
      - .env
    environment:
      MEILI_NO_ANALYTICS: "true"
    volumes:
      - meilisearch:/meili_data
    networks:
      - web

volumes:
  meilisearch:
  data:

networks:
  web:
    external: true
    name: web
```
@fspv commented on GitHub (Jul 29, 2025):
Okay, so I've just tested it myself, and indeed my image doesn't help. I tested it using news.google.com and it successfully closes the consent banner, but it takes a few seconds to do that.
The problem is that karakeep closes the website before extensions have a chance to do anything. I use this image for different purposes and it works well there, because I have added an artificial 10s sleep before page load and content grabbing. I don't see an option in karakeep to do that.
UPD: it kinda should be handled by this:
github.com/karakeep-app/karakeep@afcc27d557/apps/workers/workers/crawlerWorker.ts (L414)
Not sure, though, why it doesn't work. Maybe logs can provide some insight, but I don't have much time to look at this as I'm not an active karakeep user at the moment.
@miraculix95 commented on GitHub (Jul 30, 2025):
Thanks a lot anyway ... a pity ... weird that nobody else is looking into this ...
@pdc1 commented on GitHub (Jul 31, 2025):
I have been doing some testing (hacking, really) and have found a few things that have helped. Unfortunately this is more for @MohamedBassem to integrate and not something a non-developer user would likely be able to do.
The first part is to get a browser image that persists its profile state (e.g. mount a local volume for profile) and also allows VNC access to the container so you can log in to accounts, solve captchas and whatnot as needed. I built my own (using Brave browser) but others were mentioned above.
The other thing that I found helped was to reuse the browser context, which effectively opens a new tab in the browser so it has access to the cookies from the session where you did the login/captcha/etc. Basically in `crawlPage`, instead of having the browser object call `newContext`, do something like this:

Then at the end, instead of closing the context (since we want to keep reusing it), just close the page (this effectively closes a tab that was opened for the scraping):
I also found it helped to turn on some Playwright debugging messages:
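The code snippets from this comment didn't survive the mirror, so here is the reuse pattern sketched in Python with stand-in objects (the real change would be in Karakeep's TypeScript `crawlPage`; all names below are illustrative):

```python
class ContextCache:
    """Reuse one browser context across crawls so cookies and consent
    state persist between pages. `browser` is any object exposing
    new_context() (e.g. a Playwright Browser). Sketch only, not
    Karakeep's actual code."""

    def __init__(self, browser):
        self.browser = browser
        self.context = None

    def crawl(self, url):
        # Instead of creating a fresh context per crawl, lazily create
        # one context and keep it for all later crawls.
        if self.context is None:
            self.context = self.browser.new_context()
        page = self.context.new_page()
        try:
            return page.goto(url)
        finally:
            # Close only the tab; the context (and its cookies) stays alive.
            page.close()
```

The trade-off pdc1 and MohamedBassem discuss applies: a shared persistent context carries login state across crawls, which is the point, but also a security concern in multi-user setups.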
@pdc1 commented on GitHub (Aug 1, 2025):
@miraculix95 if you could send me some links you are having problems with, I will try them in my hacked setup.
@ballerbude commented on GitHub (Aug 1, 2025):
This is all too complicated. The most streamlined solution would probably be something like what miniflux has implemented for RSS reading: just let me insert my cookies for personally defined sites. That way even paywalls could be circumvented.
@ballerbude commented on GitHub (Aug 2, 2025):
@pdc1 could you try heise.de and golem.de
@pdc1 commented on GitHub (Aug 2, 2025):
It's looking good, I think Brave is doing a good job with the cookie blocking. Note that I also run a pi-hole ad blocker on my home network which may skew the results a bit.
Here's an example from heise.de: Titanic: VR-Erfahrung zeigt Untergang aus Passagiersicht
The thumbnail looks like this:

And the first paragraph in Karakeep is:
And here is the screenshot that Karakeep saves:

golem.de got similarly good results. I will try to package up a docker image you could try.
@pdc1 commented on GitHub (Aug 2, 2025):
Here's my setup, I hope I included everything people need to experiment.
- `docker-compose.yml`: replace the `chrome` section with this:
- `Dockerfile`: you'll need to replace the first line with your appropriate architecture (`arm64v8` is for my 64-bit Raspberry Pi 4). I'm a docker newbie (this was a ChatGPT assist 😉) so maybe there is a more generic Debian image.
- `start.sh` script:

All three files go in the same directory. Create a `chrome-config` directory for the persistent Brave user profile.
Run `docker compose up --build -d` and it should do its thing. The new container will be called `brave-chrome`. Now hopefully everything is running.
Now to configure Brave. Install `tigervnc-viewer` on the host (for me it was `sudo apt install tigervnc-viewer`), and connect using `xtigervncviewer -SecurityTypes None localhost:5901`.
If you leave the tigervnc session open you can watch the webpage load as Karakeep does its thing! This might help pinpoint problems with timing for example.
I hope this helps. I'm not sure how much of all this could be pre-configured into a tidy container image for production, but it's good proof of concept at this point.
If you find you need a longer delay to allow the page to finish loading, I can help you with a Karakeep hack to add a delay.
@pdc1 commented on GitHub (Aug 2, 2025):
Okay, I have some time so here is the delay hack if you need it 😄. Note that my docker image is installed in `karakeep_app`, so my Karakeep container is called `karakeep-app-web-1`. Your mileage may vary; `docker ps` lists the running images.

Get a copy of the Karakeep compiled JavaScript from the container: `docker cp karakeep-app-web-1:/app/apps/workers/dist/index.mjs index-orig.mjs`
Copy the `index-orig.mjs` to `index.mjs` and edit it in your favorite text editor.
Look for the phrase `Waiting for the page to load ...` (it's a large file so that might take a while).
Add the lines starting with `+` (do not include the `+`).
That is 5 seconds, but you can extend it as needed. You could even give enough time to interactively solve a captcha if there's a page you really want to capture.
Copy the `index.mjs` file back to the Karakeep container: `docker cp index.mjs karakeep-app-web-1:/app/apps/workers/dist/index.mjs`
Restart the Karakeep container: `docker restart karakeep-app-web-1`
And watch the logs: `docker logs -f -n 20 karakeep-app-web-1`. You can hit ^C to stop when you're done; it won't affect your Karakeep server, it's just stopping the log command.
I think that should be it, happy hacking!
@CrazyWolf13 commented on GitHub (Aug 2, 2025):
@MohamedBassem seeing that there is a great demand for this and already some good looking hacked together concepts, is there any way we can get something officially supported, or a preferred way from your side for this to be implemented?
@MohamedBassem commented on GitHub (Aug 2, 2025):
There are indeed a lot of good ideas here. I'll evaluate the different options and see what we can do. The Brave idea is smart and hadn't come to mind, for example. The waiting interval until uBlock kicks in is something I remember experimenting with before without success, but I will give it one more try now that we're on Playwright. Persistent contexts I was trying to avoid, as they can have security implications, but I might be able to consider a one-context-per-user approach or something. Will go through the comments and report back. Please keep the good ideas coming!
@ewanly commented on GitHub (Oct 18, 2025):
I hope this will be resolved soon. I just started using Karakeep, and I didn't like how most of my links are presented.
@kassyss commented on GitHub (Oct 18, 2025):
Hi @ewanly
In the meantime you can use the SingleFile extension (available for almost all browsers as well as iOS Safari).
It works pretty well, but I agree, I would prefer to only rely on the Karakeep app and extension.
Best regards
@maltokyo commented on GitHub (Nov 8, 2025):
How do I do that @kassyss ? Thanks!
@kassyss commented on GitHub (Nov 8, 2025):
Hi @maltokyo, you just need to install the SingleFile extension (Firefox, Chrome, iOS…) and read the documentation
@maltokyo commented on GitHub (Nov 8, 2025):
Done, thank you!
@alexbelgium commented on GitHub (Jan 16, 2026):
@pdc1 hi, how do you connect your brave browser to Karakeep? thanks
@kafmees commented on GitHub (Jan 16, 2026):
I'm not sure what you mean. You can use https://chromewebstore.google.com/detail/karakeep/kgcjekpmcjjogibpjebkhaanilehneje?pli=1 as an add-on. But I use the android app and share links with it.
@alexbelgium commented on GitHub (Jan 16, 2026):
Hi, thanks. I meant that pdc1 was using a Brave browser to circumvent cookie and GDPR banners. I also installed a Brave-based docker image but can't find how to connect it with the BROWSER_WEB_URL env variable.
Edit: great extension btw
@pdc1 commented on GitHub (Jan 17, 2026):
There's a section above in this thread (https://github.com/karakeep-app/karakeep/issues/414#issuecomment-3146537943) where I document the changes. Since brave is based on chromium it can be swapped with chromium, but requires some hacking of the docker file. It also required some changes to the core karakeep code, so it's more of a proof of concept than a configuration change.
@Sleywill commented on GitHub (Feb 23, 2026):
Adding another option for people who find cookie banners a recurring issue: if you're using an external screenshot API, most of the good ones handle this at the API level.
SnapAPI, for example, has a `blockCookieBanners: true` param that uses a CSS selector blocklist to hide consent dialogs before capture, so you don't have to build and maintain your own banner detection logic.
Not sure if karakeep is moving toward an API-based approach, but it might be useful context for the discussion.
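The selector-blocklist technique itself is easy to replicate locally: build one CSS rule that hides known consent containers and inject it before capturing the page. A tiny sketch (the selectors below are illustrative; real blocklists such as EasyList Cookie maintain thousands):

```python
# Illustrative selectors only; real lists (e.g. EasyList Cookie) are far larger.
CONSENT_SELECTORS = [
    "#onetrust-banner-sdk",
    ".cookie-banner",
    "[id*='cookie-consent']",
]

def consent_hiding_css(selectors=CONSENT_SELECTORS):
    """Build one CSS rule hiding every selector in the blocklist.
    It could be injected via e.g. Playwright's
    page.add_style_tag(content=...) before taking a screenshot."""
    return ", ".join(selectors) + " { display: none !important; visibility: hidden !important; }"
```

This only hides banners cosmetically; unlike a real extension it doesn't click "accept", so pages that block playback until consent is given would still need one of the other approaches in this thread.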