[GH-ISSUE #2487] Comprehensive support for Bilibili (Videos, Dynamics, and Articles) #1492

Open
opened 2026-03-02 11:57:39 +03:00 by kerem · 4 comments

Originally created by @CircleCrop on GitHub (Feb 15, 2026).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/2487

Describe the feature you'd like

I would like to add comprehensive native support for Bilibili.com, a major video-sharing and community platform. The integration should include:

  • Video Support: Extracting video metadata (title, uploader, description, cover) and potentially supporting specific content scraping.
  • Dynamic Posts (Dynamics): Full support for text-only and image-text dynamic posts.
  • Articles (Columns): Support for long-form articles, including rich text content and cover images.
  • Rendering: Ensuring the frontend correctly renders bilibili.com content structures.

Describe the benefits this would bring to existing Karakeep users

Bilibili is the primary platform for video and creative content in East Asia. Adding native support would allow a large user base to archive their favorite content with accurate metadata and high-fidelity rendering, which aligns with Karakeep's goal of being a "bookmark-everything" app.

Can the goal of this request already be achieved via other means?

Currently, bilibili.com URLs are treated as generic URLs. This often results in incomplete metadata or poor content extraction for dynamic posts and articles, where the main content is behind platform-specific structures.

Have you searched for an existing open/closed issue?

  • I have searched for existing issues and none cover my fundamental request

Additional context

I found #877, which mentions issues with "bad banner images" for Bilibili. This proposed feature will resolve it.
Also, no active proposal for Bilibili support was found.


I intend to implement this feature myself. I am familiar with the tech stack (TypeScript/Next.js) and plan to:

  1. Follow the AGENTS.md and CONTRIBUTING.md guidelines for development.
  2. Update the scraping logic in the worker module to handle bilibili.com domains.
  3. Add specific metadata extraction and frontend rendering components for Bilibili content types.
  4. Use AI-assisted coding to ensure alignment with the project's architecture.
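
As a rough illustration of step 2, the sketch below shows one way a worker could decide which Bilibili content type a URL points to before choosing an extractor. Everything in it is an assumption made for illustration: the `classifyBilibiliUrl` helper and the `BilibiliContentType` type are hypothetical names, and the URL patterns (`/video/BV…`, `t.bilibili.com/<id>`, `/opus/<id>`, `/read/cv<id>`) reflect commonly seen Bilibili link shapes rather than anything confirmed in this issue or in Karakeep's code.

```typescript
// Hypothetical helper, not part of Karakeep: maps a URL to a Bilibili content type.
// The URL patterns below are assumptions based on commonly seen Bilibili links.
type BilibiliContentType = "video" | "dynamic" | "article";

function classifyBilibiliUrl(rawUrl: string): BilibiliContentType | null {
  // Split the URL into host and path without relying on any runtime-specific API.
  const match = /^https?:\/\/([^/?#:]+)(?::\d+)?([^?#]*)/i.exec(rawUrl.trim());
  if (match === null) return null;

  const host = match[1].toLowerCase();
  const path = match[2];

  // Accept bilibili.com and its subdomains only (e.g. www. or t.).
  if (host !== "bilibili.com" && !host.endsWith(".bilibili.com")) return null;

  // Dynamic posts: t.bilibili.com/<numeric id> or /opus/<numeric id>.
  if (host === "t.bilibili.com" && /^\/\d+\/?$/.test(path)) return "dynamic";
  if (/^\/opus\/\d+\/?$/.test(path)) return "dynamic";

  // Videos: /video/BV<alphanumeric id> or the older /video/av<numeric id>.
  if (/^\/video\/(BV[0-9A-Za-z]+|av\d+)\/?$/.test(path)) return "video";

  // Articles (columns): /read/cv<numeric id>.
  if (/^\/read\/cv\d+\/?$/.test(path)) return "article";

  // Any other bilibili.com page falls back to generic handling.
  return null;
}
```

A caller could fall back to the existing generic URL handling whenever the helper returns `null`, so unrecognized Bilibili pages would behave as they do today.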

@MohamedBassem commented on GitHub (Feb 15, 2026):

@CircleCrop Thanks for opening the issue before going ahead with the implementation. Sounds good to me as a feature request.

This probably means a metascraper plugin similar to the Reddit one and, if needed, a custom renderer similar to the Amazon renderer. Please include some screenshots of the renderer and metadata extraction in the PR. Thank you!


@CircleCrop commented on GitHub (Feb 26, 2026):

Hi @MohamedBassem, thanks for the guidance.

I’m currently reviewing my implementation and want to confirm two best-practice details:

  1. For image persistence, should I download all images from the page, or only the cover image?
  2. Some websites use strict anti-bot protections. Is there any plan for Karakeep to support persisting intermediate fetch context (for example cookies, tokens, or request metadata) for more reliable long-term scraping?

I want to make sure this PR aligns with project conventions. Thanks!


@MohamedBassem commented on GitHub (Feb 26, 2026):

  1. We currently only download the cover image.
  2. Yeah, that's planned but not there yet :)

@CircleCrop commented on GitHub (Feb 26, 2026):

Thanks for confirming!
