[GH-ISSUE #1442] FR: use video subtitles as article content #913

Open
opened 2026-03-02 11:53:42 +03:00 by kerem · 7 comments

Originally created by @thiswillbeyourgithub on GitHub (May 19, 2025).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/1442

Describe the feature you'd like

Currently, when bookmarking a YouTube video, the subtitles are not downloaded. It would be nice if they were downloaded and included in the "article content", so that we could highlight them.

Describe the benefits this would bring to existing Karakeep users

Enhanced bookmark

Can the goal of this request already be achieved via other means?

No

Have you searched for an existing open/closed issue?

  • I have searched for existing issues and none cover my fundamental request

Additional context

No response


@Eragos commented on GitHub (May 19, 2025):

Hey!

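You can use `CRAWLER_YTDLP_ARGS`; see the [Documentation](https://docs.karakeep.app/configuration#crawler-configs).
In the [yt-dlp Documentation](https://github.com/yt-dlp/yt-dlp?tab=readme-ov-file#general-options) you can find the available subtitle options.

Best,
Michael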
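A minimal sketch of how such a value fits together, assuming Karakeep splits `CRAWLER_YTDLP_ARGS` on `%%` into separate yt-dlp arguments, which is how the configuration quoted later in this thread is written. `join_ytdlp_args` and `split_ytdlp_args` are hypothetical helpers, not Karakeep code; the flags themselves are standard yt-dlp subtitle options.

```python
# Sketch: build and split a CRAWLER_YTDLP_ARGS value.
# Assumption: Karakeep splits this variable on "%%" into individual
# yt-dlp arguments (as in the config quoted in this thread).

SEPARATOR = "%%"


def join_ytdlp_args(args: list[str]) -> str:
    """Join individual yt-dlp arguments into one env-var value."""
    for arg in args:
        if SEPARATOR in arg:
            raise ValueError(f"argument contains separator: {arg!r}")
    return SEPARATOR.join(args)


def split_ytdlp_args(value: str) -> list[str]:
    """Recover the argument list, ignoring empty segments."""
    return [part for part in value.split(SEPARATOR) if part]


# Subtitle flags only; --skip-download is deliberately left out.
subtitle_args = [
    "--write-subs",
    "--write-auto-subs",
    "--sub-langs", "en",
    "--sub-format", "srt",
]

if __name__ == "__main__":
    print(f"CRAWLER_YTDLP_ARGS={join_ytdlp_args(subtitle_args)}")
```

The example leaves out `--skip-download`, since that flag stops yt-dlp from fetching the video file itself.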


@thiswillbeyourgithub commented on GitHub (May 19, 2025):

Thanks, but that's still unclear to me: when Karakeep receives a YouTube link, it calls yt-dlp and expects a video. Can Karakeep also fetch the subtitles, given that they are not a video? If so, where would they appear in the UI once downloaded?
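yt-dlp itself can fetch subtitles without the video. A minimal sketch using yt-dlp's Python API, assuming yt-dlp is installed; the option keys correspond to the CLI flags noted in the comments, while `subtitle_only_options` and `fetch_subtitles` are hypothetical names. Running a subtitles-only pass separately from the video download is one possible design, not how Karakeep currently works.

```python
# Sketch: fetch only subtitles for a video with yt-dlp's Python API.
# Assumption: yt-dlp is installed (pip install yt-dlp). Helper names
# are hypothetical and not part of Karakeep.


def subtitle_only_options(langs: list[str], out_dir: str) -> dict:
    """Build YoutubeDL options that write subtitles but skip the video."""
    return {
        "writesubtitles": True,         # --write-subs
        "writeautomaticsub": True,      # --write-auto-subs
        "subtitleslangs": langs,        # --sub-langs
        "subtitlesformat": "srt/best",  # --sub-format (preference list)
        "skip_download": True,          # --skip-download
        "outtmpl": f"{out_dir}/%(id)s.%(ext)s",
    }


def fetch_subtitles(url: str, out_dir: str = ".") -> None:
    """Run a subtitles-only pass, separate from any video download."""
    import yt_dlp  # imported lazily so the options can be built without it

    with yt_dlp.YoutubeDL(subtitle_only_options(["en"], out_dir)) as ydl:
        ydl.download([url])
```

Keeping this pass separate leaves the normal video download untouched.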


@beergeekdotcom commented on GitHub (Jun 3, 2025):

It would fit well in the "Reader View" or "Archive" section of the item UI.

Reader View currently shows the text scraped from the YT page, which includes a partial description (cut off at "view more" by default on YT) and a jumble of the comments (no usernames, dates, etc.).

Ideally, it would look like this:

Video Description:

... full video description ...

Video Transcript:

... full video transcript ...

Top Comments:

Username - Date - Comment
Username - Date - Comment
Username - Date - Comment
...
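The proposed layout could be assembled from yt-dlp's metadata JSON (the file written by `--write-info-json`). A minimal sketch: `build_reader_text` and `format_comment` are hypothetical, and it assumes the info dict's `description` field plus, when comments were requested with `--write-comments`, a `comments` list whose entries carry `author`, `timestamp` and `text`.

```python
# Sketch of the Reader View layout proposed above (hypothetical helpers).
# Assumes a yt-dlp info dict; "comments" is only present when comments
# were requested (e.g. with --write-comments).
from datetime import datetime, timezone


def format_comment(comment: dict) -> str:
    """Render one comment as 'Username - Date - Comment'."""
    ts = comment.get("timestamp")
    if ts is None:
        date = "unknown date"
    else:
        date = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
    author = comment.get("author") or "unknown"
    return f"{author} - {date} - {comment.get('text', '')}"


def build_reader_text(info: dict, transcript: str, max_comments: int = 10) -> str:
    """Assemble description, transcript and comments into one text body."""
    parts = [
        "Video Description:\n\n" + (info.get("description") or "(no description)"),
        "Video Transcript:\n\n" + (transcript or "(no transcript)"),
    ]
    # Comments are kept in the order yt-dlp returned them.
    comments = (info.get("comments") or [])[:max_comments]
    if comments:
        lines = "\n".join(format_comment(c) for c in comments)
        parts.append("Top Comments:\n\n" + lines)
    return "\n\n".join(parts)
```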


@thiswillbeyourgithub commented on GitHub (Jun 3, 2025):

That's pretty much what I had in mind, yeah! I think "Reader View" would be the only appropriate choice, as that's the only place where we can highlight text.


@Sacmanxman2 commented on GitHub (Jun 13, 2025):

I tried downloading subs using the built-in arguments; the download worked, but the subtitles don't show up in the UI at all. Using `--embed-subs` broke it, so no luck there.

I personally would LOVE this feature. It'd make all the difference for me in downloading videos.


@dimitrieh commented on GitHub (Sep 24, 2025):

I would say this is required behaviour: right now, summarisation and tagging aren't based on much data.

Current config:

```
INFERENCE_CONTEXT_LENGTH=16384
CRAWLER_VIDEO_DOWNLOAD=true
CRAWLER_VIDEO_DOWNLOAD_MAX_SIZE=-1
CRAWLER_YTDLP_ARGS=--write-subs%%--write-auto-subs%%--sub-langs%%en%%--sub-format%%srt%%--skip-download%%--write-info-json
```

This results in:

  • the video download is skipped;
  • the subtitles/transcript and the metadata JSON are downloaded (for the transcript, the auto-generated one, the manual one, or both if both are available);
  • the summary, tagging, and bookmark body do not include this information;
  • the YouTube video download feature malfunctions, since no video is actually downloaded.

Transcript inclusion needs to be implemented so that this information is brought into the DB instead of being left behind.
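Transcript inclusion would mean turning the downloaded subtitle file into plain text before storing it. A minimal sketch for SRT input; `srt_to_text` is a hypothetical helper, not existing Karakeep code.

```python
# Sketch: reduce an SRT file to plain text (hypothetical helper).
import re

# A cue timing line, e.g. "00:00:01,000 --> 00:00:04,000" (optionally followed by settings).
TIMING = re.compile(r"^\d+:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d+:\d{2}:\d{2}[,.]\d{3}")
# Inline markup such as <i>...</i> or <font color="...">.
TAG = re.compile(r"<[^>]+>")


def srt_to_text(srt: str) -> str:
    """Reduce an SRT subtitle file to a single plain-text transcript."""
    kept: list[str] = []
    for raw in srt.replace("\ufeff", "").splitlines():
        line = TAG.sub("", raw).strip()
        # Skip blank separators, numeric cue indices and timing lines.
        if not line or line.isdigit() or TIMING.match(line):
            continue
        # Auto-generated captions often repeat the previous line; drop exact repeats.
        if kept and kept[-1] == line:
            continue
        kept.append(line)
    return " ".join(kept)
```

Lines consisting only of digits are treated as cue numbers, so a caption that is just a number would be dropped; a stricter parser would track cue boundaries instead.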


@dimitrieh commented on GitHub (Sep 24, 2025):

We should close https://github.com/karakeep-app/karakeep/issues/1629 in favor of this one.
