starred/karakeep


mirror of https://github.com/karakeep-app/karakeep.git synced 2026-04-25 16:06:04 +03:00

[GH-ISSUE #1367] YouTube video not downloading despite .env variables being set #876


Closed

opened 2026-03-02 11:53:25 +03:00 by kerem · 3 comments

kerem commented

2026-03-02 11:53:25 +03:00

Owner

Originally created by @ballerbude on GitHub (May 7, 2025).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/1367

The following three environment variables are set in the .env file:

CRAWLER_VIDEO_DOWNLOAD=true
CRAWLER_VIDEO_DOWNLOAD_MAX_SIZE=-1
CRAWLER_VIDEO_DOWNLOAD_TIMEOUT_SEC=7200
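A quick way to confirm how these three values are read is to parse and interpret them outside Karakeep. The sketch below is a hypothetical helper and is not part of Karakeep. It assumes simple `KEY=value` lines. It also assumes, based on the Karakeep configuration docs as I understand them, that a max size of `-1` disables the size limit. A timeout of 7200 seconds is two hours.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class VideoDownloadConfig:
    enabled: bool
    max_size: Optional[int]  # None = no limit (assumed meaning of -1)
    timeout_sec: int


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop optional surrounding quotes, e.g. KEY="value".
        env[key.strip()] = value.strip().strip("\"'")
    return env


def video_config(env: Dict[str, str]) -> VideoDownloadConfig:
    """Interpret the three video settings; raises KeyError if one is missing."""
    enabled = env["CRAWLER_VIDEO_DOWNLOAD"].strip().lower() == "true"
    max_size = int(env["CRAWLER_VIDEO_DOWNLOAD_MAX_SIZE"])
    timeout_sec = int(env["CRAWLER_VIDEO_DOWNLOAD_TIMEOUT_SEC"])
    return VideoDownloadConfig(
        enabled=enabled,
        max_size=None if max_size == -1 else max_size,
        timeout_sec=timeout_sec,
    )
```

For the three lines above, `video_config(parse_env(...))` yields `VideoDownloadConfig(enabled=True, max_size=None, timeout_sec=7200)`. This check only covers the file itself. Running `docker compose exec web env` shows what the container actually received (the service name `web` is assumed from the `web-1` log prefix).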

This is the full container log showing what happens when the URL is pasted into Karakeep and the Save button is pressed:

web-1          | 2025-05-07T11:50:29.805Z info: [search][29] Attempting to index bookmark with id xg1xac3cliwbxjr32v348spi ...
web-1          | 2025-05-07T11:50:29.845Z info: [Crawler][27] Will crawl "https://www.youtube.com/watch?v=i1HgN7u7w_w" for link with id "xg1xac3cliwbxjr32v348spi"
web-1          | 2025-05-07T11:50:29.845Z info: [Crawler][27] Attempting to determine the content-type for the url https://www.youtube.com/watch?v=i1HgN7u7w_w
meilisearch-1  | 2025-05-07T11:50:29.869689Z  INFO HTTP request{method=POST host="meilisearch:7700" route=/indexes/bookmarks/documents query_parameters=primaryKey=id user_agent=node status_code=202}: meilisearch: close time.busy=1.97ms time.idle=12.4ms
meilisearch-1  | 2025-05-07T11:50:29.875890Z  INFO HTTP request{method=GET host="meilisearch:7700" route=/tasks/11 query_parameters= user_agent=node status_code=200}: meilisearch: close time.busy=166µs time.idle=1.80ms
meilisearch-1  | 2025-05-07T11:50:29.933339Z  INFO HTTP request{method=GET host="meilisearch:7700" route=/tasks/11 query_parameters= user_agent=node status_code=200}: meilisearch: close time.busy=148µs time.idle=414µs
meilisearch-1  | 2025-05-07T11:50:30.013462Z  INFO HTTP request{method=GET host="meilisearch:7700" route=/tasks/11 query_parameters= user_agent=node status_code=200}: meilisearch: close time.busy=172µs time.idle=829µs
meilisearch-1  | 2025-05-07T11:50:30.071702Z  INFO HTTP request{method=GET host="meilisearch:7700" route=/tasks/11 query_parameters= user_agent=node status_code=200}: meilisearch: close time.busy=154µs time.idle=285µs
meilisearch-1  | 2025-05-07T11:50:30.132808Z  INFO HTTP request{method=GET host="meilisearch:7700" route=/tasks/11 query_parameters= user_agent=node status_code=200}: meilisearch: close time.busy=185µs time.idle=715µs
meilisearch-1  | 2025-05-07T11:50:30.194398Z  INFO HTTP request{method=GET host="meilisearch:7700" route=/tasks/11 query_parameters= user_agent=node status_code=200}: meilisearch: close time.busy=208µs time.idle=1.27ms
web-1          | 2025-05-07T11:50:30.278Z info: [webhook][30] Starting a webhook job for bookmark with id "xg1xac3cliwbxjr32v348spi for operation "created"
web-1          | 2025-05-07T11:50:30.280Z info: [webhook][30] Completed successfully
web-1          | 2025-05-07T11:50:30.345Z info: [ruleEngine][28] Completed successfully
meilisearch-1  | 2025-05-07T11:50:30.368682Z  INFO HTTP request{method=GET host="meilisearch:7700" route=/tasks/11 query_parameters= user_agent=node status_code=200}: meilisearch: close time.busy=500µs time.idle=518µs
meilisearch-1  | 2025-05-07T11:50:30.435675Z  INFO HTTP request{method=GET host="meilisearch:7700" route=/tasks/11 query_parameters= user_agent=node status_code=200}: meilisearch: close time.busy=909µs time.idle=239µs
meilisearch-1  | 2025-05-07T11:50:30.462496Z  INFO index_scheduler::scheduler::process_index_operation: document indexing done indexing_result=DocumentAdditionResult { indexed_documents: 1, number_of_documents: 2 } processed_in=600.436154ms
meilisearch-1  | 2025-05-07T11:50:30.472425Z  INFO index_scheduler::scheduler: A batch of tasks was successfully completed with 1 successful tasks and 0 failed tasks.
meilisearch-1  | 2025-05-07T11:50:30.499710Z  INFO HTTP request{method=GET host="meilisearch:7700" route=/tasks/11 query_parameters= user_agent=node status_code=200}: meilisearch: close time.busy=2.74ms time.idle=84.2µs
web-1          | 2025-05-07T11:50:30.552Z info: [search][29] Completed successfully
web-1          | 2025-05-07T11:50:30.693Z info: [Crawler][27] Content-type for the url https://www.youtube.com/watch?v=i1HgN7u7w_w is "text/html; charset=utf-8"
chrome-1       | [0507/115030.888259:WARNING:runtime_features.cc(728)] AttributionReportingCrossAppWeb cannot be enabled in this configuration. Use --enable-features=ConversionMeasurement,AttributionReportingCrossAppWeb in addition.
chrome-1       | [0507/115032.161403:WARNING:runtime_features.cc(728)] AttributionReportingCrossAppWeb cannot be enabled in this configuration. Use --enable-features=ConversionMeasurement,AttributionReportingCrossAppWeb in addition.
chrome-1       | [0507/115037.142370:WARNING:audio_manager_linux.cc(53)] Falling back to ALSA for audio output. PulseAudio is not available or could not be initialized.
chrome-1       | ALSA lib confmisc.c:855:(parse_card) cannot find card '0'
chrome-1       | ALSA lib conf.c:5204:(_snd_config_evaluate) function snd_func_card_inum returned error: No such file or directory
chrome-1       | ALSA lib confmisc.c:422:(snd_func_concat) error evaluating strings
chrome-1       | ALSA lib conf.c:5204:(_snd_config_evaluate) function snd_func_concat returned error: No such file or directory
chrome-1       | ALSA lib confmisc.c:1342:(snd_func_refer) error evaluating name
chrome-1       | ALSA lib conf.c:5204:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
chrome-1       | ALSA lib conf.c:5727:(snd_config_expand) Evaluate error: No such file or directory
chrome-1       | ALSA lib pcm.c:2675:(snd_pcm_open_noupdate) Unknown PCM default
chrome-1       | [0507/115037.232443:ERROR:alsa_util.cc(204)] PcmOpen: default,No such file or directory
chrome-1       | ALSA lib confmisc.c:855:(parse_card) cannot find card '0'
chrome-1       | ALSA lib conf.c:5204:(_snd_config_evaluate) function snd_func_card_inum returned error: No such file or directory
chrome-1       | ALSA lib confmisc.c:422:(snd_func_concat) error evaluating strings
chrome-1       | ALSA lib conf.c:5204:(_snd_config_evaluate) function snd_func_concat returned error: No such file or directory
chrome-1       | ALSA lib confmisc.c:1342:(snd_func_refer) error evaluating name
chrome-1       | ALSA lib conf.c:5204:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
chrome-1       | ALSA lib conf.c:5727:(snd_config_expand) Evaluate error: No such file or directory
chrome-1       | ALSA lib pcm.c:2675:(snd_pcm_open_noupdate) Unknown PCM default
chrome-1       | [0507/115037.233018:ERROR:alsa_util.cc(204)] PcmOpen: plug:default,No such file or directory
chrome-1       | [0507/115037.279803:ERROR:web_contents_delegate.cc(260)] WebContentsDelegate::CheckMediaAccessPermission: Not supported.
chrome-1       | [0507/115037.279832:ERROR:web_contents_delegate.cc(260)] WebContentsDelegate::CheckMediaAccessPermission: Not supported.
web-1          | 2025-05-07T11:50:37.485Z info: [Crawler][27] Successfully navigated to "https://www.youtube.com/watch?v=i1HgN7u7w_w". Waiting for the page to load ...
web-1          | 2025-05-07T11:50:39.044Z info: [Crawler][27] Finished waiting for the page to load.
web-1          | 2025-05-07T11:50:39.823Z info: [Crawler][27] Successfully fetched the page content.
web-1          | 2025-05-07T11:50:42.076Z info: [Crawler][27] Finished capturing page content and a screenshot. FullPageScreenshot: true
web-1          | 2025-05-07T11:50:42.091Z info: [Crawler][27] Will attempt to extract metadata from page ...
web-1          | 2025-05-07T11:50:48.922Z info: [Crawler][27] Will attempt to extract readable content ...
web-1          | 2025-05-07T11:50:54.135Z info: [Crawler][27] Done extracting readable content.
web-1          | 2025-05-07T11:50:56.236Z info: [Crawler][27] Stored the screenshot as assetId: d08e1304-e2de-4c3d-92e9-2695ad8a1fdc
web-1          | 2025-05-07T11:50:56.264Z info: [Crawler][27] Done extracting metadata from the page.
web-1          | 2025-05-07T11:50:56.264Z info: [Crawler][27] Downloading image from "https://yt3.ggpht.com/OdKb6lPyN0jP_NAqbSy2CsSOoRk9obh4EleQQEi4PwwmmrP3FGznZuTJ8Qle3On2s-r_9cMe=s48-c-k-c0x00ffffff-no-rj"
web-1          | 2025-05-07T11:50:56.330Z info: [Crawler][27] Downloaded image as assetId: 9cc7f37f-091c-435f-9004-4c75388b8890
web-1          | 2025-05-07T11:50:56.572Z info: [Crawler][27] Will attempt to archive page ...
web-1          | 2025-05-07T11:50:57.524Z info: [webhook][34] Starting a webhook job for bookmark with id "xg1xac3cliwbxjr32v348spi for operation "crawled"
web-1          | 2025-05-07T11:50:57.527Z info: [webhook][34] Completed successfully
web-1          | 2025-05-07T11:50:57.764Z info: [search][32] Attempting to index bookmark with id xg1xac3cliwbxjr32v348spi ...
web-1          | 2025-05-07T11:50:57.807Z debug: [inference][31] No inference client configured, nothing to do now
web-1          | 2025-05-07T11:50:57.807Z info: [inference][31] Completed successfully
web-1          | 2025-05-07T11:50:57.876Z info: [VideoCrawler][33] Attempting to download a file from "https://www.youtube.com/watch?v=i1HgN7u7w_w" to "/tmp/video_downloads/c9e11679-a038-43ec-b873-472168e612a0" using the following arguments: "https://www.youtube.com/watch?v=i1HgN7u7w_w,-o,/tmp/video_downloads/c9e11679-a038-43ec-b873-472168e612a0,--no-playlist"
meilisearch-1  | 2025-05-07T11:50:57.908547Z  INFO HTTP request{method=POST host="meilisearch:7700" route=/indexes/bookmarks/documents query_parameters=primaryKey=id user_agent=node status_code=202}: meilisearch: close time.busy=700µs time.idle=11.7ms
meilisearch-1  | 2025-05-07T11:50:57.933682Z  INFO HTTP request{method=GET host="meilisearch:7700" route=/tasks/12 query_parameters= user_agent=node status_code=200}: meilisearch: close time.busy=1.09ms time.idle=2.78ms
meilisearch-1  | 2025-05-07T11:50:58.012629Z  INFO HTTP request{method=GET host="meilisearch:7700" route=/tasks/12 query_parameters= user_agent=node status_code=200}: meilisearch: close time.busy=226µs time.idle=4.81ms
meilisearch-1  | 2025-05-07T11:50:58.126315Z  INFO HTTP request{method=GET host="meilisearch:7700" route=/tasks/12 query_parameters= user_agent=node status_code=200}: meilisearch: close time.busy=159µs time.idle=385µs
meilisearch-1  | 2025-05-07T11:50:58.185126Z  INFO HTTP request{method=GET host="meilisearch:7700" route=/tasks/12 query_parameters= user_agent=node status_code=200}: meilisearch: close time.busy=348µs time.idle=74.8µs
meilisearch-1  | 2025-05-07T11:50:58.219836Z  INFO index_scheduler::scheduler::process_index_operation: document indexing done indexing_result=DocumentAdditionResult { indexed_documents: 1, number_of_documents: 2 } processed_in=310.649311ms
meilisearch-1  | 2025-05-07T11:50:58.250859Z  INFO HTTP request{method=GET host="meilisearch:7700" route=/tasks/12 query_parameters= user_agent=node status_code=200}: meilisearch: close time.busy=172µs time.idle=2.58ms
meilisearch-1  | 2025-05-07T11:50:58.261674Z  INFO index_scheduler::scheduler: A batch of tasks was successfully completed with 1 successful tasks and 0 failed tasks.
meilisearch-1  | 2025-05-07T11:50:58.322197Z  INFO HTTP request{method=GET host="meilisearch:7700" route=/tasks/12 query_parameters= user_agent=node status_code=200}: meilisearch: close time.busy=2.68ms time.idle=82.0µs
web-1          | 2025-05-07T11:50:58.360Z info: [search][32] Completed successfully
web-1          | 2025-05-07T11:51:06.378Z info: [Crawler][27] Done archiving the page as assetId: 12c3848d-d89a-43d3-9d15-75eac7ec5909
web-1          | 2025-05-07T11:51:06.503Z info: [Crawler][27] Completed successfully
web-1          | 2025-05-07T11:51:08.321Z error: [VideoCrawler][33] Failed to download a file from "https://www.youtube.com/watch?v=i1HgN7u7w_w" to "/tmp/video_downloads/c9e11679-a038-43ec-b873-472168e612a0"
web-1          | 2025-05-07T11:51:08.361Z info: [VideoCrawler][33] Video Download Completed successfully

It seems to affect all YouTube videos. Am I missing something, or is additional configuration required?

![Image](https://github.com/user-attachments/assets/36cbfe3e-c963-448a-b288-82e78999374a)
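The Karakeep log only says "Failed to download" and does not include yt-dlp's own error message. One way to see the underlying error is to rerun the same yt-dlp call by hand with yt-dlp's `--verbose` flag. The "using the following arguments" line appears to print the argument list joined with commas, so it can be split back into an argv list.

This is a hypothetical diagnostic sketch. It assumes that no individual argument contains a comma. It also assumes yt-dlp is on PATH wherever you run it, ideally inside the same container Karakeep uses.

```python
import re
import shlex
import shutil
import subprocess
import sys
from typing import List

# Matches the tail of a VideoCrawler line, e.g.
# ... using the following arguments: "url,-o,/tmp/...,--no-playlist"
ARGS_RE = re.compile(r'using the following arguments: "([^"]*)"')


def argv_from_log(line: str) -> List[str]:
    """Recover the yt-dlp argument list from a logged line.

    Assumes the log joins arguments with commas and no argument contains one.
    """
    match = ARGS_RE.search(line)
    if match is None:
        raise ValueError("no argument list found in log line")
    return match.group(1).split(",")


def ytdlp_command(args: List[str]) -> str:
    """Shell-quoted yt-dlp command line with --verbose added, for copy/paste."""
    return shlex.join(["yt-dlp", "--verbose", *args])


def rerun(args: List[str]) -> int:
    """Run yt-dlp with --verbose so its own error message is printed."""
    exe = shutil.which("yt-dlp")
    if exe is None:
        print("yt-dlp not found on PATH; run manually:", ytdlp_command(args), file=sys.stderr)
        return 127
    return subprocess.run([exe, "--verbose", *args]).returncode
```

For example, `print(ytdlp_command(argv_from_log(line)))` prints a command you can paste into a shell in the container. Calling `rerun(...)` instead executes it directly. The `-o` path from the log is reused, so the output lands in the same temporary location.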

kerem closed this issue

2026-03-02 11:53:26 +03:00

kerem commented

2026-03-02 11:53:26 +03:00

Author

Owner

@vhsdream commented on GitHub (May 7, 2025):

You might also need to add some yt-dlp arguments to your config. I was having the same issue, and the following helped with some (but not all) of the YouTube videos:

CRAWLER_YTDLP_ARGS="--max-filesize=500M"

Adjust the max filesize to suit your needs of course.
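`--max-filesize` is a standard yt-dlp option: it aborts a download if the file is larger than the given size. In the follow-up log further down this thread, the extra argument appears right after the URL and before Karakeep's own `-o <path> --no-playlist`.

The sketch below reproduces that order so you can predict the logged argument string. It is a hypothetical reconstruction inferred from the log, not Karakeep's source. It takes an already-split list of extra arguments, because this thread does not show how Karakeep splits a `CRAWLER_YTDLP_ARGS` value containing several arguments.

```python
from typing import List


def build_ytdlp_args(url: str, extra_args: List[str], output_path: str) -> List[str]:
    """Hypothetical reconstruction of the yt-dlp argument order.

    Order inferred from the logged list: URL, extra args from
    CRAWLER_YTDLP_ARGS, -o <path>, --no-playlist.
    """
    return [url, *extra_args, "-o", output_path, "--no-playlist"]


def as_logged(args: List[str]) -> str:
    """Join the list the way the VideoCrawler log line appears to (commas)."""
    return ",".join(args)
```

Compare `as_logged(build_ytdlp_args(url, ["--max-filesize=500M"], path))` with the "using the following arguments" line in your own log. If they match, the extra argument did reach yt-dlp, even if the download still fails for another reason.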


kerem commented

2026-03-02 11:53:26 +03:00

Author

Owner

@ballerbude commented on GitHub (May 7, 2025):

Thank you for the quick response. Unfortunately, that doesn't help. I see the argument being passed, but the result is still the same:

web-1          | 2025-05-07T12:54:58.185Z info: [VideoCrawler][25] Attempting to download a file from "https://www.youtube.com/watch?v=3G6I46efADo" to "/tmp/video_downloads/d0be9a31-d0b1-4ec6-b90c-f52db8f9bdcc" using the following arguments: "https://www.youtube.com/watch?v=3G6I46efADo,--max-filesize=999M,-o,/tmp/video_downloads/d0be9a31-d0b1-4ec6-b90c-f52db8f9bdcc,--no-playlist"
meilisearch-1  | 2025-05-07T12:54:58.489222Z  INFO HTTP request{method=POST host="meilisearch:7700" route=/indexes/bookmarks/documents query_parameters=primaryKey=id user_agent=node status_code=202}: meilisearch: close time.busy=5.28ms time.idle=13.9ms
meilisearch-1  | 2025-05-07T12:54:58.524947Z  INFO HTTP request{method=GET host="meilisearch:7700" route=/tasks/10 query_parameters= user_agent=node status_code=200}: meilisearch: close time.busy=4.48ms time.idle=296µs
meilisearch-1  | 2025-05-07T12:54:58.647487Z  INFO HTTP request{method=GET host="meilisearch:7700" route=/tasks/10 query_parameters= user_agent=node status_code=200}: meilisearch: close time.busy=181µs time.idle=786µs
meilisearch-1  | 2025-05-07T12:54:58.722324Z  INFO HTTP request{method=GET host="meilisearch:7700" route=/tasks/10 query_parameters= user_agent=node status_code=200}: meilisearch: close time.busy=180µs time.idle=2.41ms
meilisearch-1  | 2025-05-07T12:54:58.811855Z  INFO index_scheduler::scheduler::process_index_operation: document indexing done indexing_result=DocumentAdditionResult { indexed_documents: 1, number_of_documents: 1 } processed_in=321.004113ms
meilisearch-1  | 2025-05-07T12:54:58.849277Z  INFO index_scheduler::scheduler: A batch of tasks was successfully completed with 1 successful tasks and 0 failed tasks.
meilisearch-1  | 2025-05-07T12:54:58.899883Z  INFO HTTP request{method=GET host="meilisearch:7700" route=/tasks/10 query_parameters= user_agent=node status_code=200}: meilisearch: close time.busy=1.94ms time.idle=105µs
meilisearch-1  | 2025-05-07T12:54:59.213341Z  INFO HTTP request{method=GET host="meilisearch:7700" route=/tasks/10 query_parameters= user_agent=node status_code=200}: meilisearch: close time.busy=197µs time.idle=9.46ms
web-1          | 2025-05-07T12:54:59.272Z info: [search][24] Completed successfully
web-1          | 2025-05-07T12:55:12.347Z error: [VideoCrawler][25] Failed to download a file from "https://www.youtube.com/watch?v=3G6I46efADo" to "/tmp/video_downloads/d0be9a31-d0b1-4ec6-b90c-f52db8f9bdcc"
web-1          | 2025-05-07T12:55:12.358Z info: [VideoCrawler][25] Video Download Completed successfully
web-1          | 2025-05-07T12:55:14.575Z info: [Crawler][19] Done archiving the page as assetId: 3ce2818f-c978-42f9-b433-77755b8eb736
web-1          | 2025-05-07T12:55:14.711Z info: [Crawler][19] Completed successfully
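In both logs, the VideoCrawler job prints "Failed to download" and then, milliseconds later, "Video Download Completed successfully". The second message appears to describe the job finishing rather than the download succeeding. Scanning for the failure line is therefore a more reliable signal.

This is a hypothetical helper. It assumes the `[VideoCrawler][<job id>]` prefix seen in these logs stays stable.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches e.g. "error: [VideoCrawler][25] Failed to download a file ..."
LINE_RE = re.compile(r"\[VideoCrawler\]\[(\d+)\] (.*)")


def failed_video_jobs(log_lines: List[str]) -> List[str]:
    """Return VideoCrawler job ids that logged a download failure.

    Such jobs may still end with "Video Download Completed successfully",
    so the final status line alone does not show whether a video was saved.
    """
    messages: Dict[str, List[str]] = defaultdict(list)
    for line in log_lines:
        m = LINE_RE.search(line)
        if m:
            job_id, message = m.groups()
            messages[job_id].append(message)
    return [
        job_id
        for job_id, msgs in messages.items()
        if any(msg.startswith("Failed to download") for msg in msgs)
    ]
```

For example, feed it the lines from `docker compose logs web` (service name assumed from the `web-1` prefix) to list every bookmark whose video was not actually stored.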


kerem commented

2026-03-02 11:53:26 +03:00

Author

Owner

@ballerbude commented on GitHub (May 8, 2025):

Okay, it was previously installed on an Ubuntu 22.04 Server VM. I have now used the same docker-compose.yml and the same .env on another VM running Ubuntu 24.04, and YouTube downloads are working. I have no clue why, but I guess it's fixed.

