[GH-ISSUE #1106] Bug: docker-compose is defaulting to youtube-dl and not yt-dlp #2206

Closed
opened 2026-03-01 17:57:16 +03:00 by kerem · 2 comments

Originally created by @kgsbowtie on GitHub (Feb 22, 2023).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1106

Describe the bug

When following the docker-compose installation instructions, ArchiveBox uses youtube-dl to download the media instead of yt-dlp. This causes the media to fail to download.

Steps to reproduce

  1. Install and run ArchiveBox following the docker-compose instructions.
  2. Open the Web UI and archive a YouTube video:
    1. Open the Web UI.
    2. Click "ADD ➕".
    3. Paste a YouTube video link into the "URLs (one per line)" field.
      • I used this video: https://www.youtube.com/watch?v=wejXvrQwX3w
    4. Either leave "Archive methods" blank or ensure "media" is selected.
  3. Wait on that page for the output.
  4. Observe the following text in the output: "media Extractor failed: Failed to save media Got youtube-dl response code: 1." Also observe that the archived snapshot contains no media.

Screenshots or log output

Generated by selecting only "media" in the "Archive methods" selection box. The output has been formatted slightly for readability (only line breaks were added).

[+] [2023-02-22 21:47:44] Adding 1 links to index (crawl depth=0)... > Saved verbatim input to sources/1677102464-import.txt > Parsed 1 URLs from input (Generic TXT) > Found 1 new URLs not already in index 
[*] [2023-02-22 21:47:44] Writing 1 links to main index... √ ./index.sqlite3 
[▶] [2023-02-22 21:47:44] Starting archiving of 1 snapshots in index... 
[+] [2023-02-22 21:47:44] "www.youtube.com/watch?v=wejXvrQwX3w" https://www.youtube.com/watch?v=wejXvrQwX3w > ./archive/1677102464.251811 > media Extractor failed: Failed to save media 
    Got youtube-dl response code: 1. 
    ERROR: Unable to extract uploader id; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output. 
    Run to see full output: cd /data/archive/1677102464.251811; youtube-dl --write-description --write-info-json --write-annotations --write-thumbnail --no-call-home --write-sub --all-subs --write-auto-sub --convert-subs=srt --yes-playlist --continue --ignore-errors --geo-bypass --add-metadata --max-filesize=750m https://www.youtube.com/watch?v=wejXvrQwX3w 2 files (238.5 KB) in 0:00:06s 
[√] [2023-02-22 21:47:50] Update of 1 pages complete (6.65 sec) - 0 links skipped - 1 links updated - 1 links had errors Hint: To manage your archive in a Web UI, run: archivebox server 0.0.0.0:8000 
[+] [2023-02-22 21:47:44] Adding 1 links to index (crawl depth=0)... > Saved verbatim input to sources/1677102464-import.txt > Parsed 1 URLs from input (Generic TXT) > Found 1 new URLs not already in index 
[*] [2023-02-22 21:47:44] Writing 1 links to main index... √ ./index.sqlite3 
[▶] [2023-02-22 21:47:44] Starting archiving of 1 snapshots in index... 
[+] [2023-02-22 21:47:44] "www.youtube.com/watch?v=wejXvrQwX3w" https://www.youtube.com/watch?v=wejXvrQwX3w > ./archive/1677102464.251811 > media Extractor failed: Failed to save media 
    Got youtube-dl response code: 1. 
    ERROR: Unable to extract uploader id; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output. 
    Run to see full output: cd /data/archive/1677102464.251811; youtube-dl --write-description --write-info-json --write-annotations --write-thumbnail --no-call-home --write-sub --all-subs --write-auto-sub --convert-subs=srt --yes-playlist --continue --ignore-errors --geo-bypass --add-metadata --max-filesize=750m https://www.youtube.com/watch?v=wejXvrQwX3w 2 files (238.5 KB) in 0:00:06s 
[√] [2023-02-22 21:47:50] Update of 1 pages complete (6.65 sec) - 0 links skipped - 1 links updated - 1 links had errors Hint: To manage your archive in a Web UI, run: archivebox server 0.0.0.0:8000 
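
The downloader that actually failed can be pulled out of a saved log with a small filter. This is a minimal sketch, assuming the log is piped in on stdin; the helper name `failed_downloader` is hypothetical and not part of ArchiveBox.

```shell
#!/bin/sh
# Hypothetical helper (not part of ArchiveBox): print the name and exit code
# of the media downloader that an ArchiveBox log reports as failed.
failed_downloader() {
    # Matches lines such as "Got youtube-dl response code: 1."
    sed -n 's/.*Got \([A-Za-z0-9_-]*\) response code: \([0-9]*\)\..*/\1 \2/p'
}

# Example using the failure line from the log above.
printf '    Got youtube-dl response code: 1. \n' | failed_downloader
```

For the log in this report the filter prints `youtube-dl 1`, which shows that youtube-dl, not yt-dlp, was invoked.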

ArchiveBox version

[i] [2023-02-22 21:40:12] ArchiveBox v0.6.2: archivebox shell
    > /data

[!] ArchiveBox should never be run as root!
    For more information, see the security overview documentation:
        https://github.com/ArchiveBox/ArchiveBox/wiki/Security-Overview#do-not-run-as-root
# archivebox version
ArchiveBox v0.6.2
Cpython Linux Linux-5.15.49-linuxkit-aarch64-with-glibc2.28 aarch64
IN_DOCKER=True DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND_ENGINE=ripgrep

[i] Dependency versions:
 √  ARCHIVEBOX_BINARY     v0.6.2          valid     /usr/local/bin/archivebox                                                   
 √  PYTHON_BINARY         v3.9.5          valid     /usr/local/bin/python3.9                                                    
 √  DJANGO_BINARY         v3.1.10         valid     /usr/local/lib/python3.9/site-packages/django/bin/django-admin.py           
 √  CURL_BINARY           v7.64.0         valid     /usr/bin/curl                                                               
 √  WGET_BINARY           v1.20.1         valid     /usr/bin/wget                                                               
 √  NODE_BINARY           v15.14.0        valid     /usr/bin/node                                                               
 √  SINGLEFILE_BINARY     v0.3.16         valid     /node/node_modules/single-file/cli/single-file                              
 √  READABILITY_BINARY    v0.0.2          valid     /node/node_modules/readability-extractor/readability-extractor              
 √  MERCURY_BINARY        v1.0.0          valid     /node/node_modules/@postlight/mercury-parser/cli.js                         
 √  GIT_BINARY            v2.20.1         valid     /usr/bin/git                                                                
 √  YOUTUBEDL_BINARY      v2021.04.26     valid     /usr/local/bin/youtube-dl                                                   
 √  CHROME_BINARY         v89.0.4389.114  valid     /usr/bin/chromium                                                           
 √  RIPGREP_BINARY        v0.10.0         valid     /usr/bin/rg                                                                 

[i] Source-code locations:
 √  PACKAGE_DIR           23 files        valid     /app/archivebox                                                             
 √  TEMPLATES_DIR         3 files         valid     /app/archivebox/templates                                                   
 -  CUSTOM_TEMPLATES_DIR  -               disabled                                                                              

[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled                                                                              
 -  COOKIES_FILE          -               disabled                                                                              

[i] Data locations:
 √  OUTPUT_DIR            5 files         valid     /data                                                                       
 √  SOURCES_DIR           1 files         valid     ./sources                                                                   
 √  LOGS_DIR              1 files         valid     ./logs                                                                      
 √  ARCHIVE_DIR           1 files         valid     ./archive                                                                   
 √  CONFIG_FILE           81.0 Bytes      valid     ./ArchiveBox.conf                                                           
 √  SQL_INDEX             204.0 KB        valid     ./index.sqlite3                                                             
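
The relevant row of the version output can be picked out mechanically. This is a minimal sketch, assuming the column layout shown above (status mark, name, version, state, path); the helper name `media_downloader` is hypothetical, and the compose service name `archivebox` in the trailing comment is an assumption about the local setup.

```shell
#!/bin/sh
# Hypothetical helper (not part of ArchiveBox): print the version and path of
# the media downloader listed in `archivebox version` output read from stdin.
media_downloader() {
    # Columns: status mark, name, version, state, path.
    awk '$2 == "YOUTUBEDL_BINARY" { print $3, $5 }'
}

# Example using the row from the report above.
printf ' √  YOUTUBEDL_BINARY      v2021.04.26     valid     /usr/local/bin/youtube-dl\n' | media_downloader

# Against a running setup (assumes a compose service named "archivebox"):
#   docker-compose run archivebox version | media_downloader
```

A path ending in `youtube-dl` rather than `yt-dlp`, as in this report, matches the failure described above.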
kerem closed this issue 2026-03-01 17:57:16 +03:00

@pirate commented on GitHub (Feb 24, 2023):

Please try running the latest dev branch, archivebox:dev (instead of archivebox:latest or archivebox:master). I believe this is already fixed in the upcoming release.

https://github.com/ArchiveBox/ArchiveBox#install-and-run-a-specific-github-branch


@kgsbowtie commented on GitHub (Feb 27, 2023):

This worked! Thank you!

For anyone who finds this issue and is not familiar with Docker: I followed the linked commands and then also had to run the following command to get the web server up and running.

docker run -v $PWD:/data -it -p 8000:8000 archivebox:dev server 0.0.0.0:8000