[GH-ISSUE #1334] Bug: yt-dlp fails to download when media filename is too long or contains special characters #3835

Closed
opened 2026-03-15 00:38:20 +03:00 by kerem · 1 comment
Owner

Originally created by @Giger22 on GitHub (Jan 22, 2024).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1334

Describe the bug

Cannot download media from a Nitter instance. This probably happens when the file name is too long. It is easily solved when running yt-dlp directly by changing the output file name (-o "name.mp4"), but how do I prevent this from happening in ArchiveBox?

Steps to reproduce

For example:

  1. Add a Nitter link (https://nitter.mint.lgbt/AdriRM33/status/1749421742664163619).
  2. Select media.
  3. Saving the media fails.
  4. Links with a short output file name work (https://nitter.net/AdriRM33/status/1749388415143936060).
    I use yt-dlp.
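
The "file name too long" hypothesis can be checked independently of ArchiveBox and yt-dlp. The sketch below is only an illustration and is not taken from the report: it asks the filesystem for its per-name limit (`NAME_MAX`, commonly 255 bytes on Linux) and tries to create a file whose name is one byte longer. The variable names are made up for the example. Since the limit is counted in bytes, titles with non-ASCII characters reach it sooner than their character count suggests.

```shell
#!/bin/sh
# Illustration only: a name longer than the filesystem limit cannot be created.
workdir=$(mktemp -d)

# NAME_MAX is the maximum length (in bytes) of a single file name component.
limit=$(getconf NAME_MAX "$workdir")

# Build a name that is exactly one byte longer than the limit.
too_long=$(head -c "$((limit + 1))" /dev/zero | tr '\0' 'a')

if touch "$workdir/$too_long" 2>/dev/null; then
    result="created"
else
    result="rejected"
fi

echo "limit=$limit result=$result"
rm -rf "$workdir"
```

If the name is rejected here, a downloader that derives its output file name from a long post title would hit the same limit, which matches the observation that a short `-o "name.mp4"` works.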

Screenshots or log output

Failed to save media

ArchiveBox version

0.7.2
ArchiveBox v0.7.2 BUILD_TIME=2024-01-22 14:18:50 1705929530
IN_DOCKER=False IN_QEMU=False ARCH=x86_64 OS=Linux PLATFORM=Linux-6.6.9-artix1-1-x86_64-with-glibc2.38 PYTHON=Cpython
FS_ATOMIC=True FS_REMOTE=False FS_USER=1000:1000 FS_PERMS=644
DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND=ripgrep LDAP=False

[i] Dependency versions:
 √  PYTHON_BINARY         v3.11.6         valid     /usr/bin/python3.11                                                         
 √  SQLITE_BINARY         v2.6.0          valid     /usr/lib/python3.11/sqlite3/dbapi2.py                                       
 √  DJANGO_BINARY         v3.1.14         valid     /opt/archivebox/lib/python3.11/site-packages/django/__init__.py             
 √  ARCHIVEBOX_BINARY     v0.7.2          valid     /opt/archivebox/bin/archivebox                                              

 √  CURL_BINARY           v8.5.0          valid     /usr/bin/curl                                                               
 √  WGET_BINARY           v1.21.4         valid     /usr/bin/wget                                                               
 √  NODE_BINARY           v21.6.0         valid     /usr/bin/node                                                               
 √  SINGLEFILE_BINARY     v1.1.47         valid     /usr/lib/node_modules/single-file-cli/single-file                           
 √  READABILITY_BINARY    v0.0.11         valid     /usr/lib/node_modules/readability-extractor/readability-extractor           
 √  MERCURY_BINARY        v1.0.0          valid     /usr/lib/node_modules/@postlight/parser/cli.js                              
 √  GIT_BINARY            v2.43.0         valid     /usr/bin/git                                                                
 √  YOUTUBEDL_BINARY      v2023.12.30     valid     /usr/bin/yt-dlp                                                             
 √  CHROME_BINARY         v120.0.6099.224  valid     /usr/bin/chromium                                                           
 √  RIPGREP_BINARY        v14.1.0         valid     /usr/bin/rg                                                                 

[i] Source-code locations:
 √  PACKAGE_DIR           23 files        valid     /opt/archivebox/lib/python3.11/site-packages/archivebox                     
 √  TEMPLATES_DIR         3 files         valid     /opt/archivebox/lib/python3.11/site-packages/archivebox/templates           
 -  CUSTOM_TEMPLATES_DIR  -               disabled  None                                                                        

[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled  None                                                                        
 -  COOKIES_FILE          -               disabled  None                                                                        

[i] Data locations:
 √  OUTPUT_DIR            6 files         valid     /home/artix/Applications/archiveboxAUR                                      
 √  SOURCES_DIR           4 files         valid     ./sources                                                                   
 √  LOGS_DIR              1 files         valid     ./logs                                                                      
 √  ARCHIVE_DIR           3 files         valid     ./archive                                                                   
 √  CONFIG_FILE           81.0 Bytes      valid     ./ArchiveBox.conf                                                           
 √  SQL_INDEX             220.0 KB        valid     ./index.sqlite3          
Author
Owner

@pirate commented on GitHub (Jan 23, 2024):

I just pushed a fix to add the --restrict-filenames and --trim-filenames options to the default args: 3b36928bdc

You can wait for it to be released in v0.7.3, or run :dev to get the fix now: https://github.com/ArchiveBox/ArchiveBox#install-and-run-a-specific-github-branch

In the future if you want to customize yt-dlp args you can configure the YOUTUBEDL_ARGS archivebox option like so:

# get the list of default args
archivebox config --get YOUTUBEDL_ARGS

# add custom args (e.g. the output template %(title)s.%(ext)s) alongside the defaults
archivebox config --set YOUTUBEDL_ARGS='["-o", "%(title)s.%(ext)s", ... default args here ...]'
# archivebox config --set YOUTUBEDL_ARGS='["-o", "%(title)s.%(ext)s", "--write-description", "--write-info-json", "--write-annotations", "--write-thumbnail", "--no-call-home", "--write-sub", "--write-auto-subs", "--convert-subs=srt", "--yes-playlist", "--continue", "--no-abort-on-error", "--ignore-errors", "--geo-bypass", "--add-metadata", "--format=(bv*+ba/b)[filesize<=750m][filesize_approx<=?750m]/(bv*+ba/b)"]'
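
A single missing quote makes the list above invalid, so it can help to generate the value rather than type it by hand. The following is a minimal sketch, assuming `YOUTUBEDL_ARGS` is parsed as a JSON list of strings (as the examples above suggest). The `json_list` helper is hypothetical, and the length `128` passed to `--trim-filenames` is an arbitrary value chosen for illustration, not necessarily what the fix uses. The final `archivebox` call is left commented out so the snippet runs without ArchiveBox installed.

```shell
#!/bin/sh
# Hypothetical helper: print its arguments as a JSON list of strings.
json_list() {
    out='['
    sep=''
    for arg in "$@"; do
        # Escape backslashes and double quotes so each item is a valid JSON string.
        esc=$(printf '%s' "$arg" | sed 's/\\/\\\\/g; s/"/\\"/g')
        out="$out$sep\"$esc\""
        sep=', '
    done
    printf '%s]' "$out"
}

# Example args; 128 is an assumed length limit, adjust as needed.
value=$(json_list --restrict-filenames --trim-filenames 128 -o '%(title)s.%(ext)s')
echo "$value"

# Apply it (uncomment on a machine with ArchiveBox installed):
# archivebox config --set YOUTUBEDL_ARGS="$value"
```

Because the custom value is given as a complete list, as in the examples above, any default args you still want should be passed to the helper as well.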