[GH-ISSUE #890] Question: How can I download just a YouTube video and its English subtitles? #552

Closed
opened 2026-03-01 14:44:30 +03:00 by kerem · 9 comments

Originally created by @aidenmitchell on GitHub (Nov 13, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/890

Apologies for another issue; I don't mean to spam.

I currently have `SAVE_MEDIA = False` in my config, but lots of `.vtt` files are still being downloaded, and the archive is getting big.
Is there a way to archive only:

  1. the original video, in 1080p
  2. English subtitles
  3. the `.description` file

Thanks for your help!


@pirate commented on GitHub (Nov 13, 2021):

You can pass arbitrary YoutubeDL arguments with `archivebox config --set YOUTUBEDL_ARGS="..."`. Figure out which options you need, start from the existing defaults in `archivebox/config.py:YOUTUBEDL_ARGS`, modify them, and pass the result into that config setting.

For example, keeping all the default args and adding `--sub-lang="en"` looks like this:

```bash
archivebox config --set YOUTUBEDL_ARGS="--write-description --write-info-json --write-annotations --write-thumbnail --no-call-home --write-sub --all-subs --write-auto-sub --convert-subs=srt --yes-playlist --continue --ignore-errors --geo-bypass --add-metadata --max-filesize=500m --sub-lang=en"
```
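For the original goals (1080p video, English subtitles, `.description` file), a trimmed-down list might look roughly like the sketch below. It is a hypothetical, untested selection, not ArchiveBox's recommended config. The `--format` value uses youtube-dl's format-selection syntax for "best video up to 1080p plus best audio". Dropping `--all-subs` and `--write-auto-sub` from the defaults is probably what stops the pile of `.vtt` files, since youtube-dl's `--all-subs` appears to take precedence over `--sub-lang`. Later comments establish that `YOUTUBEDL_ARGS` must be a JSON list, so the helper `to_json_list` (a hypothetical name, not part of ArchiveBox) serializes a bash array into one.

```bash
#!/usr/bin/env bash
# Sketch: build a YOUTUBEDL_ARGS JSON list from a bash array.
# The option selection is a hypothetical, untested starting point.

# Serialize each argument as a JSON string and join them into a JSON list.
# Only backslashes and double quotes are escaped, which covers typical
# command-line flags (control characters are not handled).
to_json_list() {
  local out="[" sep="" arg
  for arg in "$@"; do
    arg=${arg//'\'/'\\'}   # escape backslashes first
    arg=${arg//'"'/'\"'}   # then escape double quotes
    out+="${sep}\"${arg}\""
    sep=", "
  done
  printf '%s]\n' "$out"
}

youtubedl_args=(
  --write-description   # keep the .description file
  --write-sub           # uploaded (non-automatic) subtitles only...
  --sub-lang=en         # ...restricted to English
  --convert-subs=srt    # convert to .srt (needs ffmpeg)
  # best video up to 1080p plus best audio, else best single file up to 1080p
  "--format=bestvideo[height<=1080]+bestaudio/best[height<=1080]"
  --continue            # kept from the defaults
  --ignore-errors       # kept from the defaults
)

to_json_list "${youtubedl_args[@]}"
```

On a non-Docker install, the output could then be passed in with `archivebox config --set YOUTUBEDL_ARGS="$(to_json_list "${youtubedl_args[@]}")"`. Under `docker-compose run`, the extra quoting discussed at the end of this thread still applies.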

@aidenmitchell commented on GitHub (Nov 13, 2021):

Thanks, but unfortunately it doesn't like that:

```logs
archivebox config --set YOUTUBEDL_ARGS="--write-sub"
[i] [2021-11-13 01:37:10] ArchiveBox v0.6.3: archivebox config --set YOUTUBEDL_ARGS=--write-sub
    > /data


[X] Error while loading configuration value: YOUTUBEDL_ARGS
    JSONDecodeError: Expecting value: line 1 column 1 (char 0)

    Check your config for mistakes and try again (your archive data is unaffected).

    For config documentation and examples see:
        https://github.com/ArchiveBox/ArchiveBox/wiki/Configuration
```

I'm probably doing something wrong, but I also copy-pasted your example command and got the same error.

@ghost commented on GitHub (Nov 16, 2021):

`YOUTUBEDL_ARGS` is a JSON list, not a string.

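Both `JSONDecodeError` messages reported in this thread can be reproduced with Python's `json.loads`. This suggests, although it is an assumption and not checked against the ArchiveBox source, that list-type settings are parsed as JSON. The hypothetical helper `check_youtubedl_args` below uses `python3` to run a candidate value through `json.loads` before it is handed to `archivebox config`:

```bash
#!/usr/bin/env bash
# Sketch: validate a candidate YOUTUBEDL_ARGS value with Python's json module.
# Assumption: ArchiveBox parses list settings with json.loads (inferred from
# the JSONDecodeError in the logs, not verified against its source).
check_youtubedl_args() {
  python3 -c '
import json
import sys

try:
    value = json.loads(sys.argv[1])
except json.JSONDecodeError as err:
    print("JSONDecodeError:", err)
    sys.exit(1)

if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
    print("not a JSON list of strings")
    sys.exit(1)

print("ok:", len(value), "arguments")
' "$1"
}

check_youtubedl_args '--write-sub' || true
# prints: JSONDecodeError: Expecting value: line 1 column 1 (char 0)
check_youtubedl_args '[--write-description,--write-info-json]' || true
# prints: JSONDecodeError: Expecting value: line 1 column 2 (char 1)
check_youtubedl_args '["--write-sub", "--sub-lang=en"]'
# prints: ok: 2 arguments
```

A bare flag fails at the very first character. A list whose items have lost their double quotes fails at the first item.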

@ghost commented on GitHub (Nov 16, 2021):

@aidenmitchell Try this instead:

```bash
archivebox config --set YOUTUBEDL_ARGS='["--write-description", "--write-info-json", "--write-annotations", "--write-thumbnail", "--no-call-home", "--write-sub", "--all-subs", "--write-auto-sub", "--convert-subs=srt", "--yes-playlist", "--continue", "--ignore-errors", "--geo-bypass", "--add-metadata", "--max-filesize=500m", "--sub-lang=en"]'
```

@pirate commented on GitHub (Nov 16, 2021):

Right, sorry, I forgot it needs to be a JSON list!


@aidenmitchell commented on GitHub (Nov 16, 2021):

@remyabel Thank you! Works now.


@dohlin commented on GitHub (Feb 28, 2023):

> @aidenmitchell Try this instead:
>
> ```
> archivebox config --set YOUTUBEDL_ARGS='["--write-description", "--write-info-json", "--write-annotations", "--write-thumbnail", "--no-call-home", "--write-sub", "--all-subs", "--write-auto-sub", "--convert-subs=srt", "--yes-playlist", "--continue", "--ignore-errors", "--geo-bypass", "--add-metadata", "--max-filesize=500m", "--sub-lang=en"]'
> ```

I'm running this via copy & paste but still hitting the following error:

```
docker-compose run archivebox config --set YOUTUBEDL_ARGS='["--write-description", "--write-info-json", "--write-annotations", "--write-thumbnail", "--no-call-home", "--write-sub", "--all-subs", "--write-auto-sub", "--convert-subs=srt", "--yes-playlist", "--continue", "--ignore-errors", "--geo-bypass", "--add-metadata", "--max-filesize=500m", "--sub-lang=en"]'
Creating archivebox_archivebox_run ... done
[i] [2023-02-28 15:18:17] ArchiveBox v0.6.2: archivebox config --set YOUTUBEDL_ARGS=[--write-description, --write-info-json, --write-annotations, --write-thumbnail, --no-call-home, --write-sub, --all-subs, --write-auto-sub, --convert-subs=srt, --yes-playlist, --continue, --ignore-errors, --geo-bypass, --add-metadata, --max-filesize=500m, --sub-lang=en]
    > /data

usage: archivebox config [-h] [--get | --set | --reset] [config_options ...]
archivebox config: error: unrecognized arguments: --write-info-json, --write-annotations, --write-thumbnail, --no-call-home, --write-sub, --all-subs, --write-auto-sub, --convert-subs=srt, --yes-playlist, --continue, --ignore-errors, --geo-bypass, --add-metadata, --max-filesize=500m, --sub-lang=en]
ERROR: 2
```

If I remove the spaces between the parameters, the 'unrecognized arguments' error goes away, but instead I get this:

```
docker-compose run archivebox config --set YOUTUBEDL_ARGS='["--write-description","--write-info-json","--write-annotations","--write-thumbnail","--no-call-home","--write-sub","--all-subs","--write-auto-sub","--convert-subs=srt","--yes-playlist","--continue","--ignore-errors","--geo-bypass","--add-metadata","--max-filesize=500m","--sub-lang=en"]'

Creating archivebox_archivebox_run ... done

[i] [2023-02-28 15:29:45] ArchiveBox v0.6.2: archivebox config --set YOUTUBEDL_ARGS=[--write-description,--write-info-json,--write-annotations,--write-thumbnail,--no-call-home,--write-sub,--all-subs,--write-auto-sub,--convert-subs=srt,--yes-playlist,--continue,--ignore-errors,--geo-bypass,--add-metadata,--max-filesize=500m,--sub-lang=en]
    > /data


[X] Error while loading configuration value: YOUTUBEDL_ARGS
    JSONDecodeError: Expecting value: line 1 column 2 (char 1)

    Check your config for mistakes and try again (your archive data is unaffected).

    For config documentation and examples see:
        https://github.com/ArchiveBox/ArchiveBox/wiki/Configuration

ERROR: 2
```

Any thoughts on what I'm missing?

@pirate commented on GitHub (Feb 28, 2023):

Docker strips the single quotes off the array list when it passes the command to the container:

```logs
[i] [2023-02-28 15:18:17] ArchiveBox v0.6.2: archivebox config --set YOUTUBEDL_ARGS=[--write-descr ...
```

So you have to escape them like so, @dohlin:

```bash
docker-compose run archivebox config --set YOUTUBEDL_ARGS='\'["--write-description", "--write-info-json", "--write-annotations", "--write-thumbnail", "--no-call-home", "--write-sub", "--all-subs", "--write-auto-sub", "--convert-subs=srt", "--yes-playlist", "--continue", "--ignore-errors", "--geo-bypass", "--add-metadata", "--max-filesize=500m", "--sub-lang=en"]\''
```
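The logs point to something slightly broader than stripped single quotes. In dohlin's first attempt, the double quotes are gone too, and the list was split at every space. One possible explanation, assumed here rather than confirmed from ArchiveBox's Docker entrypoint, is that the container joins its arguments into one string and re-parses it with a second shell. The sketch below simulates that with a hypothetical `reparse` function. It shows why the value needs one extra layer of quoting to survive.

```bash
#!/usr/bin/env bash
# Sketch: simulate an entrypoint that joins its arguments into one string and
# re-parses it with a second shell. ASSUMPTION inferred from the logs above,
# not taken from ArchiveBox's actual Docker entrypoint.

# Hypothetical stand-in: print each argument the second shell ends up with.
# "$*" joins all arguments with spaces, so the original boundaries are lost.
reparse() {
  bash -c "printf '[%s]\n' $*"
}

echo "one layer of quotes:"
reparse config --set YOUTUBEDL_ARGS='["--write-sub", "--sub-lang=en"]'

echo "escaped outer single quotes:"
reparse config --set YOUTUBEDL_ARGS=\''["--write-sub", "--sub-lang=en"]'\'
```

The first call reproduces the shape of the `unrecognized arguments` error: the JSON is broken into `YOUTUBEDL_ARGS=[--write-sub,` and `--sub-lang=en]`. The second call keeps the value intact as one argument, double quotes included.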

@dohlin commented on GitHub (Feb 28, 2023):

OK, unfortunately that didn't work either: it just left me at the 'continued input' prompt in Ubuntu (I'm probably using the wrong term there, but it's what happens when you end a command with a backslash).

However, based on your point about Docker stripping off the single quotes, I figured out a command that works in my case. Here it is in case anyone else stumbles upon this later:

```bash
docker-compose run archivebox config --set YOUTUBEDL_ARGS=\''["--write-description", "--write-info-json", "--write-annotations", "--write-thumbnail", "--no-call-home", "--write-sub", "--all-subs", "--write-auto-sub", "--convert-subs=srt", "--yes-playlist", "--continue", "--ignore-errors", "--geo-bypass", "--add-metadata", "--max-filesize=500m", "--sub-lang=en"]'\'
```

Basically, this just escapes the _outermost_ single quotes. Thank you!
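The 'continued input' prompt comes from bash itself. Inside single quotes, a backslash is literal, so in the `'\'...\''` form the first quote pair closes immediately, and the final `'` opens a quote that never closes. One way to check a quoting variant before running it is `bash -n`, which parses a script without executing it. The sketch below compares the two variants from this thread, shortened to two flags. The file names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: let bash parse (without running) the two quoting variants from this
# thread, shortened to two flags, to see why one of them never terminates.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Quoted heredoc delimiters ('EOF') write each line byte-for-byte.
cat > "$tmp/suggested.sh" <<'EOF'
echo YOUTUBEDL_ARGS='\'["--write-sub", "--sub-lang=en"]\''
EOF
cat > "$tmp/working.sh" <<'EOF'
echo YOUTUBEDL_ARGS=\''["--write-sub", "--sub-lang=en"]'\'
EOF

# Print whether bash accepts a script's syntax (-n: read commands, don't run them).
parse_status() {
  if bash -n "$1" 2>/dev/null; then echo "parses"; else echo "does not parse"; fi
}

for f in "$tmp/suggested.sh" "$tmp/working.sh"; do
  echo "$(basename "$f"): $(parse_status "$f")"
done

# Show the single argument the working variant hands to the program.
bash "$tmp/working.sh"
```

The last line prints `YOUTUBEDL_ARGS='["--write-sub", "--sub-lang=en"]'`, which is the JSON wrapped in literal single quotes. Presumably the second round of parsing inside the container then removes them.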