[GH-ISSUE #307] Type validation with pydantic

kerem commented

2026-02-27 22:08:46 +03:00

Owner

Originally created by @sigma67 on GitHub (Oct 5, 2022).
Original GitHub issue: https://github.com/sigma67/ytmusicapi/issues/307

Something I've been meaning to implement in the long term is proper models and type validation for the responses from YouTube Music. This would provide better data validation and would notify sooner if something on the API side has changed, as opposed to the current tests which usually check for the length of the results in the response only.

pydantic seems to be the gold standard for this. I'd like to wait for the release of v2.0 to not have to deal with breaking changes from v1 once it comes out.

Please thumbs up if you're in favor and leave a comment if you're willing to help or contribute 🥳

Originally created by @sigma67 on GitHub (Oct 5, 2022). Original GitHub issue: https://github.com/sigma67/ytmusicapi/issues/307 Something I've been meaning to implement in the long term is proper models and type validation for the responses from YouTube Music. This would provide better data validation and would notify sooner if something on the API side has changed, as opposed to the current tests which usually check for the length of the results in the response only. [pydantic](https://github.com/pydantic/pydantic) seems to be the gold standard for this. I'd like to wait for the release of [v2.0](https://pydantic-docs.helpmanual.io/blog/pydantic-v2/) to not have to deal with breaking changes from v1 once it comes out. Please thumbs up if you're in favor and leave a comment if you're willing to help or contribute 🥳

kerem added the

enhancement

help wanted

labels

2026-02-27 22:08:46 +03:00

kerem commented

2026-02-27 22:08:46 +03:00

Author

Owner

@sigma67 commented on GitHub (Jan 4, 2023):

Pydantic v2 ETA is now Q1 2023, so we can start work on this Q2 at the earliest

@sigma67 commented on GitHub (Jan 4, 2023): Pydantic v2 ETA is now Q1 2023, so we can start work on this Q2 at the earliest

kerem commented

2026-02-27 22:08:46 +03:00

Author

Owner

@ambujpawar commented on GitHub (Apr 20, 2023):

I am willing to help :)

@ambujpawar commented on GitHub (Apr 20, 2023): I am willing to help :)

kerem commented

2026-02-27 22:08:46 +03:00

Author

Owner

@sigma67 commented on GitHub (Apr 20, 2023):

Pydantic v2 is now in alpha: https://docs.pydantic.dev/blog/pydantic-v2-alpha/

I think we should use this issue as a starting point to create additional subtasks using GitHub tasklists: https://docs.github.com/en/issues/tracking-your-work-with-issues/about-tasklists

As this is a major endeavor with breaking changes development should take place on a v2 branch for now.

@sigma67 commented on GitHub (Apr 20, 2023): Pydantic v2 is now in alpha: https://docs.pydantic.dev/blog/pydantic-v2-alpha/ I think we should use this issue as a starting point to create additional subtasks using GitHub tasklists: https://docs.github.com/en/issues/tracking-your-work-with-issues/about-tasklists As this is a major endeavor with breaking changes development should take place on a v2 branch for now.

kerem commented

2026-02-27 22:08:46 +03:00

Author

Owner

@JohnHKoh commented on GitHub (Jun 10, 2023):

Started playing around with this, here is a rudimentary way to provide type validation.

I created a new module called where I started to define some types. The Playlist type, for example, looks like this:

class Playlist(BaseModel):
    title: str
    playlistId: str
    thumbnails: list[Thumbnail]
    description: str
    count: Optional[str] = None
    author: Optional[list[Author]] = None

Then, inside "parsers/browsing.py", we can create a new instance of this Playlist class with the information we parse from parse_playlist:

def parse_playlist(data):
    playlist = {
        'title': nav(data, TITLE_TEXT),
        'playlistId': nav(data, TITLE + NAVIGATION_BROWSE_ID)[2:],
        'thumbnails': nav(data, THUMBNAIL_RENDERER)
    }
    subtitle = data['subtitle']
    if 'runs' in subtitle:
        playlist['description'] = "".join([run['text'] for run in subtitle['runs']])
        if len(subtitle['runs']) == 3 and re.search(r'\d+ ', nav(data, SUBTITLE2)):
            playlist['count'] = nav(data, SUBTITLE2).split(' ')[0]
            playlist['author'] = parse_song_artists_runs(subtitle['runs'][:1])

    return Playlist(**playlist) # Return playlist instance

(Note, this could probably be done at a "lower level" without having to create the intermediary playlist dictionary first)

Then, we can add some type hinting in "mixins/library.py"...

    def get_library_playlists(self, limit: int = 25) -> list[Playlist]:

And now we get some more helpful type hinting with the new object.

The data validation works too. For example, in the Playlist class, I specify count as optional, because I believe empty playlists do not have the count field. If I remove the optional type and make it required, I get a validation error for a playlist that does NOT have the count field:

  Field required [type=missing, input_value={'title': 'Episodes for L...des you save for later'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/0.38.0/v/missing

I can continue to help with this work - I think it can be very helpful for maintaining the API as well as improving usability with the type hints. Here's a quick branch comparison between the testing I did and the current master branch:

https://github.com/sigma67/ytmusicapi/compare/master...JohnHKoh:ytmusicapi:v2-pydantic-test?expand=1

@JohnHKoh commented on GitHub (Jun 10, 2023): Started playing around with this, here is a rudimentary way to provide type validation. I created a new module called where I started to define some types. The `Playlist` type, for example, looks like this: ``` class Playlist(BaseModel): title: str playlistId: str thumbnails: list[Thumbnail] description: str count: Optional[str] = None author: Optional[list[Author]] = None ``` Then, inside "parsers/browsing.py", we can create a new instance of this `Playlist` class with the information we parse from `parse_playlist`: ``` def parse_playlist(data): playlist = { 'title': nav(data, TITLE_TEXT), 'playlistId': nav(data, TITLE + NAVIGATION_BROWSE_ID)[2:], 'thumbnails': nav(data, THUMBNAIL_RENDERER) } subtitle = data['subtitle'] if 'runs' in subtitle: playlist['description'] = "".join([run['text'] for run in subtitle['runs']]) if len(subtitle['runs']) == 3 and re.search(r'\d+ ', nav(data, SUBTITLE2)): playlist['count'] = nav(data, SUBTITLE2).split(' ')[0] playlist['author'] = parse_song_artists_runs(subtitle['runs'][:1]) return Playlist(**playlist) # Return playlist instance ``` (Note, this could probably be done at a "lower level" without having to create the intermediary `playlist` dictionary first) Then, we can add some type hinting in "mixins/library.py"... ``` def get_library_playlists(self, limit: int = 25) -> list[Playlist]: ``` And now we get some more helpful type hinting with the new object. ![image](https://github.com/sigma67/ytmusicapi/assets/6465531/ca66eab4-701a-4bbf-8911-db072bba04d2) The data validation works too. For example, in the `Playlist` class, I specify `count` as optional, because I believe empty playlists do not have the `count` field. If I remove the optional type and make it required, I get a validation error for a playlist that does NOT have the `count` field: ``` Field required [type=missing, input_value={'title': 'Episodes for L...des you save for later'}, input_type=dict] For further information visit https://errors.pydantic.dev/0.38.0/v/missing ``` I can continue to help with this work - I think it can be very helpful for maintaining the API as well as improving usability with the type hints. Here's a quick branch comparison between the testing I did and the current master branch: https://github.com/sigma67/ytmusicapi/compare/master...JohnHKoh:ytmusicapi:v2-pydantic-test?expand=1

kerem commented

2026-02-27 22:08:46 +03:00

Author

Owner

@sigma67 commented on GitHub (Jun 10, 2023):

Looks like a good start! I like that it's not very invasive to the current code base, at least not in this case.

I think we should probably start by dumping the currently returned responses to make sure we don't miss any fields. Then we can have a look at the code to determine which fields need to be optional, for each function.

Did you use pydantic V2 beta? I think you'd also need to add that to the dependencies.

I also noticed that you used list as opposed to List, we would need to increase requires_python>=3.9 for that to happen. Which is fine by me, as 3.8 will only have 1 year of support left by the time we ship this change.

@sigma67 commented on GitHub (Jun 10, 2023): Looks like a good start! I like that it's not very invasive to the current code base, at least not in this case. I think we should probably start by dumping the currently returned responses to make sure we don't miss any fields. Then we can have a look at the code to determine which fields need to be optional, for each function. Did you use pydantic V2 beta? I think you'd also need to add that to the dependencies. I also noticed that you used `list` as opposed to `List`, we would need to increase `requires_python>=3.9` for that to happen. Which is fine by me, as 3.8 will only have 1 year of support left by the time we ship this change.

kerem commented

2026-02-27 22:08:46 +03:00

Author

Owner

@JohnHKoh commented on GitHub (Jun 10, 2023):

I think we should probably start by dumping the currently returned responses to make sure we don't miss any fields. Then we can have a look at the code to determine which fields need to be optional, for each function.

Definitely, anywhere we can consolidate these? Should we just post them in here? The greater the sample size the better, so we don't miss any edge cases or differences.

Did you use pydantic V2 beta? I think you'd also need to add that to the dependencies.

I was using v2 beta in a venv, but yes, it will need to be added. I was just playing around with integrating it into the code, so nothing I wrote was really meant to be final! Just a start. The way we "convert" the raw data into the new types is definitely an implementation detail that could warrant more discussion. Should it use some kind of constructor? Or some other generic parsing function?

I also noticed that you used list as opposed to List, we would need to increase requires_python>=3.9 for that to happen. Which is fine by me, as 3.8 will only have 1 year of support left by the time we ship this change.

I also support the push list, as it seems like the List from the typing module may be deprecated some time after October 2025.

@JohnHKoh commented on GitHub (Jun 10, 2023): > I think we should probably start by dumping the currently returned responses to make sure we don't miss any fields. Then we can have a look at the code to determine which fields need to be optional, for each function. Definitely, anywhere we can consolidate these? Should we just post them in here? The greater the sample size the better, so we don't miss any edge cases or differences. > Did you use pydantic V2 beta? I think you'd also need to add that to the dependencies. I was using v2 beta in a venv, but yes, it will need to be added. I was just playing around with integrating it into the code, so nothing I wrote was really meant to be final! Just a start. The way we "convert" the raw data into the new types is definitely an implementation detail that could warrant more discussion. Should it use some kind of constructor? Or some other generic parsing function? > I also noticed that you used `list` as opposed to `List`, we would need to increase `requires_python>=3.9` for that to happen. Which is fine by me, as 3.8 will only have 1 year of support left by the time we ship this change. I also support the push `list`, as it seems like the `List` from the `typing` module [may be deprecated some time after October 2025](https://peps.python.org/pep-0585/).

kerem commented

2026-02-27 22:08:46 +03:00

Author

Owner

@sigma67 commented on GitHub (Jun 25, 2023):

Definitely, anywhere we can consolidate these? Should we just post them in here? The greater the sample size the better, so we don't miss any edge cases or differences.

I don't think we need to many different samples, just need to ensure that the edge cases are covered (like unavailable songs, which have some data fields, but not others). Instead of dumping we could also just go ahead and try with all fields optional and default None. That would already be an improvement over the current case where sometimes the keys are missing, making the library a bit cumbersome to use.

Should it use some kind of constructor? Or some other generic parsing function?

Definitely just use the default constructor

I also support the push list

Then let's go ahead with >=3.9.

@sigma67 commented on GitHub (Jun 25, 2023): > Definitely, anywhere we can consolidate these? Should we just post them in here? The greater the sample size the better, so we don't miss any edge cases or differences. I don't think we need to many different samples, just need to ensure that the edge cases are covered (like unavailable songs, which have some data fields, but not others). Instead of dumping we could also just go ahead and try with all fields optional and default None. That would already be an improvement over the current case where sometimes the keys are missing, making the library a bit cumbersome to use. > Should it use some kind of constructor? Or some other generic parsing function? Definitely just use the default constructor > I also support the push list Then let's go ahead with `>=3.9`.

kerem commented

2026-02-27 22:08:46 +03:00

Author

Owner

@TheOnlyWayUp commented on GitHub (Feb 24, 2024):

Hi, I recently made an API Wrapper that used Pydantic.

https://github.com/TheOnlyWayUp/Wattpad-Py

A tool I used prominently was https://jsontopydantic.com.

I'm going to draft up a PR this week.

@TheOnlyWayUp commented on GitHub (Feb 24, 2024): Hi, I recently made an API Wrapper that used Pydantic. https://github.com/TheOnlyWayUp/Wattpad-Py A tool I used prominently was https://jsontopydantic.com. I'm going to draft up a PR this week.

Rows
Columns

[GH-ISSUE #307] Type validation with pydantic #240