[GH-ISSUE #52] YTMusic responses are unreliable for get_library_songs and get_playlist #39

Closed
opened 2026-02-27 22:07:41 +03:00 by kerem · 15 comments
Owner

Originally created by @czifumasa on GitHub (Aug 1, 2020).
Original GitHub issue: https://github.com/sigma67/ytmusicapi/issues/52

In my project I am using ytmusicapi to fetch the full content of the user's library and save it to a CSV file. I can then use these CSV files to compare changes in my library, find songs removed from YouTube, etc.

Unfortunately, it is currently very unreliable.
For example, my library currently contains 2040 songs. To get the library songs, I call the API with a high limit:

```python
from ytmusicapi import YTMusic

api = YTMusic('headers_auth.json')
library_songs = api.get_library_songs(50000)
```

Every time I send that request, the number of returned songs is different; it varies between 1800 and 2035 songs.
I know that the problem is in YTM itself, because I observed the same problem in the web client, and it hasn't been fixed on their side for months. YTM should return library songs in chunks of 25 items, but very often a chunk contains fewer than 25.
In the end, on average, at least 10% of my library is missing, making my scripts kinda useless. The same problem occurs for the `get_playlist` method.
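
The snapshot comparison described above can be sketched roughly as follows. This is only an illustration under assumptions: the column names (`videoId`, `title`) and the helpers `load_rows` and `diff_snapshots` are hypothetical and are not taken from ytmusicapi or from the actual script.

```python
import csv
import io


def load_rows(csv_text: str, key: str = "videoId") -> dict:
    """Map each row's key column to the full row (column names are assumed)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: row for row in reader}


def diff_snapshots(old_csv: str, new_csv: str, key: str = "videoId"):
    """Return (removed, added) rows between two library snapshots."""
    old, new = load_rows(old_csv, key), load_rows(new_csv, key)
    removed = [old[k] for k in old if k not in new]
    added = [new[k] for k in new if k not in old]
    return removed, added


# Two tiny in-memory snapshots standing in for saved CSV files.
OLD = "videoId,title\na1,Song A\nb2,Song B\nc3,Song C\n"
NEW = "videoId,title\na1,Song A\nc3,Song C\nd4,Song D\n"

removed, added = diff_snapshots(OLD, NEW)
print([row["title"] for row in removed])  # in the old snapshot only
print([row["title"] for row in added])    # in the new snapshot only
```

With a comparison like this, any song that the API silently drops from a response shows up as a false "removed" entry, which is why incomplete responses make the result unusable.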

kerem 2026-02-27 22:07:41 +03:00
  • closed this issue
  • added the yt-update label

@sigma67 commented on GitHub (Aug 3, 2020):

I've noticed this issue as well, since tests were failing randomly. I attempted to fix it in 90bc753. That may not be a real fix if the API skips songs at random, because the skipped songs would still be missing from the response. In your experience, do invalid responses return songs in the correct order without skips, or are songs randomly missing in between?

To be honest I don't really like the option of implementing retry logic for a server-side issue, as it might become obsolete in the near future. I suggest we wait another month to see if the issue gets resolved by YouTube. If this is not the case, I'll go ahead and merge #53.


@czifumasa commented on GitHub (Aug 3, 2020):

Yes, from what I've observed, the API skips songs randomly, and the fix from 90bc7538e4 is not enough.

> To be honest I don't really like the option of implementing retry logic for a server-side issue, as it might become obsolete in the near future. I suggest we wait another month to see if the issue gets resolved by YouTube. If this is not the case, I'll go ahead and merge #53.

That's completely fine with me. I know that the "retry" solution is not very elegant, but unfortunately the problem has existed since I moved from GPM, so it's been at least a few months already. I kinda lost my patience and decided to work around it with my PR. I agree with you, though, that the proper fix should be on YouTube's server, so let's give them one more month.


@akraus53 commented on GitHub (Aug 19, 2020):

I think this is happening to `get_history()` as well!


@xplorr commented on GitHub (Aug 19, 2020):

I use the `ytmusic.get_library_upload_songs(50000)` call and have not noticed any problems so far. I have about 23000 songs in my library and they are all returned except 2. I still have to figure out why those 2 are missing.


@sigma67 commented on GitHub (Aug 25, 2020):

It's been almost a month with no updates from YouTube's side. I suggest we merge this PR, however I want to request two changes if possible.

  1. the PR needs to be rebased on top of current master
  2. I'd like to make the retry behavior optional

The reasoning for 2) is that the changes from this PR doubled the average execution time for me (based on `test_get_library_songs`: previously 3-4s, now 7-8s). I suggest we introduce an optional parameter `validate_responses=False` for `get_library_songs`. If `False`, the current faulty behavior should occur by calling `get_continuations`. If `True`, `get_validated_continuations` should be used.

The default should be False imo, since the objective of the API is to replicate the web client as closely as possible, which also exhibits this odd behavior. Therefore, it would be an optional feature of ytmusicapi, which validates responses for the user to ensure the response is correct. What do you think?
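
A minimal sketch of the proposed opt-in switch, assuming the two continuation strategies can be passed in as plain callables. The function `get_library_songs_sketch` and the stand-ins `plain` and `validated` are hypothetical; they only illustrate the dispatch and do not reproduce the real ytmusicapi signatures.

```python
from typing import Callable, List

PageFetcher = Callable[[], List[str]]


def get_library_songs_sketch(
    fetch_plain: PageFetcher,
    fetch_validated: PageFetcher,
    validate_responses: bool = False,
) -> List[str]:
    """Pick the continuation strategy based on the opt-in flag.

    fetch_plain stands in for the get_continuations path and
    fetch_validated for the get_validated_continuations path.
    """
    fetch = fetch_validated if validate_responses else fetch_plain
    return fetch()


# Stand-ins: the plain path mirrors the web client and may come back short,
# the validated path costs extra requests but returns the full page.
def plain() -> List[str]:
    return ["song-%d" % i for i in range(23)]


def validated() -> List[str]:
    return ["song-%d" % i for i in range(25)]


default_result = get_library_songs_sketch(plain, validated)
checked_result = get_library_songs_sketch(plain, validated, validate_responses=True)
print(len(default_result), len(checked_result))
```

Keeping the default at `False` means existing callers see no change in behavior or run time, and only callers who need complete results pay for the retries.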


@sigma67 commented on GitHub (Aug 25, 2020):

In the original issue, you also noted that `get_playlist` has the same issue, but didn't end up including it in your PR. I just did some tests and it seems to behave consistently (i.e. no varying track counts). Am I correct in assuming that only `get_library_songs` is affected by this issue for now?


@czifumasa commented on GitHub (Aug 26, 2020):

> Am I correct in assuming that only get_library_songs is affected by this issue for now?

Yes, indeed, it seems that `get_playlist` has been fixed. Today I ran some tests for both methods, and I wasn't able to reproduce the problem for `get_playlist` anymore. At first I didn't include it in my PR because I wasn't sure whether you would approve the general concept, so I created a fix for only one method. Luckily it's no longer needed.

Unfortunately, the problem still exists for `get_library_songs`. I reproduced it in every test I ran.
Regarding your proposed changes, I agree, the retry behaviour should be optional. I'll update my PR next weekend, when I have a bit more time.


@czifumasa commented on GitHub (Aug 30, 2020):

I updated my PR (#53) with the requested changes, please take a look.


@sigma67 commented on GitHub (Aug 31, 2020):

Thanks for updating the PR! I did some rather extensive testing with the changes and ran `get_library_songs(300, validate_responses=True)` a few times. I noticed that retries only rarely managed to produce the full 25 results. If they did, it was always after the first retry. Unless you have significantly different results, I propose reducing `max_retries` to 1 to improve performance.

(edit: I did some more tests and found 1 or 2 continuations where it worked after 2 or 3 tries (after >15 function calls with 11 continuations each). I believe the performance penalty isn't worth the additional 2 retries).

Check this log:

```
25
retries: 0
24
24
24
24
retries: 3
24
24
23
24
retries: 3
25
retries: 0
25
retries: 0
25
retries: 0
22
22
22
22
retries: 3
25
retries: 0
19
25
retries: 1
24
24
24
24
retries: 3
22
25
retries: 1
25
retries: 0
24
retries: 0
```
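
To make the log easier to read, here is a small sketch that groups the item counts by request and tallies the outcomes. The log text is copied from above; the helper `parse_log` and the interpretation that each `retries:` line closes one request are assumptions based on the debug prints.

```python
# Log copied from the comment; "|" is used only to keep the literal compact.
LOG = (
    "25|retries: 0|24|24|24|24|retries: 3|24|24|23|24|retries: 3|"
    "25|retries: 0|25|retries: 0|25|retries: 0|22|22|22|22|retries: 3|"
    "25|retries: 0|19|25|retries: 1|24|24|24|24|retries: 3|"
    "22|25|retries: 1|25|retries: 0|24|retries: 0"
).replace("|", "\n")


def parse_log(text):
    """Return a list of (item_counts, retries), one entry per request."""
    groups, counts = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("retries:"):
            groups.append((counts, int(line.split(":")[1])))
            counts = []
        else:
            counts.append(int(line))
    return groups


groups = parse_log(LOG)
full_first_try = sum(1 for c, r in groups if r == 0 and c[-1] == 25)
recovered = [r for c, r in groups if r > 0 and c[-1] == 25]
exhausted = sum(1 for c, r in groups if r == 3 and c[-1] < 25)
extra_requests = sum(r for _, r in groups)

print(len(groups), full_first_try, recovered, exhausted, extra_requests)
```

For this log, the tally is 13 requests: 6 returned 25 items on the first try, 2 recovered after a single retry, and 4 were still short after all 3 retries. The last request returned 24 items and was not retried; the log does not say why. In total, 14 extra requests recovered 2 pages.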

Here are the debug changes in utils.py l.104:

```python
# Debug output: item count of the initial response, then of each retry.
print(len(parsed_object['parsed']))
while not validate_func(parsed_object) and retry_counter < max_retries:
    response = request_func(request_additional_params)
    parsed_object = parse_func(response)
    print(len(parsed_object['parsed']))
    retry_counter += 1
print("retries: " + str(retry_counter))
```
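
The fragment above depends on variables from the surrounding function. As a rough, self-contained sketch of the same retry loop, the following version replaces the real request with a fake server that returns a scripted sequence of page sizes. The names `fetch_with_retries`, `make_fake_server`, `parse` and `is_full_page` are hypothetical, and the request function takes no parameters here, unlike in the fragment.

```python
from typing import Any, Callable, Dict, Iterable, Tuple

Parsed = Dict[str, Any]


def fetch_with_retries(
    request_func: Callable[[], Dict[str, Any]],
    parse_func: Callable[[Dict[str, Any]], Parsed],
    validate_func: Callable[[Parsed], bool],
    max_retries: int = 3,
) -> Tuple[Parsed, int]:
    """Request one continuation, re-requesting it while validation fails."""
    parsed_object = parse_func(request_func())
    retry_counter = 0
    while not validate_func(parsed_object) and retry_counter < max_retries:
        parsed_object = parse_func(request_func())
        retry_counter += 1
    return parsed_object, retry_counter


def make_fake_server(page_sizes: Iterable[int]) -> Callable[[], Dict[str, Any]]:
    """Stand-in for the endpoint: each call returns the next scripted page size."""
    sizes = iter(page_sizes)

    def request() -> Dict[str, Any]:
        return {"items": list(range(next(sizes)))}

    return request


def parse(response: Dict[str, Any]) -> Parsed:
    return {"parsed": response["items"]}


def is_full_page(parsed: Parsed) -> bool:
    return len(parsed["parsed"]) == 25


# One page that recovers on the first retry, one that stays short.
recovered, recovered_retries = fetch_with_retries(
    make_fake_server([19, 25]), parse, is_full_page)
short, short_retries = fetch_with_retries(
    make_fake_server([24, 24, 24, 24]), parse, is_full_page)
print(len(recovered["parsed"]), recovered_retries)
print(len(short["parsed"]), short_retries)
```

Note that when every attempt comes back short, the loop returns the last response as is, so a caller still has to tolerate pages with fewer than 25 items.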

@sigma67 commented on GitHub (Aug 31, 2020):

I also found some isolated instances where the key `contents` is missing completely from the continuation response, causing an error.

We should catch that in both `get_parsed_continuation_items` and `get_continuations`. If you want, you can add these changes as well, or I can do it.
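
One way to guard against the missing key might look like the sketch below. The response layout used here (`continuationContents` wrapping a `musicShelfContinuation` section) is an assumption for illustration, and `get_continuation_contents` is a hypothetical helper, not one of the functions named above.

```python
from typing import Any, Dict, List


def get_continuation_contents(response: Dict[str, Any], section: str) -> List[Any]:
    """Return the items of a continuation response, or [] if 'contents' is absent."""
    continuation = response.get("continuationContents", {}).get(section, {})
    return continuation.get("contents", [])


# A well-formed response and one where the 'contents' key is missing entirely.
ok = {"continuationContents": {"musicShelfContinuation": {"contents": [1, 2, 3]}}}
broken = {"continuationContents": {"musicShelfContinuation": {}}}

print(get_continuation_contents(ok, "musicShelfContinuation"))
print(get_continuation_contents(broken, "musicShelfContinuation"))
```

Returning an empty list instead of raising lets a validating caller treat the response as an incomplete page and retry it like any other short response.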


@sigma67 commented on GitHub (Sep 1, 2020):

After some more tests I decided to leave the retries at 3, as the number of retries to success seems to vary a lot depending on time of day and account used.

It also seems that the API "warms up" to your requests. For example, if you repeat the same call (`get_library_songs(300)`) multiple times, subsequent calls have significantly fewer missing items and take fewer retries. This effect subsides after a while, so I suspect that YouTube's API uses some form of caching here.

Will merge this PR shortly with the bugfix mentioned in the previous comment.


@czifumasa commented on GitHub (Sep 1, 2020):

Regarding the error with the `contents` key, it never occurred for the accounts I tested. I am glad you found it and fixed it.

And regarding the `max_retries` param, I observed exactly the same behaviour that you described: the first call to `get_library_songs` is usually the worst and requires many retries to get correct results. Subsequent calls are much faster, but some continuations still require 1 or 2 retries.

I set `max_retries` to 3 because it returned the most consistent results in my tests. Decreasing it to 1 or 2 sometimes left the response with missing songs. Increasing it to values higher than 3 never helped. If YTM still sends a response with fewer than 25 songs after more than 3 retries, it probably means that the missing songs are permanently unavailable for some reason.


@sigma67 commented on GitHub (Sep 29, 2022):

Hi, I'm curious. Are you still using this functionality? I feel like the API has gotten a lot more reliable and the code to achieve this is pretty messy. If it's not being used I'd rather remove it.


@xplorr commented on GitHub (Oct 4, 2022):

Still use this in my project


@sigma67 commented on GitHub (Oct 4, 2022):

Alright, good to know.
