[GH-ISSUE #47] Missing album object in get_track_info() response #65

Closed
opened 2026-03-13 22:58:23 +03:00 by kerem · 0 comments
Owner

Originally created by @AliAkhtari78 on GitHub (May 26, 2025).
Original GitHub issue: https://github.com/AliAkhtari78/SpotifyScraper/issues/47

Originally assigned to: @Copilot on GitHub.

KeyError raised when code tries to access track['album']['name']

🪲 Bug summary

Calling SpotifyClient.get_track_info() on a standard track URL returns a dict
without an album key.
Consequently, the README example fails on

print(f"Album: {track['album']['name']}")

with KeyError: 'album'.

✔️ Preconditions

  • Python ≥ 3.9

  • Package versions:

    • spotify-scraper == 0.4.0 (latest release)
    • requests == 2.31.0
    • beautifulsoup4 == 4.12.3
  • Reproducible on macOS 14.5 and Ubuntu 24.04 (x86_64, ARM64)

🔩 Steps to reproduce

  1. Install from PyPI: pip install spotify-scraper

  2. Run the snippet below unchanged:

    from spotify_scraper import SpotifyClient
    client = SpotifyClient()
    track = client.get_track_info("https://open.spotify.com/track/6rqhFgbbKwnb9MLmUQDhG6")
    print(track["album"]["name"])       # <-- crashes here
    client.close()
    
  3. Observe KeyError: 'album'.

😕 Current behaviour

SpotifyClient.get_track_info() returns only top-level fields (name,
artists, duration_ms, …) but omits the nested album object.
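
Until the parser returns the album block, callers can guard the lookup instead of indexing directly. The following is a minimal sketch, not part of the library: `album_name` is a hypothetical helper, and `partial_track` only imitates the shape described above with illustrative values.

```python
from typing import Any, Optional


def album_name(track: dict[str, Any]) -> Optional[str]:
    """Return the album name, or None when the album block is absent or empty."""
    # `or {}` also covers the case where the key exists but holds None.
    album = track.get("album") or {}
    return album.get("name")


# Illustrative dict with the shape reported in this issue: no "album" key.
partial_track = {
    "name": "Example Track",
    "artists": [{"name": "Example Artist"}],
    "duration_ms": 200000,
}

print(album_name(partial_track))  # prints None instead of raising KeyError
```

This only hides the symptom on the caller's side; the missing data still has to be supplied by the parser.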

🤔 Expected behaviour

Return value should match Spotify Web API’s Track schema, i.e. include

"album": {
  "name": "Hybrid Theory",
  "album_type": "album",
  "href": "https://api.spotify.com/v1/albums/2CLugN1lDvD9hOjqytFYmd",
  ...
}

📜 Logs / traceback

Traceback (most recent call last):
  File "repro.py", line 6, in <module>
    print(track["album"]["name"])
KeyError: 'album'

🕵️ Root-cause hypothesis

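The HTML parser currently scrapes only the first JSON script tag from
open.spotify.com/track/*, which lacks album-level data. The album
information is available either

  • in a secondary application/ld+json blob embedded in the page (see https://gist.github.com/hubgit/9991398), or
  • via a call to the official Web API endpoint /v1/tracks/{id} (https://developer.spotify.com/documentation/web-api/reference/get-track).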
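
To make the hypothesis concrete, here is a rough sketch of scanning every ld+json script tag rather than only the first JSON script tag. It uses only the standard library. `LdJsonCollector`, `extract_album`, and `SAMPLE_HTML` are hypothetical names, and the `inAlbum` property is an assumption borrowed from schema.org's MusicRecording type; the real page markup has to be inspected before relying on it.

```python
import json
from html.parser import HTMLParser
from typing import Any, Optional


class LdJsonCollector(HTMLParser):
    """Collect the decoded body of every <script type="application/ld+json"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self._in_ld_json = False
        self._chunks: list[str] = []
        self.blobs: list[Any] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ld_json = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_ld_json:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_ld_json:
            self._in_ld_json = False
            try:
                self.blobs.append(json.loads("".join(self._chunks)))
            except json.JSONDecodeError:
                pass  # skip a malformed blob rather than abort the whole parse


def extract_album(html: str) -> Optional[dict]:
    """Return {"name": ...} from the first blob with an album object, else None."""
    collector = LdJsonCollector()
    collector.feed(html)
    for blob in collector.blobs:
        # ASSUMPTION: album data sits under "inAlbum" (schema.org convention).
        if isinstance(blob, dict) and isinstance(blob.get("inAlbum"), dict):
            return {"name": blob["inAlbum"].get("name")}
    return None


# Hand-written sample, NOT copied from a real Spotify page.
SAMPLE_HTML = """
<html><head>
<script type="application/json">{"name": "Example Track"}</script>
<script type="application/ld+json">
{"@type": "MusicRecording", "name": "Example Track",
 "inAlbum": {"@type": "MusicAlbum", "name": "Hybrid Theory"}}
</script>
</head></html>
"""

print(extract_album(SAMPLE_HTML))  # {'name': 'Hybrid Theory'}
```

The first script tag in the sample carries no album data, which mirrors the situation described above: a parser that stops there never sees the album.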

🗺️ Guide Task list

  • Write a failing regression test in tests/test_track_album.py.
  • Inspect the full HTML of the sample track page; locate album data.
  • Update parser.py::parse_track_json() to include the album block.
    Option A: extract from ld+json blob
    Option B: call Web API endpoint if SPOTIFY_AUTH_TOKEN exists.
  • Ensure unit tests pass: pytest -x.
  • Run ruff check . && mypy -p spotify_scraper.
  • Update README example and bump version to 0.4.1.
  • Add “album-field fix” entry to CHANGELOG.md.
  • Open Pull Request referencing this issue (e.g. Fixes #123).

Please keep the checklist items intact; GitHub Projects and automation rely on them.
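
For the first checklist item, the regression test could look roughly like the sketch below. `check_album_block` and `fake_get_track_info` are hypothetical; the stand-in only exists so the example runs on its own. In tests/test_track_album.py it would be replaced by the real client call or by the parser run against a saved HTML fixture.

```python
from typing import Any


def check_album_block(track: dict[str, Any]) -> None:
    """Assert the parts of the album block that the README example relies on."""
    assert "album" in track, "track dict is missing the 'album' key"
    assert isinstance(track["album"], dict), "'album' should be a nested object"
    assert track["album"].get("name"), "'album.name' should be a non-empty string"


def fake_get_track_info(url: str) -> dict[str, Any]:
    """Hypothetical stand-in for SpotifyClient.get_track_info() after the fix."""
    return {"name": "Example Track", "album": {"name": "Hybrid Theory"}}


def test_track_includes_album() -> None:
    track = fake_get_track_info(
        "https://open.spotify.com/track/6rqhFgbbKwnb9MLmUQDhG6"
    )
    check_album_block(track)
```

Run against the 0.4.0 behaviour described in this issue, the first assertion would fail with the "missing the 'album' key" message, which is the failing state the checklist asks for.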


---

### Sources consulted  
* Spotify Track object includes full `album` block
* API reference repeats the same contract
* Live HTML pages embed album links in JSON LD / Open Graph tags
* Nested-field drill-down example from playlist endpoint
* Community reports of partial track objects when scraping HTML only
* Currently-playing endpoint also shows `album` in response schema

Feel free to tweak labels or add environment specifics, but this file is otherwise ready to drop into `.github/ISSUE_TEMPLATE/` or into a single new Issue body.
kerem closed this issue and added the bug label 2026-03-13 22:58:23 +03:00