[PR #907] [MERGED] [Understat] Use new JSON API endpoints #902

Closed
opened 2026-03-02 16:00:06 +03:00 by kerem · 0 comments

📋 Pull Request Information

Original PR: https://github.com/probberechts/soccerdata/pull/907
Author: @dimitrismoustakas
Created: 12/13/2025
Status: Merged
Merged: 1/6/2026
Merged by: @probberechts

Base: master ← Head: understat-fix


📝 Commits (1)

  • 01f87fc feat: fixed Understat scraper to work with the new JSON API endpoints

📊 Changes

1 file changed (+82 additions, -16 deletions)


📝 soccerdata/understat.py (+82 -16)

📄 Description

As discussed in issues #904 and #905, the understat module has stopped working. Understat no longer supports HTML scraping and instead exposes API endpoints that return the same data as before, so I've changed the package to use those endpoints. The Understat tests now pass (they fail on the current main branch). Tests for the other modules still fail, but I checked and they were already failing before this change, so this pull request does not cause those failures; either those modules have an underlying issue or their tests are not being maintained.

API Integration and Data Fetching Improvements:

  • Refactored league, season, and match data retrieval to use Understat's internal API endpoints (/getStatData, /getLeagueData/{league}/{season}, /getMatchData/{match_id}) instead of HTML scraping.
  • Added the _request_api helper method to centralize API requests with appropriate headers, caching, and file storage, improving consistency and reducing code duplication.
  • Introduced the UNDERSTAT_HEADERS constant to ensure all API requests include the required X-Requested-With header.
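To make the bullets above concrete, here is a minimal sketch of what a centralized request helper with a shared header constant and file caching could look like. It is not the code from this PR. The header value `XMLHttpRequest`, the choice to join endpoint paths directly onto `https://understat.com`, and the names `request_api`, `league_data_path`, and `match_data_path` are all assumptions made for illustration. The `session` argument is assumed to behave like a `requests.Session` (a `get(url, headers=...)` method returning an object with `raise_for_status()` and `json()`).

```python
import json
from pathlib import Path
from typing import Any, Optional

# Assumption: "XMLHttpRequest" is the conventional value for this header;
# the PR description only names the header, not its value.
UNDERSTAT_HEADERS = {"X-Requested-With": "XMLHttpRequest"}

# Assumption: endpoint paths are appended directly to the site root.
BASE_URL = "https://understat.com"


def league_data_path(league: str, season: str) -> str:
    """Build the league endpoint path named in the PR description."""
    return f"/getLeagueData/{league}/{season}"


def match_data_path(match_id: int) -> str:
    """Build the match endpoint path named in the PR description."""
    return f"/getMatchData/{match_id}"


def request_api(session: Any, path: str, cache_file: Optional[Path] = None) -> Any:
    """Fetch JSON from an endpoint, reusing a cached copy when one exists.

    Hypothetical helper: `session` is any object with a requests-like
    `get(url, headers=...)` method.
    """
    # Serve from the file cache if this response was stored earlier.
    if cache_file is not None and cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))

    response = session.get(BASE_URL + path, headers=UNDERSTAT_HEADERS)
    response.raise_for_status()
    payload = response.json()

    # Store the decoded payload so later calls can skip the network.
    if cache_file is not None:
        cache_file.parent.mkdir(parents=True, exist_ok=True)
        cache_file.write_text(json.dumps(payload), encoding="utf-8")
    return payload
```

Routing every call through one helper means the header and the caching policy live in a single place, which is the consistency benefit the description mentions.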

Session and Cookie Management:

  • Added _ensure_cookies to initialize session cookies from the homepage before making API requests, ensuring authenticated and consistent access. [1] [2]
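The cookie step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: the class name `CookieBootstrap`, the method names, and the homepage URL default are hypothetical, and the `session` is assumed to be a `requests.Session`-like object that stores cookies from responses on its own.

```python
from typing import Any, Optional


class CookieBootstrap:
    """Visit the homepage once so the session holds cookies before API calls.

    Hypothetical wrapper: `session` needs a requests-like `get` method.
    """

    def __init__(self, session: Any, homepage: str = "https://understat.com") -> None:
        self._session = session
        self._homepage = homepage
        self._cookies_ready = False

    def ensure_cookies(self) -> None:
        """Fetch the homepage on first use; later calls do nothing."""
        if self._cookies_ready:
            return
        response = self._session.get(self._homepage)
        response.raise_for_status()
        # A requests-like session keeps any cookies set by this response.
        self._cookies_ready = True

    def get_json(self, url: str, headers: Optional[dict] = None) -> Any:
        """Make sure cookies are initialized, then fetch and decode JSON."""
        self.ensure_cookies()
        response = self._session.get(url, headers=headers)
        response.raise_for_status()
        return response.json()
```

Guarding the homepage visit with a flag keeps the extra request to one per session rather than one per API call.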

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

Reference
starred/soccerdata#902