[PR #284] [MERGED] [FBref] Handle missing Scores & Fixtures page on the Big 5 European Leagues Stats page #430

Closed
opened 2026-03-02 15:57:54 +03:00 by kerem · 0 comments

📋 Pull Request Information

Original PR: https://github.com/probberechts/soccerdata/pull/284
Author: @lorenzodb1
Created: 7/7/2023
Status: Merged
Merged: 7/28/2023
Merged by: @probberechts

Base: master ← Head: lorenzodb1-read-schedule-fix


📝 Commits (10+)

  • a41adba Fixed issue in read_schedule by moving the Top 5 Leagues optimisation in read_leagues
  • 99c0dcb Fixed issue in read_schedule by moving the Top 5 Leagues optimisation in read_leagues
  • af93780 Merge branch 'master' into lorenzodb1-read-schedule-fix
  • 88ed9a9 Fixed bug affecting read_leagues when not optimised
  • dc392b1 Changed logic to extend fix to read_schedule
  • e9c22af Fixed bug affecting list of leagues when "Big 5 European Leagues Combined" is not present
  • 7cd9ae5 Changed parameter to bool as per suggestion
  • 70a61fb Merge branch 'master' into lorenzodb1-read-schedule-fix
  • e7f3dfe Empty commit
  • d659a27 Added logic to cover missed case when optimising the big 5 leagues

📊 Changes

2 files changed (+39 additions, -17 deletions)

View changed files

📝 soccerdata/fbref.py (+34 -17)
📝 tests/test_FBref.py (+5 -0)

📄 Description

  • Every n rows, the website inserts a row into the table that repeats the table header. This caused read_schedule to fail, because df_table then had more rows than the list of match URLs obtained (see
    https://github.com/probberechts/soccerdata/issues/277). I added logic to remove those repeated header rows when they are found.
  • The Big 5 European Leagues Stats page has no Scores & Fixtures page of its own, so read_schedule would end up on the generic Scores & Fixtures page, which shows games currently being played. Because of this, I moved the optimisation that combines the top five leagues under that label into read_leagues, as read_schedule needs the five leagues separately rather than in their combined form.
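
The two fixes described above can be illustrated with a minimal sketch. It is not the code from this PR: plain Python lists stand in for df_table, and the helper names, the split_up_big5 flag, the league identifiers, and the placeholder rows and URLs are all assumptions made for illustration. Only the "Big 5 European Leagues Combined" label is taken from the commit messages.

```python
# Label taken from the commit messages above.
BIG5_LABEL = "Big 5 European Leagues Combined"

# Assumed league identifiers, used here for illustration only.
BIG5_LEAGUES = [
    "ENG-Premier League",
    "ESP-La Liga",
    "FRA-Ligue 1",
    "GER-Bundesliga",
    "ITA-Serie A",
]


def drop_repeated_headers(header, rows):
    """Return only the data rows, skipping any row that repeats the header."""
    return [row for row in rows if row != header]


def select_leagues(requested, split_up_big5):
    """Hypothetical helper: collapse the five leagues into the combined label
    unless the caller needs them separately (as a schedule reader would)."""
    has_all_big5 = set(BIG5_LEAGUES) <= set(requested)
    if split_up_big5 or not has_all_big5:
        return list(requested)
    others = [league for league in requested if league not in BIG5_LEAGUES]
    return [BIG5_LABEL] + others


# Placeholder table: the header row reappears in the middle of the body.
header = ["Date", "Home", "Away"]
rows = [
    ["day 1", "Team A", "Team B"],
    ["Date", "Home", "Away"],  # repeated header, not a match
    ["day 2", "Team C", "Team D"],
]
match_urls = ["/matches/a", "/matches/b"]  # placeholder URLs, one per real match

# Without the filter there are 3 rows for 2 URLs; with it the lengths agree.
clean_rows = drop_repeated_headers(header, rows)
schedule = [row + [url] for row, url in zip(clean_rows, match_urls)]

# Stats-style readers can use the combined page; a schedule reader cannot.
combined = select_leagues(BIG5_LEAGUES, split_up_big5=False)
separate = select_leagues(BIG5_LEAGUES, split_up_big5=True)
```

With a pandas DataFrame the same header filter is usually written as a boolean mask on one column, keeping only the rows whose value differs from the column name.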

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
