[GH-ISSUE #79] [FBref] NaNs found in 'standard' and 'playing_time' stat_types #16

Closed
opened 2026-03-02 15:55:03 +03:00 by kerem · 2 comments

Originally created by @spartanovo on GitHub (Aug 28, 2022).
Original GitHub issue: https://github.com/probberechts/soccerdata/issues/79

Hello,

I have found a small bug when pulling data from FBref.com. NaN values appear in the MP columns for the stat_types standard and playing_time, even for players who played during the season.

I found this problem after I wrote a function to obtain multiple stat_types for multiple seasons and converted the resulting DataFrames from MultiIndex columns to a standard flat pandas DataFrame. The flattened DataFrame contained a large number of NaNs.

To troubleshoot, I did a single pull with .read_player_season_stats(stat_type = 'standard') for two seasons (1718 and 1819) and found NaN values in both the MP and Playing Time MP columns. Both players who played and players who did not play received NaN values in these columns. I found 890 NaN values in the MP column under "Playing Time" and 380 NaN values in the standalone 'MP' column.
I am transitioning from R to Python and have always used flattened DataFrames in the past.
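
For readers who want to reproduce the flattening and NaN count described above without hitting FBref, here is a minimal sketch on a made-up toy frame. The helper name `flatten_columns`, the toy values, and the assumption that the standalone column is stored as `("MP", "")` are illustrative only and are not taken from soccerdata itself.

```python
import numpy as np
import pandas as pd

# Toy frame with two-level columns, loosely mimicking the reported layout.
# All values are invented for illustration.
columns = pd.MultiIndex.from_tuples([("MP", ""), ("Playing Time", "MP")])
index = pd.MultiIndex.from_tuples(
    [("1718", "Player A"), ("1718", "Player B"), ("1819", "Player A")],
    names=["season", "player"],
)
toy = pd.DataFrame(
    [[30.0, np.nan], [12.0, np.nan], [np.nan, 25.0]],
    index=index,
    columns=columns,
)


def flatten_columns(df, sep=" "):
    """Move the index into columns and join each column tuple into one string."""
    out = df.reset_index()
    out.columns = [
        sep.join(str(part) for part in col if str(part) != "")
        for col in out.columns
    ]
    return out


flat = flatten_columns(toy)

# Count missing values per flattened column.
nan_counts = flat.isna().sum()
print(nan_counts)
```

Counting NaNs per column on the flat frame makes it easy to see which of the two MP columns is empty for which season.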

Attached is a csv file containing the aforementioned data.

Call:

import soccerdata as sd

fbref_test = sd.FBref(leagues=['ENG-Premier League'], seasons=['1718', '1819'])

hold = fbref_test.read_player_season_stats(stat_type='standard')
hold.head()

I greatly appreciate your assistance.
fbref_nan_bug_df.csv: https://github.com/probberechts/soccerdata/files/9440517/fbref_nan_bug_df.csv
kerem closed this issue 2026-03-02 15:55:03 +03:00

@probberechts commented on GitHub (Sep 2, 2022):

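FBref uses different layouts for the 2017/18 and 2018/19 seasons. In the 2017/18 season (https://fbref.com/en/squads/7c21e445/2017-2018/West-Ham-United-Stats), the "MP" column is a separate category, whereas in the 2018/19 season (https://fbref.com/en/squads/7c21e445/2018-2019/West-Ham-United-Stats) it is grouped under "Playing Time". When both seasons are combined, each row therefore has a value in only one of the two columns.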

All you need is two lines of post-processing:

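# Fill gaps in the grouped MP column from the standalone MP column.
hold[("Playing Time", "MP")] = hold[("Playing Time", "MP")].fillna(hold["MP"])
# drop() returns a new DataFrame, so assign the result back.
hold = hold.drop(columns=["MP"])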

I'll add this to the codebase later.
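
To illustrate what the two-line fix does, here is a small self-contained sketch on a toy frame. The function name `coalesce_mp`, the toy values, and the assumption that the standalone column has the key `("MP", "")` are hypothetical; the real frame returned by soccerdata may differ.

```python
import numpy as np
import pandas as pd

# Toy frame mimicking the two layouts; values are invented for illustration.
columns = pd.MultiIndex.from_tuples(
    [("MP", ""), ("Playing Time", "MP"), ("Playing Time", "Min")]
)
toy = pd.DataFrame(
    [
        [30.0, np.nan, 2700.0],  # 2017/18-style row: MP is its own category
        [np.nan, 25.0, 2100.0],  # 2018/19-style row: MP sits under "Playing Time"
    ],
    index=pd.Index(["1718", "1819"], name="season"),
    columns=columns,
)


def coalesce_mp(df):
    """Merge the standalone MP column into ("Playing Time", "MP") and drop it."""
    out = df.copy()
    standalone = out[("MP", "")]  # assumed key of the standalone column
    out[("Playing Time", "MP")] = out[("Playing Time", "MP")].fillna(standalone)
    # Drop every column whose top-level label is "MP".
    return out.drop(columns=["MP"], level=0)


fixed = coalesce_mp(toy)
print(fixed)
```

Working on a copy keeps the original frame intact, which makes it easy to compare NaN counts before and after the fix.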


@spartanovo commented on GitHub (Sep 2, 2022):

Awesome. That fixed the problem. Thank you @probberechts!

Reference
starred/soccerdata#16