mirror of
https://github.com/probberechts/soccerdata.git
synced 2026-04-26 10:35:53 +03:00
[GH-ISSUE #79] [FBref] NaNs found in 'standard' and 'playing_time' stat_types #16
Originally created by @spartanovo on GitHub (Aug 28, 2022).
Original GitHub issue: https://github.com/probberechts/soccerdata/issues/79
Hello,
I have found a small bug when pulling data from FBref.com: NaN values appear in the MP columns for the stat_types standard and playing_time, even for players who played during the season.
I found this problem after I wrote a function to obtain multiple stat_types for multiple seasons and converted the DataFrames from a multiindex to a standard pandas DataFrame. I found a large quantity of NaNs due to this transformation.
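For reference, a minimal sketch of the kind of flattening described above, using a small made-up frame rather than real FBref output. The column names and values are illustrative assumptions; only the two-level column layout is taken from this report.

```python
import pandas as pd

# Hypothetical two-level columns resembling the 'standard' stat_type layout.
columns = pd.MultiIndex.from_tuples(
    [("MP", ""), ("Playing Time", "MP"), ("Playing Time", "Min")]
)
df = pd.DataFrame([[30, 30, 2500]], columns=columns)

# Join the non-empty level names with a space to get flat column labels.
flat = df.copy()
flat.columns = [" ".join(part for part in col if part) for col in flat.columns]

print(list(flat.columns))
```

Flattened this way, the standalone column and the grouped column end up as two separate fields, "MP" and "Playing Time MP".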
To troubleshoot, I did a single pull using the .read_player_season_stats(stat_type = 'standard') call on 2 seasons of data (1718 & 1819) and found NaN values in both the MP and Playing Time MP columns. Both players who played and players who did not play received NaN values in these columns. In the "Playing Time" section's MP column I found 890 NaN values, and in the standalone 'MP' column I found 380 NaN values.
I am transitioning from R to Python and have always used the flattened-style DataFrame in the past.
Attached is a csv file containing the aforementioned data.
Call:
I greatly appreciate your assistance.
fbref_nan_bug_df.csv
@probberechts commented on GitHub (Sep 2, 2022):
FBref uses a different layout for the 2017/18 and 2018/19 seasons. In the 2017/18 season, the "MP" column is a separate category, while in the 2018/19 season it is grouped under "Playing Time".
All you need is two lines of post-processing:
I'll add this to the codebase later.
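A minimal pandas sketch of this kind of post-processing, not necessarily the exact two lines meant above. It assumes the combined frame has two-level columns where the standalone column is keyed as ("MP", "") and the grouped one as ("Playing Time", "MP"); the player names and values are made up for illustration.

```python
import pandas as pd

# 2017/18-style layout (assumed): "MP" is its own top-level column.
s1718 = pd.DataFrame(
    {("MP", ""): [30], ("Playing Time", "Min"): [2500]},
    index=pd.Index(["Player A"], name="player"),
)
# 2018/19-style layout (assumed): "MP" sits under "Playing Time".
s1819 = pd.DataFrame(
    {("Playing Time", "MP"): [28], ("Playing Time", "Min"): [2300]},
    index=pd.Index(["Player B"], name="player"),
)

# Stacking both seasons takes the union of the columns, so each season
# gets NaN in the column that only exists in the other layout.
df = pd.concat({"1718": s1718, "1819": s1819}, names=["season"])
print(df.isna().sum())

# Post-processing: fill the grouped column from the standalone one,
# then drop the standalone column.
df[("Playing Time", "MP")] = df[("Playing Time", "MP")].fillna(df[("MP", "")])
df = df.drop(columns=[("MP", "")])
```

After these two steps every row has its matches-played value in a single ("Playing Time", "MP") column, regardless of which season layout it came from.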
@spartanovo commented on GitHub (Sep 2, 2022):
Awesome. That fixed the problem. Thank you @probberechts!