[GH-ISSUE #526] [FBRef] Issue with multilevel tables #95

Closed
opened 2026-03-02 15:55:44 +03:00 by kerem · 1 comment

Originally created by @txz808 on GitHub (Mar 31, 2024).
Original GitHub issue: https://github.com/probberechts/soccerdata/issues/526

I tried using droplevel on the shooting table so that I could select the date column in the following merge:
shooting.columns = shooting.columns.droplevel()
team_data = matches.merge(shooting[["date", "Sh", "SoT", "Dist", "FK", "PK", "PKatt"]], on="date")

It then raises an error saying that ['date'] doesn't exist, because once droplevel is applied to the shooting table, the date column is left with an empty name. This is what my matches and shooting columns look like.

matches.columns:

Index(['date', 'time', 'round', 'day', 'venue', 'result', 'GF', 'GA', 'opponent', 'xG', 'xGA', 'Poss', 'Attendance', 'Captain', 'Formation', 'Referee', 'match_report', 'Notes'], dtype='object')

shooting.columns:

MultiIndex([( 'date', ''), ( 'round', ''), ( 'day', ''), ( 'venue', ''), ( 'result', ''), ( 'GF', ''), ( 'GA', ''), ( 'opponent', ''), ( 'Standard', 'Gls'), ( 'Standard', 'Sh'), ( 'Standard', 'SoT'), ( 'Standard', 'SoT%'), ( 'Standard', 'G/Sh'), ( 'Standard', 'G/SoT'), ( 'Standard', 'Dist'), ( 'Standard', 'FK'), ( 'Standard', 'PK'), ( 'Standard', 'PKatt'), ( 'Expected', 'xG'), ( 'Expected', 'npxG'), ( 'Expected', 'npxG/Sh'), ( 'Expected', 'G-xG'), ( 'Expected', 'np:G-xG'), ( 'time', ''), ('match_report', '')], )

The code that generates the columns is:
`import soccerdata as sd
from bs4 import BeautifulSoup
import pandas as pd

fbref = sd.FBref(leagues="ENG-Premier League", seasons="2324")
team_season_stats = fbref.read_team_season_stats()
team_season_stats.head()

matches = fbref.read_team_match_stats(stat_type="schedule")
matches.head()

shooting = fbref.read_team_match_stats(stat_type="shooting")
shooting.head()`

I tried reverting the indexes of

([( 'date', ''), ( 'round', ''), ( 'day', ''), ( 'venue', ''), ( 'result', ''), ( 'GF', ''), ( 'GA', ''), ( 'opponent', '')] )
so they could be on the same level as the other half but that didn't work.
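
The behaviour described above can be reproduced without scraping anything. The following is a minimal sketch on a hand-built toy frame that copies only a few of the column tuples listed in this issue; the row values are made up for illustration. It suggests that `droplevel()` with no argument drops level 0, which is the level that holds 'date'.

```python
import pandas as pd

# Toy stand-in for the shooting table: a subset of the column tuples
# shown in this issue, with invented row values.
cols = pd.MultiIndex.from_tuples(
    [("date", ""), ("opponent", ""), ("Standard", "Sh"), ("Standard", "SoT")]
)
shooting = pd.DataFrame([["2023-08-12", "Burnley", 17, 8]], columns=cols)

# droplevel() defaults to level=0, so the top-level labels
# ('date', 'opponent', 'Standard') are the ones that disappear.
dropped = shooting.columns.droplevel()
print(list(dropped))  # ['', '', 'Sh', 'SoT']

# Dropping level 1 instead keeps 'date' but collapses every stat to 'Standard'.
dropped_lower = shooting.columns.droplevel(1)
print(list(dropped_lower))  # ['date', 'opponent', 'Standard', 'Standard']
```

Neither level on its own carries all the names needed for the merge, which is why selecting "date" fails after the default droplevel.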

kerem closed this issue 2026-03-02 15:55:44 +03:00

@Kalaweksh commented on GitHub (Apr 13, 2024):

I usually swap the 'game' index level with the 'date' column on loading the tables, since dates are easier to work with.
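
A rough sketch of that swap on a toy frame follows. The index level names other than 'game', the format of the game label, and the row values are assumptions made for illustration, and the sketch only covers a table with flat columns such as the schedule table.

```python
import pandas as pd

# Hypothetical index layout; only the 'game' level is named in this thread.
idx = pd.MultiIndex.from_tuples(
    [("ENG-Premier League", "2324", "Arsenal", "2023-08-12 Arsenal-Nottingham Forest")],
    names=["league", "season", "team", "game"],
)
matches = pd.DataFrame({"date": ["2023-08-12"], "GF": [2]}, index=idx)

# Move 'game' out of the index into a column, then append 'date' to the index.
swapped = matches.reset_index("game").set_index("date", append=True)

print(list(swapped.index.names))  # ['league', 'season', 'team', 'date']
print(list(swapped.columns))      # ['game', 'GF']
```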

I think it may be worth opening an issue/pull request to make this default behavior.

I think what's likely happening in your scenario is that you are dropping the named columns on the top level and are left with only the lower level column names (which don't include 'date'). I would either add an empty level to the 'schedule' table's columns (to avoid losing data in 'shooting') or set the 'on' parameters in pd.merge to include the columns from both tables you want to keep.
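
One way to avoid losing either level is to flatten the shooting columns before merging, keeping the lower-level name where there is one and falling back to the top-level name where the lower level is empty. This is a sketch on toy frames with invented values, not something the library does for you, and it is a different route from the two options suggested above.

```python
import pandas as pd

# Toy stand-ins for the two tables; values are invented.
matches = pd.DataFrame({"date": ["2023-08-12"], "result": ["W"], "GF": [3]})
cols = pd.MultiIndex.from_tuples(
    [("date", ""), ("Standard", "Sh"), ("Standard", "SoT"), ("Expected", "xG")]
)
shooting = pd.DataFrame([["2023-08-12", 17, 8, 2.1]], columns=cols)

# Flatten: use the lower-level label when present, else the top-level one.
flat = shooting.copy()
flat.columns = [lower if lower else upper for upper, lower in shooting.columns]
print(list(flat.columns))  # ['date', 'Sh', 'SoT', 'xG']

# With single-level names on both sides, the original merge works.
team_data = matches.merge(flat[["date", "Sh", "SoT"]], on="date")
print(list(team_data.columns))  # ['date', 'result', 'GF', 'Sh', 'SoT']
```

With real data covering several teams, 'date' alone probably does not identify a row uniquely, so additional merge keys may be needed.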
