[GH-ISSUE #524] [FBRef] Using the droplevel function leaves some columns empty #91

Closed
opened 2026-03-02 15:55:43 +03:00 by kerem · 4 comments

Originally created by @txz808 on GitHub (Mar 29, 2024).
Original GitHub issue: https://github.com/probberechts/soccerdata/issues/524

I tried to use the `droplevel` function so that I could use the `date` column in the following merge:

```python
team_data = matches.merge(shooting[["date", "Sh", "SoT", "Dist", "FK", "PK", "PKatt"]], on="date")
```

It then raises an error saying that `['date']` doesn't exist, because after `droplevel` is applied to the shooting table, the `date` column is left with an empty name.
Could I get some help with this? It is integral to the project I'm working on.
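
A minimal pandas-only reproduction of the symptom, using a hypothetical three-column frame in place of the real FBref table: `droplevel()` called with no argument removes level 0, so a column stored as `('date', '')` keeps only its empty second-level label.

```python
import pandas as pd

# Hypothetical stand-in for the shooting table's two-level column index.
columns = pd.MultiIndex.from_tuples(
    [("date", ""), ("Standard", "Sh"), ("Standard", "SoT")]
)
shooting = pd.DataFrame([["2023-08-11", 12, 4]], columns=columns)

# droplevel() defaults to level=0, i.e. it removes the *top* level.
flat = shooting.columns.droplevel()
print(list(flat))  # ['', 'Sh', 'SoT']

# The 'date' label lived in the dropped level, so it is gone.
print("date" in flat)  # False
```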

kerem closed this issue and added the `question` label on 2026-03-02 15:55:43 +03:00.

@probberechts commented on GitHub (Mar 30, 2024):

Could you please clarify the issue you're facing? I'm not sure I understand your question. Could you provide more details or context about how you're using the `droplevel` function and how it relates to soccerdata? A complete code example would be helpful.


@txz808 commented on GitHub (Mar 30, 2024):

I tried to use the `droplevel` function so that I could use the `date` column in the following merge:

```python
# droplevel() with no argument removes the top level, so ('date', '') becomes ''
shooting.columns = shooting.columns.droplevel()
team_data = matches.merge(shooting[["date", "Sh", "SoT", "Dist", "FK", "PK", "PKatt"]], on="date")
```

It then raises an error saying that `['date']` doesn't exist, because after `droplevel` is applied to the shooting table, the `date` column is left with an empty name. This is what my `matches` and `shooting` columns look like:

`matches.columns`:

```
Index(['date', 'time', 'round', 'day', 'venue', 'result', 'GF', 'GA', 'opponent', 'xG', 'xGA', 'Poss', 'Attendance', 'Captain', 'Formation', 'Referee', 'match_report', 'Notes'], dtype='object')
```

`shooting.columns`:

```
MultiIndex([('date', ''), ('round', ''), ('day', ''), ('venue', ''),
            ('result', ''), ('GF', ''), ('GA', ''), ('opponent', ''),
            ('Standard', 'Gls'), ('Standard', 'Sh'), ('Standard', 'SoT'),
            ('Standard', 'SoT%'), ('Standard', 'G/Sh'), ('Standard', 'G/SoT'),
            ('Standard', 'Dist'), ('Standard', 'FK'), ('Standard', 'PK'),
            ('Standard', 'PKatt'), ('Expected', 'xG'), ('Expected', 'npxG'),
            ('Expected', 'npxG/Sh'), ('Expected', 'G-xG'), ('Expected', 'np:G-xG'),
            ('time', ''), ('match_report', '')],
           )
```

The code that generates the columns is:

```python
import soccerdata as sd
from bs4 import BeautifulSoup
import pandas as pd

# Premier League 2023-24 data scraped from FBref
fbref = sd.FBref(leagues="ENG-Premier League", seasons="2324")
team_season_stats = fbref.read_team_season_stats()
team_season_stats.head()

# Match log (fixtures and results) for each team
matches = fbref.read_team_match_stats(stat_type="schedule")
matches.head()

# Per-match shooting stats; the columns form a two-level MultiIndex
shooting = fbref.read_team_match_stats(stat_type="shooting")
shooting.head()
```

I tried reversing the index levels of these columns:

```
[('date', ''), ('round', ''), ('day', ''), ('venue', ''), ('result', ''), ('GF', ''), ('GA', ''), ('opponent', '')]
```

so that they would be on the same level as the other columns, but that didn't work.
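
One way to get the single-level columns and the sub-columns onto the same level is to pick, for each column, the second-level label when it is non-empty and the first-level label otherwise. This is a sketch with a hypothetical helper `flatten_columns` and made-up data, not something soccerdata provides:

```python
import pandas as pd


def flatten_columns(columns: pd.MultiIndex) -> list[str]:
    """Hypothetical helper: keep the second-level label when it exists,
    otherwise fall back to the first-level label (('date', '') -> 'date')."""
    return [low if low else top for top, low in columns]


# Made-up single row mimicking a few of the shooting columns.
columns = pd.MultiIndex.from_tuples(
    [("date", ""), ("opponent", ""), ("Standard", "Sh"), ("Standard", "SoT"),
     ("Expected", "xG")]
)
shooting = pd.DataFrame([["2023-08-11", "Burnley", 12, 4, 1.9]], columns=columns)

shooting.columns = flatten_columns(shooting.columns)
print(list(shooting.columns))  # ['date', 'opponent', 'Sh', 'SoT', 'xG']
```

With these names, the selection `shooting[["date", "Sh", "SoT", ...]]` from the original merge would find its columns. The approach assumes the sub-labels are unique across groups, so it is worth checking `shooting.columns.is_unique` afterwards; also, `xG` would then exist in both tables, and pandas would suffix it (`xG_x`, `xG_y`) if it were included in a merge.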


@probberechts commented on GitHub (Apr 1, 2024):

I think you want to flatten the column index of the `shooting` dataframe before you do the merge. You can join both levels with:

```python
shooting.columns = [' '.join(col).strip() for col in shooting.columns.values]
```
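
To see what this one-liner produces, here it is applied to a hypothetical three-column frame (pandas only, no scraping):

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [("date", ""), ("Standard", "Sh"), ("Standard", "SoT")]
)
shooting = pd.DataFrame([["2023-08-11", 12, 4]], columns=columns)

# Join both levels with a space; strip() removes the trailing space left
# by an empty second level, so ('date', '') becomes 'date'.
shooting.columns = [" ".join(col).strip() for col in shooting.columns.values]
print(list(shooting.columns))  # ['date', 'Standard Sh', 'Standard SoT']
```

Note that the sub-columns keep their group prefix, so the selection passed to the merge would then need names like `"Standard Sh"` rather than `"Sh"`.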

As a side note, I do not know what your specific use case is, but merging on the `date` column seems odd. Why not simply join on the indexes (i.e., the match ID)?
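
A toy illustration of why merging on `date` alone can go wrong. It assumes, for brevity, an index of just `team` and `game` in place of the real one, and the data are made up: when two teams play on the same date, every row from that date pairs with every other, while an index join keeps one row per match.

```python
import pandas as pd

# Hypothetical toy data: two teams that both played on the same date.
index = pd.MultiIndex.from_tuples(
    [("Arsenal", "g1"), ("Chelsea", "g2")], names=["team", "game"]
)
matches = pd.DataFrame({"date": ["2023-08-12", "2023-08-12"], "GF": [2, 1]}, index=index)
shooting = pd.DataFrame({"date": ["2023-08-12", "2023-08-12"], "Sh": [15, 9]}, index=index)

# Merging on 'date' alone pairs every match with every shooting row from that day.
by_date = matches.merge(shooting, on="date")
print(len(by_date))  # 4 (a 2 x 2 cross product)

# Joining on the shared index lines each match up with its own shooting row.
by_index = matches.join(shooting[["Sh"]])
print(len(by_index))  # 2
```

The `date` column from `matches` survives the index join, so it is still available afterwards.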

Also, it seems like your question is about how to use pandas rather than about soccerdata, so it might be better suited for Stack Overflow. GitHub issues are primarily intended for tracking bugs and discussing feature requests, not for answering technical questions.


@txz808 commented on GitHub (Apr 1, 2024):

I merged on the `date` column because the technique I plan to use relies on elements from the date table later on.
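
If `date` really has to be part of the join key, one sketch is to move the index back into ordinary columns and merge on team and date together, so that each key identifies a single match. The toy data and the `team` level name are assumptions for illustration; `validate="one_to_one"` makes pandas raise an error instead of silently duplicating rows.

```python
import pandas as pd

# Hypothetical toy data standing in for the real matches/shooting tables.
index = pd.MultiIndex.from_tuples(
    [("Arsenal", "g1"), ("Chelsea", "g2")], names=["team", "game"]
)
matches = pd.DataFrame({"date": ["2023-08-12", "2023-08-12"], "GF": [2, 1]}, index=index)
shooting = pd.DataFrame({"date": ["2023-08-12", "2023-08-12"], "Sh": [15, 9]}, index=index)

# Move the index levels into columns, then use team + date as the key.
team_data = matches.reset_index().merge(
    shooting.reset_index()[["team", "date", "Sh"]],
    on=["team", "date"],
    validate="one_to_one",  # raises MergeError if a key matches more than one row
)
print(team_data[["team", "date", "Sh"]])
```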
