[GH-ISSUE #750] [Match History] football-data CSV files about 24/25 season start with BOM char #162

Closed
opened 2026-03-02 15:56:17 +03:00 by kerem · 2 comments
Owner

Originally created by @sandromodarelli on GitHub (Nov 6, 2024).
Original GitHub issue: https://github.com/probberechts/soccerdata/issues/750

Describe the bug
All football-data CSV files for the 24/25 season start with a BOM character. This leads to a "fake" column named Div in the dataset and makes the league translation fail with an exception (KeyError: 'league').
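As a rough illustration of the mechanism (not soccerdata's actual reading code), the sketch below uses only the standard library to show how a leading UTF-8 BOM ends up inside the first header name. The sample bytes and the `header` helper are made up for this example, and the encoding soccerdata really uses is not stated in this issue.

```python
import csv
import io

# Hypothetical sample data: the first three bytes are the UTF-8 BOM (EF BB BF).
raw = b"\xef\xbb\xbfDiv,Date,HomeTeam,AwayTeam\nI1,17/08/2024,Genoa,Inter\n"


def header(data: bytes, encoding: str) -> list[str]:
    """Decode the bytes with the given encoding and return the CSV header row."""
    reader = csv.reader(io.StringIO(data.decode(encoding)))
    return next(reader)


plain = header(raw, "utf-8")      # the BOM survives as U+FEFF in the first name
clean = header(raw, "utf-8-sig")  # the BOM is stripped while decoding

print(repr(plain[0]))
print(repr(clean[0]))
```

With a BOM-unaware decoder the first column is no longer exactly `Div`, so any later step that looks the column up by name will miss it.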

Affected scrapers
This affects the following scrapers:

  • [ ] ClubElo
  • [ ] ESPN
  • [ ] FBref
  • [ ] FiveThirtyEight
  • [ ] FotMob
  • [X] Match History
  • [ ] SoFIFA
  • [ ] Understat
  • [ ] WhoScored

Code example
A minimal code example that fails. Use no_cache=True to make sure an invalid cached file does not cause the bug and make sure you have the latest version of soccerdata installed.

import soccerdata as sd
mh = sd.MatchHistory('ITA-Serie A', "24/25", no_cache=True)
data = mh.read_games()

Error message

Traceback (most recent call last):
  File ".venv/lib/python3.12/site-packages/pandas/core/indexes/base.py", line 3805, in get_loc
    return self._engine.get_loc(casted_key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "index.pyx", line 167, in pandas._libs.index.IndexEngine.get_loc
  File "index.pyx", line 196, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 7081, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 7089, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'league'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".venv/lib/python3.12/site-packages/soccerdata/match_history.py", line 116, in read_games
    .pipe(self._translate_league)
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/pandas/core/generic.py", line 6231, in pipe
    return common.pipe(self, func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/pandas/core/common.py", line 502, in pipe
    return func(obj, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/soccerdata/_common.py", line 399, in _translate_league
    mask = ~df[col].isin(flip)
            ~~^^^^^
  File ".venv/lib/python3.12/site-packages/pandas/core/frame.py", line 4102, in __getitem__
    indexer = self.columns.get_loc(key)
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/pandas/core/indexes/base.py", line 3812, in get_loc
    raise KeyError(key) from err
KeyError: 'league'

Additional context
Add any other context about the problem here.

Contributor Action Plan

  • [X] I can fix this issue and will submit a pull request.
  • [ ] I’m unsure how to fix this, but I'm willing to work on it with guidance.
  • [ ] I’m not able to fix this issue.
kerem 2026-03-02 15:56:17 +03:00
  • closed this issue
  • added the bug label

@Riccardo231 commented on GitHub (Dec 28, 2024):

Hello, I'd like to join the project by working on this issue.


@probberechts commented on GitHub (Dec 31, 2024):

Hi @ricor07. We'd love to have your help. My initial idea to solve this would be to add a function _parse_csv(...) that selects the appropriate parser/encoding depending on the league and season.
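One possible shape for such a helper is sketched below, purely as an illustration. Note that it swaps in a different technique from the one proposed above: instead of choosing the encoding by league and season, it sniffs the first bytes for a UTF-8 BOM. The signature, the return type, and the `latin-1` fallback are all assumptions made for this example and are not taken from soccerdata.

```python
import codecs
import csv
import io


def _parse_csv(data: bytes, fallback_encoding: str = "latin-1") -> list[dict[str, str]]:
    """Hypothetical helper: choose the decoder from the file's leading bytes."""
    if data.startswith(codecs.BOM_UTF8):
        # "utf-8-sig" decodes UTF-8 and drops the leading BOM.
        text = data.decode("utf-8-sig")
    else:
        # Assumed fallback for files without a BOM.
        text = data.decode(fallback_encoding)
    return list(csv.DictReader(io.StringIO(text)))


# Made-up sample inputs, one with a BOM and one without.
with_bom = _parse_csv(b"\xef\xbb\xbfDiv,HomeTeam\nI1,Genoa\n")
without_bom = _parse_csv(b"Div,HomeTeam\nE0,Caf\xe9\n")
```

Sniffing avoids maintaining a per-season table, but a league/season mapping as suggested above may still be needed if files differ in ways a BOM check cannot detect.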
