[GH-ISSUE #698] Problem with Whoscored scraper #146

Closed
opened 2026-03-02 15:56:11 +03:00 by kerem · 1 comment
Owner

Originally created by @Oktay7v2 on GitHub (Sep 5, 2024).
Original GitHub issue: https://github.com/probberechts/soccerdata/issues/698

Describe the bug
When I run a simple call such as sd.WhoScored(leagues=..., seasons=...), it fails with a problem translating the league names. I tried making some changes in the config.py file, which seemed to work, but when I run any other call, such as reading the schedule or using the API, ChromeDriver opens WhoScored and then does nothing. I think the problem may be related to the language of the site: when ChromeDriver opens WhoScored, the page loads directly in Italian, and that may be blocking the scraper, but I'm not sure.

Affected scrapers
WhoScored

Code example

import soccerdata as sd

ws = sd.WhoScored(leagues="ENG-Premier League", seasons=2021, no_cache=True)

epl_schedule = ws.read_schedule()
epl_schedule.head()

Error message

KeyError                                  Traceback (most recent call last)
Cell In[20], line 6
      3 ws = sd.WhoScored(leagues="ENG-Premier League", seasons=2021, no_cache=True)
      4 print(ws.__doc__)
----> 6 epl_schedule = ws.read_schedule()
      7 epl_schedule.head()

File c:\Users\oktay\AppData\Local\Programs\Python\Python310\lib\site-packages\soccerdata\whoscored.py:344, in WhoScored.read_schedule(self, force_cache)
    331 def read_schedule(self, force_cache: bool = False) -> pd.DataFrame:
    332     """Retrieve the game schedule for the selected leagues and seasons.
    333 
    334     Parameters
   (...)
    342     pd.DataFrame
    343     """
--> 344     df_season_stages = self.read_season_stages(force_cache=force_cache)
    345     filemask_schedule = "matches/{}_{}_{}_{}.json"
    347     all_schedules = []

File c:\Users\oktay\AppData\Local\Programs\Python\Python310\lib\site-packages\soccerdata\whoscored.py:274, in WhoScored.read_season_stages(self, force_cache)
    261 def read_season_stages(self, force_cache: bool = False) -> pd.DataFrame:
    262     """Retrieve the season stages for the selected leagues.
    263 
    264     Parameters
...
-> 6249         raise KeyError(f"None of [{key}] are in the [{axis_name}]")
   6251     not_found = list(ensure_index(key)[missing_mask.nonzero()[0]].unique())
   6252     raise KeyError(f"{not_found} not in index")

KeyError: "None of [Index(['ENG-Premier League'], dtype='object', name='league')] are in the [index]"

Contributor Action Plan
I’m not able to fix this issue.

kerem 2026-03-02 15:56:11 +03:00
Author
Owner

@probberechts commented on GitHub (Sep 5, 2024):

This is a duplicate of #440. In #660, @Messe57 suggested that the following works: https://github.com/probberechts/soccerdata/issues/440#issuecomment-1857240870
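
For readers landing here from the traceback: the final KeyError is what pandas raises when every label passed to .loc is absent from the index. The sketch below reproduces that class of error in isolation. It does not use soccerdata at all; the `scraped` table and its `UNMAPPED-League` label are invented stand-ins, on the assumption that the league names read from the site did not map to the expected `ENG-Premier League` key.

```python
import pandas as pd

# Hypothetical stand-in for the league table built from the scraped page.
# The index label is invented to mimic a name that did not map to a known key.
scraped = pd.DataFrame(
    {"source": ["placeholder"]},
    index=pd.Index(["UNMAPPED-League"], name="league"),
)
requested = ["ENG-Premier League"]

# Checking membership first reports the mismatch in plain terms.
missing = [name for name in requested if name not in scraped.index]
print("requested but not found:", missing)

# Selecting a list of labels that are all absent raises KeyError.
try:
    scraped.loc[requested]
    error_text = ""
except KeyError as exc:
    error_text = str(exc)
print(error_text)
```

If the membership check reports the requested league as missing, the lookup itself is behaving as designed and the problem is upstream, in how the league names were obtained, which is consistent with the duplicate linked above.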
