[GH-ISSUE #169] [WhoScored] The current season's schedule is not cached #41

Closed

opened 2026-03-02 15:55:16 +03:00 by kerem · 3 comments

kerem commented

2026-03-02 15:55:16 +03:00

Owner

Originally created by @guilherme-95 on GitHub (Mar 2, 2023).
Original GitHub issue: https://github.com/probberechts/soccerdata/issues/169

I'm facing the following issue when trying to scrape game data for the 22-23 season: when I ask ws.read_events to return data for a list of game IDs from the 21-22 season, it scrapes the schedule once and then moves on to getting the game data. If I do the same for the 22-23 season, it re-scrapes the schedule for every game_id I request.

It doesn't matter whether I point to the schedule file in /soccerdata/data/WhoScored/matches or build a new file using ws.read_schedule.

The code I'm running is below:

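import pandas as pd
import soccerdata as sd

ws = sd.WhoScored(leagues="ENG-Premier League", seasons="22-23")

# Schedule previously saved to CSV; the "game_id" column name is an assumption.
match_ids_df = pd.read_csv("premier_league_schedule.csv")

# Scrape and save the events of each match in turn.
for match_id in match_ids_df["game_id"]:
    events = ws.read_events(match_id=int(match_id))
    events = events.fillna(0)

    filename = f"match_data_{match_id}.csv"
    events.to_csv(filename, index=False)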

I apologize if this is an error on my end; I'm not very experienced with Python.

kerem closed this issue

2026-03-02 15:55:16 +03:00

kerem commented

2026-03-02 15:55:17 +03:00

Author

Owner

@probberechts commented on GitHub (Mar 2, 2023):

This is supposed to be a feature. By default, the scraper assumes that the cached schedule is outdated for the current season, so it downloads it again. If you are sure that the cache is up to date, you can force the scraper to use the cached schedule by setting the force_cache parameter to True.

import soccerdata as sd

ws = sd.WhoScored(leagues="ENG-Premier League", seasons="22-23")
events = ws.read_events(match_id=..., force_cache=True)
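
The caching rule described above can be sketched as a small decision function. This is only an illustration of the stated behavior, not soccerdata's actual implementation: the function name `should_use_cache` and its `cache_exists` and `is_current_season` arguments are hypothetical, and only `force_cache` mirrors a real parameter.

```python
def should_use_cache(
    cache_exists: bool,
    is_current_season: bool,
    force_cache: bool = False,
) -> bool:
    """Decide whether a cached schedule file may be reused.

    Hypothetical helper illustrating the rule from this thread; it is
    not part of the soccerdata API.
    """
    if not cache_exists:
        # Nothing on disk, so the schedule has to be downloaded.
        return False
    if force_cache:
        # The caller vouches that the cached file is up to date.
        return True
    # Past seasons no longer change; the current season is assumed stale.
    return not is_current_season


# A finished season (e.g. 21-22) is read from the cache after one download.
print(should_use_cache(cache_exists=True, is_current_season=False))
# The running season is re-scraped unless force_cache is set.
print(should_use_cache(cache_exists=True, is_current_season=True))
print(should_use_cache(cache_exists=True, is_current_season=True, force_cache=True))
```

Under this reading, the repeated schedule downloads for 22-23 are the second case, and passing force_cache=True moves the call into the third.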

kerem commented

2026-03-02 15:55:17 +03:00

Author

Owner

@guilherme-95 commented on GitHub (Mar 2, 2023):

Thank you very much for the information.
