[GH-ISSUE #169] [WhoScored] The current season's schedule is not cached #41

Closed
opened 2026-03-02 15:55:16 +03:00 by kerem · 3 comments

Originally created by @guilherme-95 on GitHub (Mar 2, 2023).
Original GitHub issue: https://github.com/probberechts/soccerdata/issues/169

I'm facing the following issue when scraping game data for the 22-23 season. When I ask ws.read_events to return data for a list of game IDs in the 21-22 season, it scrapes the schedule once and then moves on to getting the game data. If I do the same for the 22-23 season, it re-scrapes the schedule for every game_id I request.

It doesn't matter whether I point to the schedule file in /soccerdata/data/WhoScored/matches or build a new file using ws.read_schedule.

The code I'm running is below:

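```python
import pandas as pd
import soccerdata as sd

ws = sd.WhoScored(leagues="ENG-Premier League", seasons="22-23")

# Schedule saved earlier; assumes the CSV has a "game_id" column,
# as in the output of ws.read_schedule().
match_ids_df = pd.read_csv("premier_league_schedule.csv")

for match_id in match_ids_df["game_id"]:
    events = ws.read_events(match_id=match_id)
    events = events.fillna(0)

    filename = f"match_data_{match_id}.csv"
    events.to_csv(filename, index=False)
```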

I apologize if this is an error on my end; I'm not very experienced with Python.

kerem closed this issue 2026-03-02 15:55:16 +03:00

@probberechts commented on GitHub (Mar 2, 2023):

This is intended behavior. By default, the scraper assumes that the cached schedule for the current season is outdated, so it downloads the schedule again. If you are sure that the cache is up to date, you can force the scraper to use the cached schedule by setting the `force_cache` parameter to `True`.

```python
import soccerdata as sd

ws = sd.WhoScored(leagues="ENG-Premier League", seasons="22-23")
# Reuse the cached schedule instead of re-scraping it for the current season.
events = ws.read_events(match_id=..., force_cache=True)
```
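The caching rule described above can be illustrated with a small, self-contained toy model. This is a hypothetical sketch, not soccerdata's internal code: the `ScheduleCache` class and its methods are illustrative names, and it assumes that every events request first reads the season schedule, that a cached schedule for a past season is reused, and that the current season's schedule is re-downloaded unless `force_cache=True`. An in-memory dict stands in for the on-disk cache.

```python
from dataclasses import dataclass, field


@dataclass
class ScheduleCache:
    """Toy model of a per-season schedule cache (hypothetical, not soccerdata's code)."""

    current_season: str
    cached: dict = field(default_factory=dict)  # season -> list of match IDs
    downloads: int = 0  # how many times the schedule was "scraped"

    def _download(self, season: str) -> list:
        # Stand-in for a network request to the schedule page.
        self.downloads += 1
        return [f"{season}-match-{i}" for i in range(3)]

    def read_schedule(self, season: str, force_cache: bool = False) -> list:
        in_cache = season in self.cached
        # Past seasons are treated as final, so a cached copy is reused.
        # The current season is assumed stale unless force_cache is set.
        stale = season == self.current_season and not force_cache
        if not in_cache or stale:
            self.cached[season] = self._download(season)
        return self.cached[season]

    def read_events(self, season: str, match_id: str, force_cache: bool = False) -> str:
        # Each call consults the schedule first, matching the behavior reported above.
        schedule = self.read_schedule(season, force_cache=force_cache)
        if match_id not in schedule:
            raise KeyError(match_id)
        return f"events for {match_id}"


ids = ["22-23-match-0", "22-23-match-1", "22-23-match-2"]

cache = ScheduleCache(current_season="22-23")
for mid in ids:
    cache.read_events("22-23", mid)
print(cache.downloads)  # 3: one schedule download per call

cache = ScheduleCache(current_season="22-23")
for mid in ids:
    cache.read_events("22-23", mid, force_cache=True)
print(cache.downloads)  # 1: only the first call, when nothing is cached yet
```

Under these assumptions, looping over matches of the current season triggers one schedule download per call, while `force_cache=True` downloads it once. The trade-off is that matches added to the schedule after the cache was written are not picked up until the cache is refreshed.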

@guilherme-95 commented on GitHub (Mar 2, 2023):

Thank you very much for the information.

@guilherme-95 commented on GitHub (Mar 2, 2023):

Sorry, forgot to close the issue.
