[GH-ISSUE #726] FBRef - Returning stats for 1923/24 season not 2023/24 season #154

Closed
opened 2026-03-02 15:56:14 +03:00 by kerem · 1 comment
Owner

Originally created by @philbywalsh on GitHub (Oct 15, 2024).
Original GitHub issue: https://github.com/probberechts/soccerdata/issues/726

Describe the bug
I'm attempting to scrape FBref for Premier League stats for the 2023/24 season. I've tried various values for 'season', and all of them seem to return stats for the 1923/24 season, not the 2023/24 season.

Affected scrapers
This affects the following scrapers:

  • [ ] ClubElo
  • [ ] ESPN
  • [x] FBref
  • [ ] FiveThirtyEight
  • [ ] FotMob
  • [ ] Match History
  • [ ] SoFIFA
  • [ ] Understat
  • [ ] WhoScored

Code example

import random
import time

import pandas as pd
import soccerdata as sd

seasons = ['2023-24']

COLUMNS = ['season', 'week', 'home_team', 'away_team', 'score', 'game_id']

def fetch_match_IDs(seasons):
    """Fetches match IDs for a given list of seasons.

    Args:
        seasons (list): A list of seasons.

    Returns:
        pd.DataFrame: A DataFrame containing match IDs.
    """

    frames = []

    for season in seasons:
        fbref = sd.FBref(leagues="ENG-Premier League", seasons={season}, no_cache=True)
        print(f'Attempting to scrape season: {season}')
        schedule = fbref.read_schedule()
        schedule = schedule.reset_index(drop=True)

        # Create a new column 'season' and assign the current season value
        schedule['season'] = season

        frames.append(schedule[COLUMNS])

        # Pause for a few seconds between requests
        time.sleep(random.randint(2, 6))

    if not frames:
        return pd.DataFrame(columns=COLUMNS)

    return pd.concat(frames, ignore_index=True)

match_IDs = fetch_match_IDs(seasons)

Error message

No error message, but incorrect data is returned.

Additional context
Oddly enough, the '2023-2024' parameter seemed to work fine during extensive testing yesterday.
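Since both '2023-24' and '2023-2024' were tried, it may help while debugging to pin down which calendar years a season string is supposed to mean before comparing it with what comes back. The following is only a minimal sketch: `season_years` is a hypothetical helper, not part of soccerdata, and it makes no claim about how soccerdata parses seasons internally.

```python
import re


def season_years(season: str) -> tuple[int, int]:
    """Return (start_year, end_year) for strings like '2023-24' or '2023-2024'."""
    match = re.fullmatch(r"(\d{4})[-/](\d{2}|\d{4})", season.strip())
    if match is None:
        raise ValueError(f"Unrecognised season format: {season!r}")

    start = int(match.group(1))
    end_text = match.group(2)

    if len(end_text) == 4:
        end = int(end_text)
    else:
        # Expand the two-digit end year using the start year's century,
        # rolling over when the season spans a century boundary (e.g. 1999-00).
        end = (start // 100) * 100 + int(end_text)
        if end < start:
            end += 100

    if end - start != 1:
        raise ValueError(f"Season {season!r} does not span consecutive years")
    return start, end


print(season_years("2023-24"))    # (2023, 2024)
print(season_years("2023-2024"))  # (2023, 2024)
```

Both spellings resolve to the same pair of years here, so any difference in the scraped results would have to come from somewhere other than the intended meaning of the string.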

Contributor Action Plan

  • [ ] I can fix this issue and will submit a pull request.
  • [x] I’m unsure how to fix this, but I'm willing to work on it with guidance.
  • [ ] I’m not able to fix this issue.
kerem 2026-03-02 15:56:14 +03:00
Author
Owner

@philbywalsh commented on GitHub (Oct 16, 2024):

Bizarrely, I re-ran this code this morning (same Jupyter notebook, which remained open overnight) with seasons = ['2023-24'] and it now works as desired, i.e. it brings back 2023/24 data, not 1923/24 data.

This feels like an intermittent bug because, over the last 36 hours, the same code returned

2023/24
then 1923/24
then back to 2023/24
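Because the wrong season comes back without any error, a small guard on the returned match dates could make the intermittent case visible as soon as it happens. This is a rough sketch only: `dates_outside_season` is a hypothetical helper, the sample dates are made up for illustration, and it assumes the schedule exposes match dates (for example in a `date` column), which has not been verified here.

```python
from datetime import date
from typing import Iterable


def dates_outside_season(
    match_dates: Iterable[date], start_year: int, end_year: int
) -> list[date]:
    """Return the match dates whose year is neither start_year nor end_year."""
    return [d for d in match_dates if d.year not in (start_year, end_year)]


# Made-up dates standing in for the dates of a scraped schedule
expected_century = [date(2023, 8, 11), date(2024, 5, 19)]
wrong_century = [date(1923, 8, 25), date(1924, 5, 3)]

print(dates_outside_season(expected_century, 2023, 2024))  # []
print(len(dates_outside_season(wrong_century, 2023, 2024)))  # 2
```

Raising or logging whenever the returned list is non-empty would record exactly which runs produced 1923/24 data, which might help narrow down when the behaviour flips.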
