[GH-ISSUE #328] [FBref] Scraper returns old season data #62

Closed
opened 2026-03-02 15:55:26 +03:00 by kerem · 2 comments

Originally created by @mhd0528 on GitHub (Aug 16, 2023).
Original GitHub issue: https://github.com/probberechts/soccerdata/issues/328

Hi,
I was trying to use the following commands to get data for this season (2023-24), but it actually gives me data from 1923-1924.

import soccerdata as sd
fbref = sd.FBref(leagues='ENG-Premier League', seasons='2023-24')
fbref.read_schedule()
                                                                   week  day  ... notes   game_id
league             season game                                                ...
ENG-Premier League 2324   1923-08-25 Arsenal-Newcastle Utd            1  Sat  ...  <NA>  9b8e5a81
                          1923-08-25 Birmingham-Aston Villa           1  Sat  ...  <NA>  1a908e21
                          1923-08-25 Blackburn-Chelsea                1  Sat  ...  <NA>  feb28dde
                          1923-08-25 Cardiff City-Bolton              1  Sat  ...  <NA>  6cfbbf26
                          1923-08-25 Everton-Nott'ham Forest          1  Sat  ...  <NA>  8931066c
...                                                                 ...  ...  ...   ...       ...
                          1924-05-03 Huddersfield-Nott'ham Forest    42  Sat  ...  <NA>  0964ff7d
                          1924-05-03 Manchester City-West Ham        42  Sat  ...  <NA>  a0a50c50
                          1924-05-03 Notts County-Liverpool          42  Sat  ...  <NA>  dafa1982
                          1924-05-03 Tottenham-Burnley               42  Sat  ...  <NA>  47a88b95
                          1924-05-03 West Brom-Sheffield Utd         42  Sat  ...  <NA>  7fc5fb2b

I also tried removing or disabling the cache before calling read_schedule(), but it still can't find the new data.
I think there might be something wrong with the season parsing/reading part.
Maybe just modifying the following line/file would work:
https://github.com/probberechts/soccerdata/blob/ec45682c8f7b75ed1166a0c6cc03a1001ff93991/soccerdata/fbref.py#L118
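
For illustration: the season code shown in the output (2324) carries no century, so it fits both 1923-24 and 2023-24. The sketch below is not soccerdata's code; `candidate_start_years` and its default year bounds are hypothetical, chosen only to show the ambiguity.

```python
def candidate_start_years(code: str, earliest: int = 1888, latest: int = 2023) -> list:
    """Return every start year in [earliest, latest] whose two-digit season code equals `code`.

    The bounds are assumptions made for this example, not values taken from soccerdata.
    """
    matches = []
    for start in range(earliest, latest + 1):
        end = start + 1
        # Build a code such as "2324" from the last two digits of each year.
        if f"{start % 100:02d}{end % 100:02d}" == code:
            matches.append(start)
    return matches


print(candidate_start_years("2324"))  # [1923, 2023]
```

If the list of known seasons is stale and does not yet contain 2023-24, a lookup of this kind could plausibly fall back to the 1923-24 entry.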

Thanks in advance!

kerem closed this issue and added the FBref label (2026-03-02 15:55:26 +03:00)

@probberechts commented on GitHub (Aug 21, 2023):

This is probably related to #97 and can be solved by invalidating the cache:

import soccerdata as sd
fbref = sd.FBref(leagues='ENG-Premier League', seasons='2023-24', no_cache=True)
fbref.read_schedule()
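
A quick way to confirm that the refreshed schedule covers the requested season is to look at the years at the start of the `game` index labels. The sketch below works on plain strings shaped like the labels in the output above; `years_in_labels` is a hypothetical helper, not part of soccerdata.

```python
def years_in_labels(labels):
    """Collect the distinct years from labels that start with a YYYY-MM-DD date."""
    return sorted({int(label[:4]) for label in labels})


# Two labels copied from the stale output reported in this issue.
stale = ["1923-08-25 Arsenal-Newcastle Utd", "1924-05-03 Tottenham-Burnley"]
print(years_in_labels(stale))  # [1923, 1924]
```

With a real result, the labels could presumably be taken from `fbref.read_schedule().index.get_level_values("game")`, assuming the index level is named `game` as in the output above.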

@mhd0528 commented on GitHub (Aug 25, 2023):

Hi,
Yes, that totally solves the issue! Thanks so much for helping!
