[GH-ISSUE #66] [WhoScored] Failure to parse date and time for schedule #11

Closed
opened 2026-03-02 15:55:00 +03:00 by kerem · 1 comment
Owner

Originally created by @giochi99 on GitHub (Jul 20, 2022).
Original GitHub issue: https://github.com/probberechts/soccerdata/issues/66

Which Python version are you using?

Python 3.10.5

Which version of soccerdata are you using?

1.0.3

What did you do?

import soccerdata as sd

ws = sd.WhoScored(leagues="ITA-Serie A", seasons='21-22', proxy='tor', headless=False)

seriea_2122_schedule = ws.read_schedule()
seriea_2122_schedule.head()

What did you expect to see?

Download schedule data

What did you see instead?

[07/20/22 21:39:51] INFO     Saving cached data to /home/giochi99/soccerdata/data/WhoScored    _common.py:89
[07/20/22 21:39:52] INFO     patching driver executable /home/giochi99/.local/share/undetected_chromedriver/aa95ea2fc3bf32fc_chromedriver    patcher.py:231
[07/20/22 21:41:36] INFO     Scraping game schedule from https://www.whoscored.com/Regions/108/Tournaments/5/Seasons/8735/Stages/19982/Fixtures/Italy-Serie-A-2021-2022    whoscored.py:325
[07/20/22 21:41:42] INFO     Scraping game schedule for Sunday, May 1 2022    whoscored.py:239
[07/20/22 21:41:43] INFO     Scraping game schedule for Monday, May 2 2022    whoscored.py:239
                    INFO     Scraping game schedule for Thursday, May 5 2022    whoscored.py:239
                    INFO     Scraping game schedule for Friday, May 6 2022    whoscored.py:239
                    INFO     Scraping game schedule for Saturday, May 7 2022    whoscored.py:239

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [12], in <cell line: 3>()
      1 ws = sd.WhoScored(leagues="ITA-Serie A", seasons='21-22', proxy='tor', headless=False)
----> 3 seriea_2122_schedule = ws.read_schedule()
      4 seriea_2122_schedule.head()

File ~/.local/lib/python3.10/site-packages/soccerdata/whoscored.py:326, in WhoScored.read_schedule(self, force_cache)
    324         self._driver.get(url)
    325     logger.info("Scraping game schedule from %s", url)
--> 326     schedule.extend(self._parse_schedule())
    327 df_schedule = pd.DataFrame(schedule).assign(league=lkey, season=skey)
    328 if not self.no_store:

File ~/.local/lib/python3.10/site-packages/soccerdata/whoscored.py:253, in WhoScored._parse_schedule(self, stage)
    251 schedule = []
    252 # Parse first page
--> 253 page_schedule, next_page = self._parse_schedule_page()
    254 schedule.extend(page_schedule)
    255 # Go to next page

File ~/.local/lib/python3.10/site-packages/soccerdata/whoscored.py:213, in WhoScored._parse_schedule_page(self)
    209 if node.get_attribute("data-id"):
    210     time_str = node.find_element(By.XPATH, "./div[contains(@class,'time')]").text
    211     schedule_page.append(
    212         {
--> 213             "date": datetime.strptime(f"{date_str} {time_str}", "%A, %b %d %Y %H:%M"),
    214             "home_team": node.find_element(
    215                 By.XPATH, "./div[contains(@class,'team home')]//a"
    216             ).text,
    217             "away_team": node.find_element(
    218                 By.XPATH, "./div[contains(@class,'team away')]//a"
    219             ).text,
    220             # fmt: off
    221             "game_id": int(
    222                 re.search(
    223                     r"Matches/(\d+)/",
    224                     node.find_element(
    225                         By.XPATH,
    226                         "./div[contains(@class,'result')]//a"
    227                     ).get_attribute("href")).group(1)  # type: ignore
    228             ),
    229             # fmt: on
    230             "url": node.find_element(
    231                 By.XPATH, "./div[contains(@class,'result')]//a"
    232             ).get_attribute("href"),
    233         }
    234     )
    235 else:
    236     date_str = node.find_element(
    237         By.XPATH, "./div[contains(@class,'divtable-header')]"
    238     ).text

File /usr/lib/python3.10/_strptime.py:568, in _strptime_datetime(cls, data_string, format)
    565 def _strptime_datetime(cls, data_string, format="%a %b %d %H:%M:%S %Y"):
    566     """Return a class cls instance based on the input string and the
    567     format string."""
--> 568     tt, fraction, gmtoff_fraction = _strptime(data_string, format)
    569     tzname, gmtoff = tt[-2:]
    570     args = tt[:6] + (fraction,)

File /usr/lib/python3.10/_strptime.py:349, in _strptime(data_string, format)
    347 found = format_regex.match(data_string)
    348 if not found:
--> 349     raise ValueError("time data %r does not match format %r" %
    350                      (data_string, format))
    351 if len(data_string) != found.end():
    352     raise ValueError("unconverted data remains: %s" %
    353                       data_string[found.end():])

ValueError: time data 'Saturday, May 7 2022 ' does not match format '%A, %b %d %Y %H:%M'

This error also happened once with the Premier League, but it disappeared on a second attempt. With Serie A it happens every time.
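The failure can be reproduced outside the scraper. The sketch below reuses the format string from the traceback and assumes, as the error message suggests, that `time_str` came back empty for one game. The `parse` helper and the `20:45` kickoff time are made up for illustration and are not part of soccerdata.

```python
from datetime import datetime

# Format string copied from whoscored.py line 213 in the traceback.
FMT = "%A, %b %d %Y %H:%M"


def parse(date_str: str, time_str: str) -> datetime:
    """Hypothetical helper mirroring the failing strptime call."""
    return datetime.strptime(f"{date_str} {time_str}", FMT)


# With a time present (hypothetical value), parsing succeeds.
ok = parse("Saturday, May 7 2022", "20:45")

# With an empty time, the combined string is 'Saturday, May 7 2022 '
# and strptime raises ValueError, as in the report.
try:
    parse("Saturday, May 7 2022", "")
    failed = False
    message = ""
except ValueError as exc:
    failed = True
    message = str(exc)
```

The trailing space inside the quoted string in the error message is the separator from the f-string, with nothing after it.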

kerem closed this issue 2026-03-02 15:55:00 +03:00

@probberechts commented on GitHub (Jul 22, 2022):

The parser hit a game without a time specified. However, it works fine for me, and time_str is present for every game. Could you dump the HTML of the page when this happens?
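One way to capture that HTML is sketched below. It assumes the Selenium driver is reachable as `ws._driver`, a private attribute that appears in the traceback and may change between versions. The `dump_page_source` helper and the output file name are hypothetical.

```python
from pathlib import Path


def dump_page_source(driver, path="whoscored_schedule_dump.html"):
    """Write the driver's current page HTML to disk and return the path.

    `driver` is any object exposing a `page_source` string, such as a
    Selenium WebDriver.
    """
    out = Path(path)
    out.write_text(driver.page_source, encoding="utf-8")
    return out


# Intended usage (not run here, since it needs a live browser session):
#
# try:
#     schedule = ws.read_schedule()
# except ValueError:
#     print("HTML saved to", dump_page_source(ws._driver))
#     raise
```

The dump has to be taken right after the exception, before the driver navigates to another page.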

If you need a quick fix and do not mind that the time is incorrect, you could set a default time_str on line 211:

if not time_str:
    time_str = "20:00"
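A slightly more defensive variant of the same idea is sketched below. It is not the library's implementation: `parse_kickoff` and its `default_time` parameter are hypothetical names, and falling back to a date-only parse (midnight) when no default is given is a design assumption.

```python
from datetime import datetime
from typing import Optional

DATE_FMT = "%A, %b %d %Y"
DATETIME_FMT = f"{DATE_FMT} %H:%M"  # same format as in the traceback


def parse_kickoff(
    date_str: str, time_str: str, default_time: Optional[str] = None
) -> datetime:
    """Parse a schedule date and time, tolerating a missing time.

    If `time_str` is empty, use `default_time` when one is given;
    otherwise parse the date alone, which yields midnight.
    """
    date_str = date_str.strip()
    time_str = time_str.strip()
    if not time_str:
        if default_time is None:
            return datetime.strptime(date_str, DATE_FMT)
        time_str = default_time
    return datetime.strptime(f"{date_str} {time_str}", DATETIME_FMT)


midnight = parse_kickoff("Saturday, May 7 2022", "")
defaulted = parse_kickoff("Saturday, May 7 2022", "", default_time="20:00")
normal = parse_kickoff("Saturday, May 7 2022", "18:30")
```

Either way, the resulting time is a placeholder, so rows parsed through the fallback path should not be trusted for kickoff times.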