mirror of
https://github.com/probberechts/soccerdata.git
synced 2026-04-25 10:05:53 +03:00
[GH-ISSUE #596] [Whoscored] Broken read_schedule method #107
Originally created by @joaomcalves on GitHub (May 27, 2024).
Original GitHub issue: https://github.com/probberechts/soccerdata/issues/596
First of all, congrats on this awesome repo!
I have been using the WhoScored scraper without problems for the last few months, but in the last few days I have been having issues when scraping this year's data.
For example, if I run

    import soccerdata as sd
    ws = sd.WhoScored(leagues="ENG-Premier League", seasons=2223)
    epl_schedule = ws.read_schedule()

it works well. But if I run

    ws = sd.WhoScored(leagues="ENG-Premier League", seasons=2324)
    epl_schedule = ws.read_schedule()

I get this error:
`TimeoutException Traceback (most recent call last)
Cell In[16], line 1
----> 1 epl_schedule = ws.read_schedule()
2 epl_schedule
File ~/Desktop/football/football_analytics/venv/lib/python3.9/site-packages/soccerdata/whoscored.py:390, in WhoScored.read_schedule(self, force_cache)
387 self._driver.get(url)
389 # Check if season consists of multiple stages
--> 390 stages = self._parse_season_stages()
392 # Handle a multi-stage season
393 if len(stages) > 0:
File ~/Desktop/football/football_analytics/venv/lib/python3.9/site-packages/soccerdata/whoscored.py:282, in WhoScored._parse_season_stages(self)
278 def _parse_season_stages(self) -> List[Dict]:
279 match_selector = (
280 "//div[contains(@id,'tournament-fixture')]//div[contains(@class,'divtable-row')]"
281 )
--> 282 WebDriverWait(self._driver, 30, poll_frequency=1).until(
283 ec.presence_of_element_located((By.XPATH, match_selector))
284 )
285 node_stages_selector = "//select[contains(@id,'stages')]/option"
286 node_stages = self._driver.find_elements(By.XPATH, node_stages_selector)
File ~/Desktop/football/football_analytics/venv/lib/python3.9/site-packages/selenium/webdriver/support/wait.py:105, in WebDriverWait.until(self, method, message)
103 if time.monotonic() > end_time:
104 break
--> 105 raise TimeoutException(message, screen, stacktrace)
TimeoutException: Message:
Stacktrace:
0 undetected_chromedriver 0x00000001008d66c8 undetected_chromedriver + 6149832
1 undetected_chromedriver 0x00000001008cdcea undetected_chromedriver + 6114538
2 undetected_chromedriver 0x000000010035ad5c undetected_chromedriver + 400732
3 undetected_chromedriver 0x00000001003a7aa5 undetected_chromedriver + 715429
4 undetected_chromedriver 0x00000001003a7bf1 undetected_chromedriver + 715761
5 undetected_chromedriver 0x00000001003ecdd4 undetected_chromedriver + 998868
6 undetected_chromedriver 0x00000001003cacdd undetected_chromedriver + 859357
7 undetected_chromedriver 0x00000001003ea0db undetected_chromedriver + 987355
8 undetected_chromedriver 0x00000001003caa53 undetected_chromedriver + 858707
9 undetected_chromedriver 0x000000010039a6d5 undetected_chromedriver + 661205
10 undetected_chromedriver 0x000000010039af6e undetected_chromedriver + 663406
11 undetected_chromedriver 0x0000000100897d00 undetected_chromedriver + 5893376
12 undetected_chromedriver 0x000000010089d4cc undetected_chromedriver + 5915852
13 undetected_chromedriver 0x00000001008798c4 undetected_chromedriver + 5769412
14 undetected_chromedriver 0x000000010089df99 undetected_chromedriver + 5918617
15 undetected_chromedriver 0x000000010086aed4 undetected_chromedriver + 5709524
16 undetected_chromedriver 0x00000001008be018 undetected_chromedriver + 6049816
17 undetected_chromedriver 0x00000001008be1d7 undetected_chromedriver + 6050263
18 undetected_chromedriver 0x00000001008cd89e undetected_chromedriver + 6113438
19 libsystem_pthread.dylib 0x00007ff80bb171d3 _pthread_start + 125
20 libsystem_pthread.dylib 0x00007ff80bb12bd3 thread_start + 15`
Any idea how I can solve this issue?
Thanks!
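For readers following the traceback: the exception is raised from Selenium's `WebDriverWait.until`, which polls a condition until a deadline passes and then raises `TimeoutException`. The pattern can be sketched in plain Python as below. This is an illustration only, not Selenium's code; `wait_until`, `WaitTimeout`, and `rows_never_present` are hypothetical names, and the short timeout is chosen just to keep the example fast.

```python
import time


class WaitTimeout(Exception):
    """Hypothetical stand-in for Selenium's TimeoutException."""


def wait_until(condition, timeout=30.0, poll_frequency=1.0):
    """Call `condition()` repeatedly until it is truthy or the deadline passes."""
    end_time = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() > end_time:
            break
        time.sleep(poll_frequency)
    # No descriptive message is attached here, so the error text is empty,
    # much like the bare "Message:" line in the traceback above.
    raise WaitTimeout("")


def rows_never_present():
    # Simulates a page on which the expected fixture rows never appear.
    return []


try:
    wait_until(rows_never_present, timeout=0.2, poll_frequency=0.05)
    timed_out = False
except WaitTimeout:
    timed_out = True
```

In this sketch a timeout only says that the awaited element did not show up in time; it does not say why (a changed page layout, a blocked request, or a slow connection would all look the same).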
@probberechts commented on GitHub (May 27, 2024):
Most likely this is related to #581. For previous seasons, the schedule is probably retrieved from the cache.
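The cache explanation can be illustrated with a small cache-first read. This is a sketch under assumptions, not soccerdata's actual caching code: `read_with_cache`, `broken_fetch`, and the file names are hypothetical. It shows why a season that was downloaded earlier keeps working while a season that needs a live request fails.

```python
import tempfile
from pathlib import Path


def read_with_cache(cache_file, fetch):
    """Return cached text if present; otherwise call `fetch()` and store the result."""
    if cache_file.exists():
        return cache_file.read_text(encoding="utf-8")
    text = fetch()  # only reached on a cache miss
    cache_file.write_text(text, encoding="utf-8")
    return text


def broken_fetch():
    # Stands in for a live request that fails, e.g. after a site change.
    raise TimeoutError("live site did not return the expected elements")


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp)
    # Assume season 2223 was downloaded earlier, so its file is already on disk.
    (cache_dir / "schedule_2223.json").write_text("[]", encoding="utf-8")

    # Cache hit: the broken fetch is never called.
    old_season = read_with_cache(cache_dir / "schedule_2223.json", broken_fetch)

    # Cache miss: the broken fetch is called and the error surfaces.
    try:
        read_with_cache(cache_dir / "schedule_2324.json", broken_fetch)
        new_season_failed = False
    except TimeoutError:
        new_season_failed = True
```

Under this model, a working older season does not show that the scraper itself is healthy, only that its data was already stored locally.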
@joaomcalves commented on GitHub (May 27, 2024):
Oh, thanks @probberechts! That was a fast response, haha. I will test the new version.