[GH-ISSUE #366] [WhoScored] ConnectionError: Could not download https://www.whoscored.com.

kerem commented

2026-03-02 15:55:29 +03:00

Owner

Originally created by @ds-oliver on GitHub (Sep 13, 2023).
Original GitHub issue: https://github.com/probberechts/soccerdata/issues/366

I think the logs should have all necessary info to cite this issue.

Imports:
import tqdm
from pathlib import Path
import soccerdata as sd
from socceraction.data.opta import OptaLoader
import socceraction.spadl as spadl
import pandas as pd
import datetime
import os
import warnings
import pickle
import socceraction.atomic.spadl as atomicspadl
import zipfile
from io import BytesIO
from urllib.request import urlretrieve

Code:
`# Initialize the WhoScored object
ws = sd.WhoScored(
leagues=["ENG-Premier League"],
seasons=2223,
headless=True
)

api = ws.read_events(output_fmt='loader')`

Traceback:

ConnectionError Traceback (most recent call last)
/Users/hogan/soccerdata/scrape.ipynb Cell 2 line 8
1 # Initialize the WhoScored object
2 ws = sd.WhoScored(
3 leagues=["ENG-Premier League"],
4 seasons=2223,
5 headless=True
6 )
----> 8 api = ws.read_events(output_fmt='loader')

File ~/soccerdata/scrape_env/lib/python3.9/site-packages/soccerdata/whoscored.py:667, in WhoScored.read_events(self, match_id, force_cache, live, output_fmt)
664 urlmask = WHOSCORED_URL + "/Matches/{}/Live"
665 filemask = "events/{}_{}/{}.json"
--> 667 df_schedule = self.read_schedule(force_cache).reset_index()
668 if match_id is not None:
669 iterator = df_schedule[
670 df_schedule.game_id.isin([match_id] if isinstance(match_id, int) else match_id)
671 ]

File ~/soccerdata/scrape_env/lib/python3.9/site-packages/soccerdata/whoscored.py:370, in WhoScored.read_schedule(self, force_cache)
357 def read_schedule(self, force_cache: bool = False) -> pd.DataFrame:
358 """Retrieve the game schedule for the selected leagues and seasons.
359
360 Parameters
(...)
368 pd.DataFrame
369 """
--> 370 df_seasons = self.read_seasons()
371 filemask = "matches/{}_{}.csv"
373 all_schedules = []

File ~/soccerdata/scrape_env/lib/python3.9/site-packages/soccerdata/whoscored.py:246, in WhoScored.read_seasons(self)
239 def read_seasons(self) -> pd.DataFrame:
240 """Retrieve the selected seasons for the selected leagues.
241
242 Returns
243 -------
244 pd.DataFrame
245 """
--> 246 df_leagues = self.read_leagues()
248 seasons = []
249 for lkey, league in df_leagues.iterrows():

File ~/soccerdata/scrape_env/lib/python3.9/site-packages/soccerdata/whoscored.py:212, in WhoScored.read_leagues(self)
210 url = WHOSCORED_URL
211 filepath = self.data_dir / "tiers.json"
--> 212 reader = self.get(url, filepath, var="allRegions")
214 data = json.load(reader)
216 leagues = []

File ~/soccerdata/scrape_env/lib/python3.9/site-packages/soccerdata/_common.py:132, in BaseReader.get(self, url, filepath, max_age, no_cache, var)
130 if no_cache or self.no_cache or not is_cached:
131 logger.debug("Scraping %s", url)
--> 132 return self._download_and_save(url, filepath, var)
133 logger.debug("Retrieving %s from cache", url)
134 assert filepath is not None

File ~/soccerdata/scrape_env/lib/python3.9/site-packages/soccerdata/_common.py:452, in BaseSeleniumReader._download_and_save(self, url, filepath, var)
449 self._driver = self._init_webdriver()
450 continue
--> 452 raise ConnectionError("Could not download %s." % url)

ConnectionError: Could not download https://www.whoscored.com/.

Edit to add context/files:

Have since tried running scraper on top of Tor using ='Tor' and by defining proxies as dict.

https://github.com/probberechts/soccerdata/assets/77216918/13aafeb1-2e64-4dac-b115-0799c93e1afb

error.log: https://github.com/probberechts/soccerdata/files/12654492/error.log


kerem closed this issue

2026-03-02 15:55:29 +03:00

kerem commented

2026-03-02 15:55:30 +03:00

Author

Owner

@OnlineAnalytics commented on GitHub (Sep 20, 2023):

Unfortunately, it doesn’t look like they’ll do anything to try and fix it. Will need to find another means of scraping.


kerem commented

2026-03-02 15:55:30 +03:00

Author

Owner

@aegonwolf commented on GitHub (Sep 27, 2023):

Hmm, I do get this now too.


kerem commented

2026-03-02 15:55:30 +03:00

Author

Owner

@aegonwolf commented on GitHub (Sep 27, 2023):

> Unfortunately, it doesn’t look like they’ll do anything to try and fix it. Will need to find another means of scraping.

I think "they" is a single person, and this is not necessarily a helpful comment. People have work and lives, and we enjoy an awesome free package that the author has spent a lot of time and effort building.


kerem commented

2026-03-02 15:55:30 +03:00

Author

Owner

@OnlineAnalytics commented on GitHub (Sep 27, 2023):

> > Unfortunately, it doesn’t look like they’ll do anything to try and fix it. Will need to find another means of scraping.
>
> I think "they" is a single person, and this is not necessarily a helpful comment. People have work and lives, and we enjoy an awesome free package that the author has spent a lot of time and effort building.

I know it's a single person. Hence me using the singular pronoun. You don't really need to try and start drama where there isn't any.


kerem commented

2026-03-02 15:55:31 +03:00

Author

Owner

@probberechts commented on GitHub (Sep 27, 2023):

I do not have this issue, so I am unable to fix it as I would have no way to verify it.

It looks like WhoScored does a security check. I do not know why it does it, but here are two options:

1. If you only see the "checking if the site connection is secure" window when you use soccerdata and not when manually browsing to the website, it might have detected that you are a bot. Then you might find some help/tips in the undetected-chromedriver repo (https://github.com/ultrafunkamsterdam/undetected-chromedriver) on how to avoid detection. Also make sure to update the "undetected-chromedriver" dependency to its latest version.
2. It might be your IP address / location / network provider / ... that triggers the security check. Using a proxy or VPN might resolve it.

What happens after the "verifying..."? Does it show a captcha? Or does it simply redirect directly to the WhoScored webpage? In that case, a straightforward solution could be to check whether the current page contains the text "checking if the site connection is secure" and wait until it redirects before progressing. You can add that after this line: https://github.com/probberechts/soccerdata/blob/4cbeed21fe2c552b13d2aa36e05890da647f2250/soccerdata/_common.py#L427
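The "check for the security text and wait for the redirect" idea could look roughly like the sketch below. It is not part of soccerdata: the function name `wait_for_security_check`, its parameters, and the default timeout are assumptions made for illustration. The page source is passed in as a callable so the logic can be exercised without a browser.

```python
import time
from typing import Callable

# Text shown on the interstitial page, as quoted in the comment above.
SECURITY_CHECK_TEXT = "checking if the site connection is secure"


def wait_for_security_check(
    get_page_source: Callable[[], str],
    timeout: float = 30.0,        # assumed default, tune as needed
    poll_interval: float = 1.0,   # assumed default, tune as needed
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the security-check text disappears or the timeout expires.

    Returns True once the page no longer shows the check, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        # Compare case-insensitively, since the page may capitalise the text.
        if SECURITY_CHECK_TEXT not in get_page_source().lower():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)
```

With a Selenium driver this would be called as `wait_for_security_check(lambda: driver.page_source)` right after the page is loaded. If a captcha is shown instead of an automatic redirect, the function simply returns False after the timeout.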


kerem commented

2026-03-02 15:55:31 +03:00

Author

Owner

@ds-oliver commented on GitHub (Sep 27, 2023):

> I do not have this issue, so I am unable to fix it as I would have no way to verify it.
>
> It looks like WhoScored does a security check. I do not know why it does it, but here are two options:
>
> 1. If you only see the "checking if the site connection is secure" window when you use soccerdata and not when manually browsing to the website, it might have detected that you are a bot. Then you might find some help/tips in the undetected-chromedriver repo on how to avoid detection. Also make sure to update the "undetected-chromedriver" dependency to its latest version.
> 2. It might be your IP address / location / network provider / ... that triggers the security check. Using a proxy or VPN might resolve it.
>
> What happens after the "verifying..."? Does it show a captcha? Or does it simply redirect directly to the WhoScored webpage? In that case, a straightforward solution could be to check whether the current page contains the text "checking if the site connection is secure" and wait until it redirects before progressing.

Funny you should mention adding a wait period. I actually had already done so...

Any other suggestions?


kerem commented

2026-03-02 15:55:31 +03:00

Author

Owner

@TimelessUsername commented on GitHub (Sep 29, 2023):

Running with headless=False (while on Selenium 4.12 or under) does the trick.
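As a rough sketch of that workaround: the snippet below reuses the arguments from the original report with `headless` switched off, and adds a small check for the installed Selenium version. The helper names (`parse_major_minor`, `selenium_at_most`, `installed_selenium_version`, `make_reader`) are made up for this example, and reading "4.12 or under" as "any 4.12.x release or older" is an assumption.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '4.12.0'."""
    numbers = []
    for piece in version.split(".")[:2]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 2:
        numbers.append(0)
    return numbers[0], numbers[1]


def selenium_at_most(version: str, limit: Tuple[int, int] = (4, 12)) -> bool:
    """True if the version is 4.12.x or older (assumed reading of the comment)."""
    return parse_major_minor(version) <= limit


def installed_selenium_version() -> Optional[str]:
    """Return the installed Selenium version, or None if it is not installed."""
    try:
        return metadata.version("selenium")
    except metadata.PackageNotFoundError:
        return None


# Same arguments as in the original report, with headless switched off.
WHOSCORED_KWARGS = {
    "leagues": ["ENG-Premier League"],
    "seasons": 2223,
    "headless": False,
}


def make_reader():
    """Create the WhoScored reader; this opens a visible browser window."""
    import soccerdata as sd  # imported lazily so the checks above work without it

    return sd.WhoScored(**WHOSCORED_KWARGS)
```

Calling `make_reader().read_events(output_fmt='loader')` then mirrors the call from the original report. If `selenium_at_most(installed_selenium_version())` is False, pinning Selenium to 4.12 or older would be the thing to try first.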


kerem commented

2026-03-02 15:55:31 +03:00

Author

Owner

@ds-oliver commented on GitHub (Oct 1, 2023):

@TimelessUsername @probberechts

> Running headless false (while on selenium 4.12 or under) does the trick

This has solved the issue. You have been a huge help @TimelessUsername.

@OnlineAnalytics I'm tagging you so that you can see the resolution, and I hope you can see that this is how most issues get resolved in open-source projects such as this one. The project relies on the whole community, not just the author, to resolve complicated issues. This is how the technology improves, and now that we have found a workaround, @probberechts can spend his valuable time patching instead of testing.

Thanks all. Closing this now. :)


kerem referenced this issue

2026-03-02 15:57:07 +03:00

[PR #69] [CLOSED] Big 5 league player stats for fbref #260

kerem referenced this issue

2026-03-02 15:57:09 +03:00

[PR #85] [MERGED] Faster scraping of player seasons stats - fbref. #269
