mirror of
https://github.com/probberechts/soccerdata.git
synced 2026-04-26 02:25:51 +03:00
[GH-ISSUE #310] [WhoScored] Error when running scraper with Tor #59
Originally created by @petmo on GitHub (Jul 27, 2023).
Original GitHub issue: https://github.com/probberechts/soccerdata/issues/310
Which Python version are you using: Python 3.11.4
Which version of soccerdata are you using? 1.4.0
Note: I copied and installed the full soccerdata poetry.lock file into my environment, so the versions should be identical.
Running on OSX Ventura
The issue:
Gives
Tor is installed and running according to documentation in a separate window:
Note that it works fine without the Tor proxy.
@probberechts commented on GitHub (Jul 27, 2023):
Can you try to run the code below and check what happens in your browser window? Does it say that your IP is blocked?
@petmo commented on GitHub (Jul 27, 2023):
Thanks for the quick help.
Tried that, but got the same error. It says nothing about the IP being blocked, just the error message above.
It opens a bunch of Chrome windows that seem to get a captcha prompt. Is this expected?
@probberechts commented on GitHub (Jul 27, 2023):
No, that's not expected. It looks like WhoScored has blacklisted the IP of your Tor exit node. You can try a different exit node (see https://stackoverflow.com/questions/1969958/how-to-change-the-tor-exit-node-programmatically-to-get-a-new-ip) or use a different proxy.
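For anyone who wants to try the exit-node suggestion above, here is a minimal sketch of requesting a new Tor circuit over the control port. It is not part of soccerdata. It assumes that Tor's ControlPort is enabled in torrc (9051 is the conventional port) and that it uses password authentication or no authentication; cookie authentication is not handled. The function names are made up for this illustration. Note that a new circuit does not guarantee a different exit IP, and Tor rate-limits these requests.

```python
import socket


def build_newnym_commands(password: str = "") -> list[bytes]:
    """Build the control-port commands that ask Tor for a new circuit.

    Hypothetical helper for this sketch; assumes password (or empty) auth.
    """
    # Escape backslashes and quotes so the password fits in a quoted string.
    escaped = password.replace("\\", "\\\\").replace('"', '\\"')
    return [
        f'AUTHENTICATE "{escaped}"\r\n'.encode("ascii"),
        b"SIGNAL NEWNYM\r\n",
        b"QUIT\r\n",
    ]


def request_new_identity(
    host: str = "127.0.0.1",
    port: int = 9051,  # assumption: ControlPort 9051 is set in torrc
    password: str = "",
    timeout: float = 10.0,
) -> bool:
    """Return True if Tor answered 250 to every command, otherwise False."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        reader = conn.makefile("rb")
        for command in build_newnym_commands(password):
            conn.sendall(command)
            reply = reader.readline()
            # Tor answers "250 ..." on success and 5xx codes on failure.
            if not reply.startswith(b"250"):
                return False
    return True
```

Calling `request_new_identity(password="...")` while Tor is running, and then creating the scraper again, should route new requests through a fresh circuit if the call returns True.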
@probberechts commented on GitHub (Jul 27, 2023):
You could also try to solve the captcha once. Maybe you can continue scraping afterwards?
@hkzid commented on GitHub (Aug 9, 2023):
I've got a similar problem. I tried to run read_schedule(). It gets all the competition names, but the HTML downloaded to \soccerdata\data\WhoScored\seasons looks similar to this picture.
@OnlineAnalytics commented on GitHub (Aug 14, 2023):
Any fix for this yet? I'm seeing the same issues.
@hkzid commented on GitHub (Aug 15, 2023):
I think the problem is with undetected-chromedriver, and I don't think anybody has found a way to solve it.