mirror of
https://github.com/probberechts/soccerdata.git
synced 2026-04-26 02:25:51 +03:00
[GH-ISSUE #59] [FBref] 403 error when downloading data #9
Originally created by @koenklomps on GitHub (Jul 5, 2022).
Original GitHub issue: https://github.com/probberechts/soccerdata/issues/59
Which Python version are you using?
Which version of soccerdata are you using?
What did you do?
What did you expect to see?
What did you see instead?
@probberechts commented on GitHub (Jul 6, 2022):
Removing the "user-agent" header seems to fix it. You can remove the following line:
github.com/probberechts/soccerdata@50f6fef099/soccerdata/_common.py (L327)

However, I do not understand why this causes trouble.
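For readers unfamiliar with how header removal works in `requests` (which soccerdata uses for scraping), a minimal sketch of the idea follows. The custom User-Agent value shown is illustrative, not the one from `_common.py`:

```python
import requests

# Minimal sketch (not soccerdata's actual code): a requests.Session sends
# a default User-Agent header unless you override or remove it.
session = requests.Session()
print("User-Agent" in session.headers)  # True: requests' default UA

# A custom header like the one the fix removes (value is illustrative):
session.headers["user-agent"] = "Mozilla/5.0 (X11; Linux x86_64)"

# Dropping the header entirely, per the suggestion above. Header lookup is
# case-insensitive, so this also removes the default "User-Agent" entry:
session.headers.pop("user-agent", None)
print("User-Agent" in session.headers)  # False: no UA header will be sent
```

Whether the server then applies different bot-detection rules to the request is up to the site, which may explain the inconsistent behavior reported below.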
@koenklomps commented on GitHub (Jul 6, 2022):
I tried deleting that line, but it still didn't work. However, after messing around a little more it started working, even with the user-agent line included. It seems to work randomly: sometimes it succeeds, but other times it throws a 403 or 429 error.
@frogman141 commented on GitHub (Jul 8, 2022):
One potential cause of the issue is FBref's new anti-bot scraping rules. They have started to ban anyone scraping the website at a rate faster than 1 request per 3 seconds.
If you look at the _common.py code, you can see the rate limit and max delay parameters are set to 0 and are currently not configurable.
@probberechts commented on GitHub (Jul 8, 2022):
Indeed, you get a "429 Client Error: Too Many Requests" error if you scrape too fast. Originally the rate limit was set to 1 request per 2 seconds, but it seems they have changed that to 1 request per 3 seconds. This is actually implemented in fbref.py, which overrides the default of "no rate limiting" in _common.py.

The 403 error is a different issue, and I am still convinced that it is caused by the user-agent headers. I'll create a pull request in a few minutes; it would be great if you could check whether that solves your issues.
@frogman141 commented on GitHub (Jul 9, 2022):
Hey, quick update. I tried changing the rate_limit to 3 seconds or more, and unfortunately the same error occurred.
@probberechts commented on GitHub (Jul 9, 2022):
About which error are you talking now? The 403 or 429 error?
Did you try removing the user agent headers?
@frogman141 commented on GitHub (Jul 9, 2022):
So the code works now. The quick update above was from me fiddling with the code. I just noticed your hotfix, tried it, and it works fine now. Sorry for the confusion.
@probberechts commented on GitHub (Jul 9, 2022):
No problem. Thanks for checking!
@probberechts commented on GitHub (Jul 10, 2022):
Should be fixed in v1.0.2 🚀