[GH-ISSUE #59] [FBref] 403 error when downloading data #9

Closed
opened 2026-03-02 15:54:58 +03:00 by kerem · 9 comments

Originally created by @koenklomps on GitHub (Jul 5, 2022).
Original GitHub issue: https://github.com/probberechts/soccerdata/issues/59

Which Python version are you using?

Python 3.8.13

Which version of soccerdata are you using?

1.0.1

What did you do?

fbref = sd.FBref(leagues="NED-Eredivisie", seasons="2021-2022", proxy='tor')
team_season_stats = fbref.read_schedule()

What did you expect to see?

Downloaded team stats

What did you see instead?

requests.exceptions.HTTPError: 403
Client Error: Forbidden for url:
https://fbref.com/en/comps/

kerem closed this issue 2026-03-02 15:54:58 +03:00

@probberechts commented on GitHub (Jul 6, 2022):

Removing the "user-agent" header seems to fix it. You can remove the following line:

https://github.com/probberechts/soccerdata/blob/50f6fef099761a9fca692dbebb96459fba8b393b/soccerdata/_common.py#L327

However, I do not understand why this causes trouble.
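For readers who want to see what "not setting a custom user-agent" looks like in isolation, here is a minimal sketch. It is not the soccerdata code: it uses the standard library's `urllib.request` as a stand-in for the HTTP client, and `build_request` is a hypothetical helper name. It only shows how to leave the header unset; the client library will still send its own default value when the request is actually opened.

```python
import urllib.request


def build_request(url, user_agent=None):
    """Build a request, adding a User-Agent header only when one is given.

    Hypothetical helper for illustration; soccerdata's own code differs.
    """
    headers = {}
    if user_agent is not None:
        # Custom value supplied by the caller (the case suspected in this thread).
        headers["User-Agent"] = user_agent
    # No network traffic happens here; the request object is only constructed.
    return urllib.request.Request(url, headers=headers)


# Without a custom value, the request carries no explicit User-Agent header,
# so the HTTP client falls back to its default when the request is sent.
plain = build_request("https://fbref.com/en/comps/")
custom = build_request("https://fbref.com/en/comps/", user_agent="my-scraper/0.1")
```

Whether the custom header is really what triggers the 403 is not established here; the sketch just makes the two variants easy to compare.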


@koenklomps commented on GitHub (Jul 6, 2022):

I tried deleting that line, but it still didn't work. However, after messing around a little more it started working, even with the user-agent line included. It seems to work at random: sometimes it succeeds, but at other times it throws a 403 or 429 error.


@frogman141 commented on GitHub (Jul 8, 2022):

One potential cause of the issue is FBref's new rules on bot scraping. They've started to ban anyone scraping the website at a rate faster than 1 request per 3 seconds.

If you look into the _common.py code, you can see that the rate limit and max delay parameters are set to 0 and are currently inaccessible.


@probberechts commented on GitHub (Jul 8, 2022):

Indeed, you get a "429 Client Error: Too Many Requests for URL" error if you scrape too fast. Originally the rate limit was set to 1 request per 2 seconds, but it seems they've changed that now to 1 request per 3 seconds. This is actually implemented in fbref.py which overrides the default of "no rate limiting" in _common.py.

The 403 error is a different issue and I am still convinced that it is caused by the user agent headers. I'll create a pull request in a few minutes and it would be great if you could check whether that solves your issues.
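To illustrate the "1 request per 3 seconds" limit mentioned above, here is a minimal sketch of a client-side limiter. It is not how fbref.py implements it; `MinIntervalLimiter` and its parameters are hypothetical names, and the clock and sleep functions are injectable only so the behavior can be checked without actually waiting.

```python
import time


class MinIntervalLimiter:
    """Ensure successive calls to wait() are at least `min_interval` seconds apart."""

    def __init__(self, min_interval=3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request; None before the first one

    def wait(self):
        """Sleep if the previous call was too recent. Returns the delay applied."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        # Record the time after any sleep, i.e. when the request is actually sent.
        self._last = self._clock()
        return delay
```

Calling `limiter.wait()` immediately before each HTTP request keeps the request rate at or below one per `min_interval` seconds. This only addresses the 429 case; it does nothing for the 403 discussed in this comment.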


@frogman141 commented on GitHub (Jul 9, 2022):

Hey, quick update. I tried changing the rate_limit to 3 seconds or more, and unfortunately the same error occurred.


@probberechts commented on GitHub (Jul 9, 2022):

> Hey, quick update. I tried changing the rate_limit to 3 seconds or more, and unfortunately the same error occurred.

Which error are you talking about now: the 403 or the 429?

Did you try removing the user agent headers?
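Since the two status codes keep getting mixed up in this thread, here is a small sketch that separates them when reporting a failure. `describe_status` is a hypothetical helper, and the causes it names are only the ones suggested in this discussion (the 403 explanation in particular is a hypothesis, not a confirmed diagnosis).

```python
def describe_status(status_code):
    """Map an HTTP status code to the likely cause discussed in this thread."""
    if status_code == 429:
        # Too Many Requests: the scraper exceeded the site's rate limit.
        return "rate limited: slow down (about 1 request per 3 seconds)"
    if status_code == 403:
        # Forbidden: suspected here to be related to the user-agent header.
        return "forbidden: possibly caused by the user-agent header"
    if 200 <= status_code < 300:
        return "ok"
    return "other error: not covered by this thread"


# Example: label the two errors reported above.
for code in (403, 429):
    print(code, "->", describe_status(code))
```

Including the status code and its label in a bug report makes it clear which of the two problems is being described.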


@frogman141 commented on GitHub (Jul 9, 2022):

So the code works now. The quick update above was from me fiddling with the code. I just noticed your hotfix, tried it, and it works fine now. Sorry for the confusion.


@probberechts commented on GitHub (Jul 9, 2022):

No problem. Thanks for checking!


@probberechts commented on GitHub (Jul 10, 2022):

Should be fixed in v1.0.2 🚀
