mirror of
https://github.com/probberechts/soccerdata.git
synced 2026-04-26 02:25:51 +03:00
[GH-ISSUE #59] [FBref] 403 error when downloading data #9
Originally created by @koenklomps on GitHub (Jul 5, 2022).
Original GitHub issue: https://github.com/probberechts/soccerdata/issues/59
Which Python version are you using?
Which version of soccerdata are you using?
What did you do?
What did you expect to see?
What did you see instead?
@probberechts commented on GitHub (Jul 6, 2022):
Removing the "user-agent" header seems to fix it. You can remove the following line:
github.com/probberechts/soccerdata@50f6fef099/soccerdata/_common.py (L327)

However, I do not understand why this causes trouble.
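For readers unfamiliar with how header removal works in `requests` (which soccerdata uses for scraping), a minimal sketch of the idea follows. The custom User-Agent value shown is illustrative, not the one from `_common.py`:

```python
import requests

# Minimal sketch (not soccerdata's actual code): a requests.Session sends
# a default User-Agent header unless you override or remove it.
session = requests.Session()
print("User-Agent" in session.headers)  # True: requests' default UA

# A custom header like the one the fix removes (value is illustrative):
session.headers["user-agent"] = "Mozilla/5.0 (X11; Linux x86_64)"

# Dropping the header entirely, per the suggestion above. Header lookup is
# case-insensitive, so this also removes the default "User-Agent" entry:
session.headers.pop("user-agent", None)
print("User-Agent" in session.headers)  # False: no UA header will be sent
```

Whether the server then applies different bot-detection rules to the request is up to the site, which may explain the inconsistent behavior reported below.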
@koenklomps commented on GitHub (Jul 6, 2022):
I tried deleting that line, but it still didn't work. However, after messing around a little more it started working, even with the user-agent line included. It seems to work randomly: sometimes it succeeds, but other times it throws a 403 or 429 error.
@frogman141 commented on GitHub (Jul 8, 2022):
One potential cause of the issue is FBref's new anti-bot scraping rules. They have started to ban anyone scraping the website at a rate faster than 1 request per 3 seconds.
If you look at the _common.py code, you can see the rate limit and max delay parameters are set to 0 and are currently not configurable.
@probberechts commented on GitHub (Jul 8, 2022):
Indeed, you get a "429 Client Error: Too Many Requests" error if you scrape too fast. Originally the rate limit was set to 1 request per 2 seconds, but it seems they have changed that to 1 request per 3 seconds. This is actually implemented in fbref.py, which overrides the default of "no rate limiting" in _common.py.

The 403 error is a different issue, and I am still convinced that it is caused by the user-agent headers. I'll create a pull request in a few minutes; it would be great if you could check whether that solves your issues.
@frogman141 commented on GitHub (Jul 9, 2022):
Hey, quick update. I tried changing the rate_limit to 3 seconds or more, and unfortunately the same error occurred.
@probberechts commented on GitHub (Jul 9, 2022):
About which error are you talking now? The 403 or 429 error?
Did you try removing the user agent headers?
@frogman141 commented on GitHub (Jul 9, 2022):
So the code works now. The quick update above was from me fiddling with the code. I just noticed your hotfix, tried it, and it works fine now. Sorry for the confusion.
@probberechts commented on GitHub (Jul 9, 2022):
No problem. Thanks for checking!
@probberechts commented on GitHub (Jul 10, 2022):
Should be fixed in v1.0.2 🚀