[GH-ISSUE #916] [fbref] Error fetching schedule on 1.8.8 #201

Open
opened 2026-03-02 15:56:37 +03:00 by kerem · 15 comments

Originally created by @SportlyxLabs on GitHub (Jan 18, 2026).
Original GitHub issue: https://github.com/probberechts/soccerdata/issues/916

Describe the bug
A call to FBref's `read_schedule()` fails with a 403 error, even on version 1.8.8.

Affected scrapers
This affects the following scrapers:

  - [ ] ClubElo
  - [ ] ESPN
  - [x] FBref
  - [ ] FiveThirtyEight
  - [ ] FotMob
  - [ ] Match History
  - [ ] SoFIFA
  - [ ] Understat
  - [ ] WhoScored

Code example
A minimal code example that fails. Use `no_cache=True` to make sure an invalid cached file does not cause the bug, and make sure you have the latest version of soccerdata installed.

import soccerdata as sd
print("Initializing FBref session with caching...")
fbref = sd.FBref(
    leagues="ENG-Premier League",
    seasons=2526,
    no_cache=False,
    no_store=False
)

print("Fetching EPL schedule...⬇️")
schedule = fbref.read_schedule()
schedule_df = schedule.reset_index()

Error message

⏩⏱️ FBref fetch starting...
Initializing FBref session with caching...
INFO     Saving cached data to /Users/joe/soccerdata/data/FBref (_common.py:249)
[2026-01-18 21:36:44] INFO     TLSLibrary:_load_library:401 - Successfully loaded TLS library: /Users/joe/Documents/GitHub/fplstatshub/.venv/lib/python3.12/site-packages/tls_requests/bin/tls-client-darwin-arm64-1.13.1.dylib
Fetching EPL schedule...⬇️
[01/18/26 21:36:52] ERROR    Error while scraping https://fbref.com/en/comps/. Retrying... (attempt 1 of 5). (_common.py:526)
Traceback (most recent call last):
  File "/Users/joe/Documents/GitHub/fplstatshub/.venv/lib/python3.12/site-packages/soccerdata/_common.py", line 506, in _download_and_save
    response.raise_for_status()
  File "/Users/joe/Documents/GitHub/fplstatshub/.venv/lib/python3.12/site-packages/tls_requests/models/response.py", line 194, in raise_for_status
    raise HTTPError(
tls_requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://fbref.com/en/comps/

Additional context
Add any other context about the problem here.

Contributor Action Plan

  - [ ] I can fix this issue and will submit a pull request.
  - [ ] I’m unsure how to fix this, but I'm willing to work on it with guidance.
  - [ ] I’m not able to fix this issue.

@Messe57 commented on GitHub (Jan 19, 2026):

I am facing the same issue. I thought it was my fault, but it seems that's not the case.


@zshott commented on GitHub (Jan 19, 2026):

I was able to use the FBref scraper last week, but now I am also getting 403 errors.

Also, your name is in the paths in the error message; you may want to remove that.


@saedstudent commented on GitHub (Jan 19, 2026):

same here


@spanalytic commented on GitHub (Jan 20, 2026):

Got the same problem. Is there any fix?


@probberechts commented on GitHub (Jan 20, 2026):

It looks like FBref has recently strengthened its anti-bot protection (they are behind Cloudflare).

I haven’t had the chance to investigate this in depth yet, and I can’t give a timeline for when I'll be able to do so.

As a temporary workaround, one possible solution might be to:

  1. Visit FBref in your regular browser.
  2. Extract the cookies set by Cloudflare.
  3. Inject those cookies into the request headers used by the scraper.
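
The three steps above can be sketched as follows. This is a hedged illustration using plain `requests`, not soccerdata's actual API (whether soccerdata exposes a hook for custom cookies is not confirmed here); the cookie value is a placeholder you copy from your own browser's dev tools after passing the Cloudflare check, and `cf_clearance` is the cookie name Cloudflare typically sets:

```python
# Sketch of the cookie-injection workaround -- NOT soccerdata's API.
import requests

# Placeholder: copy the real value from your browser
# (dev tools -> Application -> Cookies -> fbref.com).
CLOUDFLARE_COOKIES = {"cf_clearance": "<value-from-your-browser>"}

# The User-Agent should match the browser the cookie was issued to,
# otherwise Cloudflare may reject the clearance cookie.
BROWSER_UA = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"

def build_request(url: str) -> requests.PreparedRequest:
    """Prepare a GET request carrying the browser's Cloudflare cookies."""
    req = requests.Request(
        "GET",
        url,
        headers={"User-Agent": BROWSER_UA},
        cookies=CLOUDFLARE_COOKIES,
    )
    return req.prepare()

prepared = build_request("https://fbref.com/en/comps/")
print(prepared.headers["Cookie"])  # the Cookie header that would be sent
```

You would then send it with `requests.Session().send(prepared)`. Note the clearance cookie is tied to your IP and browser fingerprint and expires, so this is a stopgap at best.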

@SportlyxLabs commented on GitHub (Jan 21, 2026):

Looks like FBref's advanced data is gone:
https://www.sports-reference.com/blog/2026/01/fbref-stathead-data-update/


@lvlun0532-spec commented on GitHub (Jan 21, 2026):

> Looks like FBref's advanced data is gone: https://www.sports-reference.com/blog/2026/01/fbref-stathead-data-update/

Bro, do you know of any other websites where I can view advanced data?


@lvlun0532-spec commented on GitHub (Jan 21, 2026):

Hey bro, do you know any other websites where I can view advanced data? Like, the kind that updates weekly.


@dimitrismoustakas commented on GitHub (Jan 24, 2026):

> It looks like FBref has recently strengthened its anti-bot protection (they are behind Cloudflare).
>
> I haven’t had the chance to investigate this in depth yet, and I can’t give a timeline for when I'll be able to do so.
>
> As a temporary workaround, one possible solution might be to:
>
>   1. Visit FBref in your regular browser.
>   2. Extract the cookies set by Cloudflare.
>   3. Inject those cookies into the request headers used by the scraper.

The problem is indeed caused by this. There is a relatively easy fix (you just need to be careful not to break WhoScored in the process), but I'm not sure I should commit it: with FBref also changing which data are available, I can't tell which failing tests are due to my fix being incomplete and which are failing simply because the data are no longer available. If we only had the schedule problem I'd have committed it already (I only use the schedule method in my project anyway), but right now I don't have the time to fully investigate all the issues.


@lvlun0532-spec commented on GitHub (Jan 25, 2026):

Thank you. I think I've found an alternative, although FotMob's data can't
compare to FBRef's. However, it's sufficient for my model. Let's keep in
touch.


@guilhermecxe commented on GitHub (Feb 8, 2026):

We can bypass Cloudflare with the BaseSeleniumReader using something like this (SeleniumBase's UC mode):

import time
from seleniumbase import Driver

driver = Driver(uc=True)
url = "https://fbref.com/en/comps/"
driver.uc_open_with_reconnect(url, 4)  # reconnect to evade detection
driver.uc_gui_click_captcha()          # clicks the Cloudflare checkbox via PyAutoGUI
time.sleep(10)
print(driver.page_source)

However, uc_gui_click_captcha uses PyAutoGUI internally, and PyAutoGUI doesn't work with headless=True. I tried to implement this but I am stuck there, making it headless.


@thomaSLBY commented on GitHub (Feb 9, 2026):

Hi @guilhermecxe, you can complete the Driver instantiation with headless=True, headless2=True to make the scraping faster and less detectable.


@guilhermecxe commented on GitHub (Feb 13, 2026):

> Hi @guilhermecxe, you can complete the Driver instantiation with headless=True, headless2=True to make the scraping faster and less detectable.

@thomaSLBY, as I said previously, PyAutoGUI (used by uc_gui_click_captcha) doesn't work with headless set to True. Or am I missing something?
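
One workaround for the PyAutoGUI-versus-headless problem, sketched below under assumptions: on a truly headless Linux box you can give PyAutoGUI a screen via a virtual X display. `pyvirtualdisplay` (a wrapper around Xvfb) is an assumption here, not part of soccerdata or SeleniumBase; it needs `pip install pyvirtualdisplay` plus the `xvfb` system package:

```python
# Hedged sketch: run the UC-mode browser inside a virtual X display so
# PyAutoGUI can still click the captcha without a real monitor.
import os
import sys

def needs_virtual_display() -> bool:
    """True on Linux when no X display is available (i.e. truly headless)."""
    return sys.platform.startswith("linux") and not os.environ.get("DISPLAY")

def start_virtual_display(width: int = 1920, height: int = 1080):
    """Start an Xvfb-backed display so PyAutoGUI has a screen to click on."""
    # Assumption: pyvirtualdisplay and Xvfb are installed.
    from pyvirtualdisplay import Display
    display = Display(visible=False, size=(width, height))
    display.start()  # exports $DISPLAY for PyAutoGUI and the browser
    return display

# Usage, before launching the browser (and WITHOUT headless=True,
# since the browser now renders into the virtual display):
#
#   if needs_virtual_display():
#       display = start_virtual_display()
#   ... Driver(uc=True) / uc_gui_click_captcha() steps as above ...
#   display.stop()
```

The idea is that the browser is not actually headless; it just renders into a framebuffer no one is watching, so PyAutoGUI's screen-interaction calls keep working.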


@paulz1 commented on GitHub (Feb 23, 2026):

Is there anything new on this?
It seems that FBref doesn't work even after injecting the cookies from Firefox. Am I right, or am I doing something wrong?
Are there any other workarounds?


@gustavoalikan1910 commented on GitHub (Feb 24, 2026):

@paulz1

Same issue here....
