[GH-ISSUE #879] [FBref] 403 Forbidden error #187

Closed
opened 2026-03-02 15:56:31 +03:00 by kerem · 6 comments

Originally created by @MikeTrusky on GitHub (Aug 23, 2025).
Original GitHub issue: https://github.com/probberechts/soccerdata/issues/879

Describe the bug
A call to sd.FBref results with "HTTPError: 403 Client Error: Forbidden".

Affected scrapers
This affects the following scrapers:

  • FBref

Code example

```python
import soccerdata as sd

fbref = sd.FBref(leagues='ENG-Premier League', seasons='24/25', no_cache=True)

print(fbref.read_schedule())
```

Error message

```
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://fbref.com/en/comps/
```
kerem 2026-03-02 15:56:31 +03:00
  • closed this issue
  • added the bug, FBref labels

@probberechts commented on GitHub (Aug 23, 2025):

It looks like FBref now checks for the Sec-CH-UA request header. The following seems to work for me:

```python
import tls_requests

url = "https://fbref.com/en/comps/"

headers = {
    # "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    # "accept-language": "en-GB,en;q=0.9,nl;q=0.8,nl-BE;q=0.7,en-US;q=0.6",
    # "cache-control": "max-age=0",
    # "if-modified-since": "Sat, 23 Aug 2025 17:49:23 GMT",
    "sec-ch-ua": '"Not)A;Brand";v="8", "Chromium";v="138", "Google Chrome";v="138"',
    # "user-agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Mobile Safari/537.36",
}

res = tls_requests.get(url, headers=headers)
assert res.status_code == 200
```

(what you send as the header's value does not seem to matter)

Author
Owner

@MikeTrusky commented on GitHub (Aug 23, 2025):

Great, I can confirm that without "sec-ch-ua" the AssertionError is raised, but with this header the assert passes. So it seems to work. But is it possible to add this header using soccerdata, or does there have to be a change in fbref.py?
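One possible stopgap, without changing fbref.py, is to inject the header into the session from user code. A minimal sketch, assuming the reader keeps a requests-style session in the internal `_session` attribute (an implementation detail that may change between soccerdata versions):

```python
# Hypothetical workaround: attach the sec-ch-ua client-hint header to an
# existing requests-style session so every request it sends carries it.
# Per the comment above, the header's *value* reportedly does not matter,
# only its presence.
SEC_CH_UA = '"Not)A;Brand";v="8", "Chromium";v="138", "Google Chrome";v="138"'


def add_sec_ch_ua(session):
    """Merge the sec-ch-ua header into a session's default headers."""
    session.headers.update({"sec-ch-ua": SEC_CH_UA})
    return session
```

For example, `add_sec_ch_ua(fbref._session)` right after constructing the reader. Note this may still be blocked if the site also fingerprints the TLS handshake, which is why `tls_requests` was used above instead of plain `requests`.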


@gustavoalikan1910 commented on GitHub (Aug 24, 2025):

Hello, I found a fix for the error:

I needed to add a headers parameter with sec-ch-ua in the BaseRequestsReader class.

I'll leave the code here:

------------------------------------------------------

```python
class BaseRequestsReader(BaseReader):
    """Base class for readers that use the Python requests module."""

    def __init__(
        self,
        leagues: Optional[Union[str, list[str]]] = None,
        proxy: Optional[
            Union[str, dict[str, str], list[dict[str, str]], Callable[[], dict[str, str]]]
        ] = None,
        no_cache: bool = False,
        no_store: bool = False,
        data_dir: Path = DATA_DIR,
    ):
        """Initialize the reader."""
        super().__init__(
            no_cache=no_cache,
            no_store=no_store,
            leagues=leagues,
            proxy=proxy,
            data_dir=data_dir,
        )

        self._session = self._init_session()

    def _init_session(self) -> requests.Session:
        session = cloudscraper.create_scraper(
            browser={"browser": "chrome", "platform": "linux", "mobile": False}
        )
        session.proxies.update(self.proxy())
        return session

    def _download_and_save(
        self,
        url: str,
        filepath: Optional[Path] = None,
        var: Optional[Union[str, Iterable[str]]] = None,
    ) -> IO[bytes]:
        """Download file at url to filepath. Overwrites if filepath exists."""
        # FBref rejects requests that lack the sec-ch-ua client-hint header.
        headers = {
            "sec-ch-ua": '"Not A Brand";v="99", "Chromium";v="138", "Google Chrome";v="138"'
        }
        for i in range(5):
            try:
                response = self._session.get(url, stream=True, headers=headers)
                time.sleep(self.rate_limit + random.random() * self.max_delay)
                response.raise_for_status()
                if var is not None:
                    if isinstance(var, str):
                        var = [var]
                    var_names = "|".join(var)
                    template_understat = rb"(%b)+[\s\t]*=[\s\t]*JSON\.parse\('(.*)'\)"
                    pattern_understat = template_understat % bytes(var_names, encoding="utf-8")
                    results = re.findall(pattern_understat, response.content)
                    data = {
                        key.decode("unicode_escape"): json.loads(value.decode("unicode_escape"))
                        for key, value in results
                    }
                    payload = json.dumps(data).encode("utf-8")
                else:
                    payload = response.content
                if not self.no_store and filepath is not None:
                    with filepath.open(mode="wb") as fh:
                        fh.write(payload)
                return io.BytesIO(payload)
            except Exception:
                logger.exception(
                    "Error while scraping %s. Retrying... (attempt %d of 5).",
                    url,
                    i + 1,
                )
                self._session = self._init_session()
                continue

        raise ConnectionError(f"Could not download {url}.")
```

Now, FBref is working well!
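As an aside, the `JSON.parse` extraction branch in `_download_and_save` above can be exercised in isolation; a minimal sketch with a made-up page snippet (the variable name and payload are invented for illustration):

```python
import json
import re

# Made-up sample of an Understat-style page that embeds data via JSON.parse.
content = b"var teamsData = JSON.parse('{\"id\": 1}');"

# Same pattern as in _download_and_save: capture the variable name (group 1)
# and the string payload passed to JSON.parse (group 2).
var_names = "|".join(["teamsData"])
pattern = rb"(%b)+[\s\t]*=[\s\t]*JSON\.parse\('(.*)'\)" % bytes(var_names, encoding="utf-8")
results = re.findall(pattern, content)

# Unescape and decode each captured payload into a dict keyed by variable name.
data = {
    key.decode("unicode_escape"): json.loads(value.decode("unicode_escape"))
    for key, value in results
}
print(data)  # {'teamsData': {'id': 1}}
```

This branch only runs when `var` is passed; for FBref, `var` is `None` and the raw `response.content` is stored unchanged.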


@ozzyman703 commented on GitHub (Aug 25, 2025):

> It looks like FBref now checks for the Sec-CH-UA request header. The following seems to work for me:
>
> ```python
> import tls_requests
>
> url = "https://fbref.com/en/comps/"
>
> headers = {
>     # "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
>     # "accept-language": "en-GB,en;q=0.9,nl;q=0.8,nl-BE;q=0.7,en-US;q=0.6",
>     # "cache-control": "max-age=0",
>     # "if-modified-since": "Sat, 23 Aug 2025 17:49:23 GMT",
>     "sec-ch-ua": '"Not)A;Brand";v="8", "Chromium";v="138", "Google Chrome";v="138"',
>     # "user-agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Mobile Safari/537.36",
> }
>
> res = tls_requests.get(url, headers=headers)
> assert res.status_code == 200
> ```
>
> (what you send as the header's value does not seem to matter)

I just tried this and got the error message below. Bit of a noob, so not sure if I might be doing something wrong - any suggestions much appreciated!

```
AssertionError                            Traceback (most recent call last)
/tmp/ipython-input-1561495402.py in <cell line: 0>()
     13
     14 res = tls_requests.get(url, headers=headers)
---> 15 assert res.status_code == 200

AssertionError:
```


@mhd0528 commented on GitHub (Oct 4, 2025):

I still get the 403 error even with the header... May I ask what the rate limit for accessing FBref is now? Also, are there any other possible solutions?


@keroloshany47 commented on GitHub (Dec 8, 2025):

I still face the same error. Has anyone found a solution?
