[GH-ISSUE #879] [FBref] 403 Forbidden error #187

Closed
opened 2026-03-02 15:56:31 +03:00 by kerem · 6 comments

Originally created by @MikeTrusky on GitHub (Aug 23, 2025).
Original GitHub issue: https://github.com/probberechts/soccerdata/issues/879

Describe the bug
A call to sd.FBref results with "HTTPError: 403 Client Error: Forbidden".

Affected scrapers
This affects the following scrapers:

  • FBref

Code example

```python
import soccerdata as sd

fbref = sd.FBref(leagues='ENG-Premier League', seasons='24/25', no_cache=True)

print(fbref.read_schedule())
```

Error message

```
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://fbref.com/en/comps/
```
kerem 2026-03-02 15:56:31 +03:00
  • closed this issue
  • added the bug, FBref labels

@probberechts commented on GitHub (Aug 23, 2025):

It looks like FBref now checks for the Sec-CH-UA request header. The following seems to work for me:

```python
import tls_requests

url = "https://fbref.com/en/comps/"

headers = {
    # "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    # "accept-language": "en-GB,en;q=0.9,nl;q=0.8,nl-BE;q=0.7,en-US;q=0.6",
    # "cache-control": "max-age=0",
    # "if-modified-since": "Sat, 23 Aug 2025 17:49:23 GMT",
    "sec-ch-ua": '"Not)A;Brand";v="8", "Chromium";v="138", "Google Chrome";v="138"',
    # "user-agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Mobile Safari/537.36",
}

res = tls_requests.get(url, headers=headers)
assert res.status_code == 200
```

(what you send as the header's value does not seem to matter)

Author
Owner

@MikeTrusky commented on GitHub (Aug 23, 2025):

Great, I can confirm that without "sec-ch-ua" the AssertionError is raised, but with this header the assert passes. So it seems to work. But is it possible to add this header using soccerdata, or does there have to be a change in fbref.py?
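One possible stopgap, without changing fbref.py, is to inject the header into the session from user code. A minimal sketch, assuming the reader keeps a requests-style session in the internal `_session` attribute (an implementation detail that may change between soccerdata versions):

```python
# Hypothetical workaround: attach the sec-ch-ua client-hint header to an
# existing requests-style session so every request it sends carries it.
# Per the comment above, the header's *value* reportedly does not matter,
# only its presence.
SEC_CH_UA = '"Not)A;Brand";v="8", "Chromium";v="138", "Google Chrome";v="138"'


def add_sec_ch_ua(session):
    """Merge the sec-ch-ua header into a session's default headers."""
    session.headers.update({"sec-ch-ua": SEC_CH_UA})
    return session
```

For example, `add_sec_ch_ua(fbref._session)` right after constructing the reader. Note this may still be blocked if the site also fingerprints the TLS handshake, which is why `tls_requests` was used above instead of plain `requests`.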


@gustavoalikan1910 commented on GitHub (Aug 24, 2025):

Hello, I found a fix for the error:

I needed to add a headers parameter with sec-ch-ua in the BaseRequestsReader class.

I'll leave the code here:

------------------------------------------------------

```python
class BaseRequestsReader(BaseReader):
    """Base class for readers that use the Python requests module."""

    def __init__(
        self,
        leagues: Optional[Union[str, list[str]]] = None,
        proxy: Optional[
            Union[str, dict[str, str], list[dict[str, str]], Callable[[], dict[str, str]]]
        ] = None,
        no_cache: bool = False,
        no_store: bool = False,
        data_dir: Path = DATA_DIR,
    ):
        """Initialize the reader."""
        super().__init__(
            no_cache=no_cache,
            no_store=no_store,
            leagues=leagues,
            proxy=proxy,
            data_dir=data_dir,
        )

        self._session = self._init_session()

    def _init_session(self) -> requests.Session:
        session = cloudscraper.create_scraper(
            browser={"browser": "chrome", "platform": "linux", "mobile": False}
        )
        session.proxies.update(self.proxy())
        return session

    def _download_and_save(
        self,
        url: str,
        filepath: Optional[Path] = None,
        var: Optional[Union[str, Iterable[str]]] = None,
    ) -> IO[bytes]:
        """Download file at url to filepath. Overwrites if filepath exists."""
        # FBref rejects requests that lack the sec-ch-ua client-hint header.
        headers = {
            "sec-ch-ua": '"Not A Brand";v="99", "Chromium";v="138", "Google Chrome";v="138"'
        }
        for i in range(5):
            try:
                response = self._session.get(url, stream=True, headers=headers)
                time.sleep(self.rate_limit + random.random() * self.max_delay)
                response.raise_for_status()
                if var is not None:
                    if isinstance(var, str):
                        var = [var]
                    var_names = "|".join(var)
                    template_understat = rb"(%b)+[\s\t]*=[\s\t]*JSON\.parse\('(.*)'\)"
                    pattern_understat = template_understat % bytes(var_names, encoding="utf-8")
                    results = re.findall(pattern_understat, response.content)
                    data = {
                        key.decode("unicode_escape"): json.loads(value.decode("unicode_escape"))
                        for key, value in results
                    }
                    payload = json.dumps(data).encode("utf-8")
                else:
                    payload = response.content
                if not self.no_store and filepath is not None:
                    with filepath.open(mode="wb") as fh:
                        fh.write(payload)
                return io.BytesIO(payload)
            except Exception:
                logger.exception(
                    "Error while scraping %s. Retrying... (attempt %d of 5).",
                    url,
                    i + 1,
                )
                self._session = self._init_session()
                continue

        raise ConnectionError(f"Could not download {url}.")
```

Now, FBref is working well!
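As an aside, the `JSON.parse` extraction branch in `_download_and_save` above can be exercised in isolation; a minimal sketch with a made-up page snippet (the variable name and payload are invented for illustration):

```python
import json
import re

# Made-up sample of an Understat-style page that embeds data via JSON.parse.
content = b"var teamsData = JSON.parse('{\"id\": 1}');"

# Same pattern as in _download_and_save: capture the variable name (group 1)
# and the string payload passed to JSON.parse (group 2).
var_names = "|".join(["teamsData"])
pattern = rb"(%b)+[\s\t]*=[\s\t]*JSON\.parse\('(.*)'\)" % bytes(var_names, encoding="utf-8")
results = re.findall(pattern, content)

# Unescape and decode each captured payload into a dict keyed by variable name.
data = {
    key.decode("unicode_escape"): json.loads(value.decode("unicode_escape"))
    for key, value in results
}
print(data)  # {'teamsData': {'id': 1}}
```

This branch only runs when `var` is passed; for FBref, `var` is `None` and the raw `response.content` is stored unchanged.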


@ozzyman703 commented on GitHub (Aug 25, 2025):

> It looks like FBref now checks for the Sec-CH-UA request header. The following seems to work for me:
>
> ```python
> import tls_requests
>
> url = "https://fbref.com/en/comps/"
>
> headers = {
>     # "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
>     # "accept-language": "en-GB,en;q=0.9,nl;q=0.8,nl-BE;q=0.7,en-US;q=0.6",
>     # "cache-control": "max-age=0",
>     # "if-modified-since": "Sat, 23 Aug 2025 17:49:23 GMT",
>     "sec-ch-ua": '"Not)A;Brand";v="8", "Chromium";v="138", "Google Chrome";v="138"',
>     # "user-agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Mobile Safari/537.36",
> }
>
> res = tls_requests.get(url, headers=headers)
> assert res.status_code == 200
> ```
>
> (what you send as the header's value does not seem to matter)

I just tried this and got the error message below. Bit of a noob, so not sure if I might be doing something wrong - any suggestions much appreciated!

```
AssertionError                            Traceback (most recent call last)
/tmp/ipython-input-1561495402.py in <cell line: 0>()
     13
     14 res = tls_requests.get(url, headers=headers)
---> 15 assert res.status_code == 200

AssertionError:
```


@mhd0528 commented on GitHub (Oct 4, 2025):

I still get the 403 error even with the header... May I ask what the rate limit for accessing FBref is now? Also, are there any other possible solutions?


@keroloshany47 commented on GitHub (Dec 8, 2025):

I still face the same error. Has anyone found a solution?
