mirror of
https://github.com/probberechts/soccerdata.git
synced 2026-04-25 10:05:53 +03:00
[GH-ISSUE #880] FBref Team Match Stats doesn't scrape league matches anymore for La Liga, Serie A, EPL, Ligue 1, but works for Bundesliga #189
Originally created by @aegonwolf on GitHub (Aug 24, 2025).
Original GitHub issue: https://github.com/probberechts/soccerdata/issues/880
Describe the bug
The page structure changed for the other leagues. I was able to clone the repo and fix it locally; however, fixing those leagues broke Bundesliga.
Affected scrapers
This affects the following scrapers:
Code example
fbref = sd.FBref('ESP-La Liga', '2024')
or Top5 or EPL, it's the same. Bundesliga is OK!
Error message
Additional context
Add any other context about the problem here.
Contributor Action Plan
@gustavoalikan1910 commented on GitHub (Aug 24, 2025):
With this change it worked for me:
```python
class BaseRequestsReader(BaseReader):
    """Base class for readers that use the Python requests module."""
```
@aegonwolf commented on GitHub (Aug 25, 2025):
That is interesting, @gustavoalikan1910. Can you check that you actually get league match results for team_match_stats? I made a few similar changes, and in the end I found that the tables had simply changed, in my browser as well. So I updated the pattern, which, however, does not work for Bundesliga. That is, I never got blocked; I just didn't scrape the right data for league games from some leagues. So I changed fbref.py:
```python
"""Scraper for http://fbref.com."""

import warnings
from datetime import datetime, timezone
from functools import reduce
from pathlib import Path
from typing import Callable, Optional, Union

import pandas as pd
from lxml import etree, html

from ._common import (
    BaseRequestsReader,
    SeasonCode,
    add_alt_team_names,
    make_game_id,
    standardize_colnames,
)
from ._config import DATA_DIR, NOCACHE, NOSTORE, TEAMNAME_REPLACEMENTS, logger

FBREF_DATADIR = DATA_DIR / "FBref"
FBREF_API = "https://fbref.com"

BIG_FIVE_DICT = {
    "Serie A": "ITA-Serie A",
    "Ligue 1": "FRA-Ligue 1",
    "La Liga": "ESP-La Liga",
    "Premier League": "ENG-Premier League",
    "Bundesliga": "GER-Bundesliga",
}


class FBref(BaseRequestsReader):
    """Provides pd.DataFrames from data at http://fbref.com."""

    # Class body not included in this comment.


def _parse_table(html_table: html.HtmlElement) -> pd.DataFrame:
    """Parse HTML table into a dataframe."""
    ...  # Function body not included in this comment.


def _concat(dfs: list[pd.DataFrame], key: list[str]) -> pd.DataFrame:
    """Merge matching tables scraped from different pages."""
    ...  # Function body not included in this comment.


def _fix_nation_col(df_table: pd.DataFrame) -> pd.DataFrame:
    """Fix the "Nation" column."""
    ...  # Function body not included in this comment.
```
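One way to keep a single code path working for both the changed leagues and Bundesliga is to try several table selectors in order and use the first one that matches. The following is only a sketch of that idea, not the fix from this thread: the table ids are placeholders (the actual old and new FBref ids are not given here), `find_first_table` is a hypothetical helper, and the standard-library `xml.etree.ElementTree` stands in for `lxml` so the snippet runs without extra dependencies.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Sequence

# Placeholder ids (assumptions): one for the old layout, one for the new one.
# Replace them with the ids actually observed on the FBref pages.
CANDIDATE_TABLE_IDS = ("matchlogs_for", "matchlogs_new_layout")


def find_first_table(
    root: ET.Element, table_ids: Sequence[str] = CANDIDATE_TABLE_IDS
) -> Optional[ET.Element]:
    """Return the first <table> whose id is in table_ids, trying ids in order."""
    for table_id in table_ids:
        match = root.find(f".//table[@id='{table_id}']")
        if match is not None:
            return match
    return None  # No known layout matched; the caller should warn or raise.


# Two tiny made-up pages, one per layout.
old_page = ET.fromstring(
    "<html><body><table id='matchlogs_for'><tr><td>old</td></tr></table></body></html>"
)
new_page = ET.fromstring(
    "<html><body><table id='matchlogs_new_layout'><tr><td>new</td></tr></table></body></html>"
)

old_table = find_first_table(old_page)
new_table = find_first_table(new_page)
```

Returning `None` when nothing matches makes a layout change visible to the caller instead of silently yielding the wrong table.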
@gustavoalikan1910 commented on GitHub (Aug 28, 2025):
@aegonwolf
Yes, it's working for me when I try to capture match data. Team and player stats both work well; it takes a long time, but it works.
I'll leave some example files that I got here:
PLAYER MATCH STATS KEEPER
brasileirao_player_match_stats_keepers_2025.json
TEAM MATCH STATS KEEPER
brasileirao_team_match_stats_keeper_2025.json
@gustavoalikan1910 commented on GitHub (Aug 28, 2025):
@aegonwolf
If you want to scrape other leagues, you need to configure the league_dict.json file.
Example:
Then you need to instantiate the FBref class with the league you want:
fbref = sd.FBref(leagues="brasileirao", seasons=2025, proxy='tor')
And then call the stats:
df = fbref.read_player_match_stats(stat_type='keeper', match_id='123456', force_cache=False)
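As a rough illustration of the configuration step, the sketch below writes a `league_dict.json` with one custom league entry. Everything specific in it is an assumption: the file location (a temporary directory stands in for soccerdata's config directory), the field names `FBref`, `season_start` and `season_end`, and the placeholder competition name. Check the soccerdata documentation for the exact format your version expects.

```python
import json
import tempfile
from pathlib import Path

# Assumption: soccerdata reads custom leagues from <soccerdata dir>/config/league_dict.json.
# A temporary directory is used here so the sketch does not touch real settings.
config_dir = Path(tempfile.mkdtemp()) / "config"
config_dir.mkdir(parents=True)

league_dict = {
    # The key is the league id later passed as leagues="brasileirao".
    "brasileirao": {
        # Placeholder: must be replaced by the competition name FBref uses.
        "FBref": "<competition name on FBref>",
        # Assumed season boundaries for a calendar-year league.
        "season_start": "Apr",
        "season_end": "Dec",
    }
}

config_path = config_dir / "league_dict.json"
config_path.write_text(json.dumps(league_dict, indent=2), encoding="utf-8")

# Read the file back to confirm it is valid JSON with the expected entry.
loaded = json.loads(config_path.read_text(encoding="utf-8"))
```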
@aegonwolf commented on GitHub (Sep 5, 2025):
Ah, @gustavoalikan1910, have you inspected the data you scrape? For EPL, Ligue 1, Serie A, etc., you only scrape cups and international matches; the table selectors have changed. I'm fairly certain, because that's what I changed and it works. It isn't all leagues, only these four; it still works as expected for all others.
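A quick way to check for this symptom is to count the scraped matches per competition and see whether the domestic league appears at all. The sketch below works on plain dict rows and uses made-up sample data; the column name `Comp` and the helper `competition_counts` are assumptions, so adjust the key to whatever competition column your scraped table actually has.

```python
from collections import Counter
from typing import Iterable, Mapping


def competition_counts(
    rows: Iterable[Mapping[str, object]], comp_key: str = "Comp"
) -> Counter:
    """Count rows per competition; rows lacking comp_key are counted as '<missing>'."""
    return Counter(str(row.get(comp_key, "<missing>")) for row in rows)


# Made-up rows that mimic the reported symptom: cups and international games only.
sample_rows = [
    {"Comp": "Champions Lg", "Opponent": "A"},
    {"Comp": "Copa del Rey", "Opponent": "B"},
    {"Comp": "Champions Lg", "Opponent": "C"},
]

counts = competition_counts(sample_rows)
has_league_games = "La Liga" in counts
```

With a pandas DataFrame, the rows can be obtained via `df.reset_index().to_dict("records")` before passing them to the helper.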