Mirror of https://github.com/probberechts/soccerdata.git
Synced 2026-04-25 18:15:58 +03:00
[PR #281] [CLOSED] Fixes for issues affecting the FBref scraper #427
📋 Pull Request Information
Original PR: https://github.com/probberechts/soccerdata/pull/281
Author: @lorenzodb1
Created: 7/5/2023
Status: ❌ Closed
Base: master ← Head: lorenzodb1-fixes
📝 Commits (4)
- 7494536 Moved pretty-error to dev dependency and fixed bug making FBref tests fail
- 383d904 Fixed issue affecting cached team season stats
- 6c1e69a Fixed issue in read_schedule by moving the Top 5 Leagues optimisation in read_leagues
- ec15e08 Fixed IndexError in _fix_nation_col
📊 Changes
5 files changed (+382 additions, -430 deletions)
📝 poetry.lock (+335 -409)
📝 pyproject.toml (+2 -2)
📝 soccerdata/_config.py (+0 -1)
📝 soccerdata/fbref.py (+42 -16)
📝 tests/test_FBref.py (+3 -2)
📄 Description
This PR fixes the following issues:
- […] `"all"` regardless of the type of stats queried. This caused an issue, as the cache might not have contained the table needed. It now caches these tables in different files.
- […] `nrows`, the website adds a row in a table that replicates the table header. This caused `read_schedule` to fail, as the number of rows in `df_table` would be higher than the number of match URLs obtained (see https://github.com/probberechts/soccerdata/issues/277). I added logic to remove those replicated headers when found.
- […] `Scores & Fixtures` on the `Big 5 European Leagues Stats` page. Thus it would go to the generic `Scores & Fixtures` page, which shows games currently being played. Because of this, I had to move the optimisation that combines the top five leagues under that label in `read_leagues`, as `read_schedule` necessarily needs the five top leagues separately rather than in their combined form.
- […] `IndexError`, supposedly when no flag is present. I fixed this by changing the logic to use regular expressions instead, so that no error is thrown when the flag is missing.
Additionally, it moves `pretty-error` to the dev dependencies group, as it would otherwise be installed in repositories importing this library (which should not be the case). I'm not sure I've done this correctly, and I had to remove some imports, so please let me know if this breaks previous behaviour and advise me on what I should do instead. It also updates `pandas` to v2.0.
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
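For readers without access to the diff, here is a minimal, standard-library-only sketch of two of the fixes described above: dropping rows that replicate the table header, and reading the nation code with a regular expression so that a missing flag does not raise `IndexError`. It is not the code from this PR. The helper names `drop_repeated_headers` and `extract_nation`, the list-of-rows table representation, and the assumed cell layout (an optional lowercase flag code followed by a three-letter uppercase country code) are all hypothetical and chosen only for illustration.

```python
import re
from typing import List, Optional, Sequence

# Assumed cell layout (hypothetical): an optional lowercase flag code
# followed by a three-letter uppercase country code, e.g. "eng ENG" or "ENG".
_NATION_RE = re.compile(r"^(?:[a-z-]+\s+)?([A-Z]{3})$")


def extract_nation(cell: Optional[str]) -> Optional[str]:
    """Return the three-letter country code, or None if it cannot be found.

    Using a regex match instead of splitting the string and indexing the
    parts means a missing flag (or an empty cell) cannot raise IndexError.
    """
    if not cell:
        return None
    match = _NATION_RE.match(cell.strip())
    return match.group(1) if match else None


def drop_repeated_headers(
    header: Sequence[str], rows: Sequence[Sequence[str]]
) -> List[List[str]]:
    """Remove body rows that merely repeat the table header."""
    header_list = list(header)
    return [list(row) for row in rows if list(row) != header_list]


# Made-up example data, only to show the behaviour.
header = ["Date", "Home", "Away"]
rows = [
    ["2023-01-01", "Team A", "Team B"],
    ["Date", "Home", "Away"],  # replicated header row inside the body
    ["2023-01-02", "Team C", "Team D"],
]
clean_rows = drop_repeated_headers(header, rows)
```

After filtering, `clean_rows` holds only the two match rows, so its length can again be compared with the number of match URLs collected for the same table.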