Mirror of https://github.com/probberechts/soccerdata.git, synced 2026-04-26 02:25:51 +03:00
[GH-ISSUE #277] [FBref] Non-data rows in the table body should be removed #57
Originally created by @lorenzodb1 on GitHub (Jun 29, 2023).
Original GitHub issue: https://github.com/probberechts/soccerdata/issues/277
`read_player_match_stats` throws `ValueError: Length of values (131) does not match length of index (132)` in fbref.py#L641 because `df_table` has an additional row that shouldn't be there (see line 127 in the attached image).
@lorenzodb1 commented on GitHub (Jun 30, 2023):
This issue affects every method that calls `read_schedule`. It's quite annoying, as it prevents downloading any data from many leagues, including major ones such as the UCL.
@probberechts commented on GitHub (Jul 7, 2023):
Apart from these header rows, I noticed that FBref also added "spacer" rows to the fixtures table. These can be removed with:
Maybe we should add the following helper method and call it everywhere before passing the HTML to Pandas.
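A minimal sketch of such a cleaning helper follows. It assumes that FBref marks repeated header rows in the table body with a `thead` class and spacer rows with a `spacer` class; neither class name is confirmed in this thread. The name `clean_table` is hypothetical, and the standard-library `xml.etree.ElementTree` is an illustrative choice that only accepts well-formed markup. This is not the helper proposed above.

```python
import xml.etree.ElementTree as ET

# Assumed (not confirmed here) class tokens that mark non-data rows
# inside FBref table bodies: repeated header rows and empty spacer rows.
NON_DATA_CLASSES = {"thead", "spacer"}


def clean_table(table_html: str) -> str:
    """Remove non-data rows from every <tbody> of an HTML table.

    Hypothetical helper; expects well-formed (XHTML-like) markup
    because it parses with ElementTree.
    """
    table = ET.fromstring(table_html)
    # findall returns lists, so removing rows does not disturb the loops.
    for tbody in table.findall(".//tbody"):
        for row in tbody.findall("tr"):
            classes = set(row.get("class", "").split())
            if classes & NON_DATA_CLASSES:
                tbody.remove(row)
    return ET.tostring(table, encoding="unicode")
```

The cleaned string could then be wrapped in `io.StringIO` and passed to `pandas.read_html`. Real FBref pages are not guaranteed to be well-formed XML, so a lenient parser such as `lxml.html` would be the safer choice there.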
@lorenzodb1 commented on GitHub (Jul 7, 2023):
The spacer rows don't cause issues: when we scrape the URLs, we get an empty value for those rows, which lines up with the empty rows in the table. That said, I see no problem with the `_clean_table` method you suggested. Let me know if you want me to add it in #284.
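The alignment argument above can be modelled with a small hypothetical sketch. It assumes, without confirmation from the thread, that the scraper emits one value per body row containing `<td>` cells (an empty string when the row has no link) while the parsed table keeps every body row. Under those assumptions, a spacer row contributes an empty value on both sides. A repeated header row made only of `<th>` cells, however, is counted by the table but not by the scraper. That produces an off-by-one mismatch like the 131 vs. 132 in the original report. The function `row_links` is illustrative only.

```python
import xml.etree.ElementTree as ET


def row_links(table_html: str) -> tuple[int, list[str]]:
    """Return (number of <tbody> rows, one scraped link per data row).

    Hypothetical model of the scraping step: links are collected only
    from rows that contain <td> cells, with "" for rows without a link.
    """
    table = ET.fromstring(table_html)
    rows = table.findall("./tbody/tr")
    links = []
    for row in rows:
        if row.find("td") is None:
            # Header-style row made only of <th> cells: nothing is scraped.
            continue
        link = row.find(".//a")
        links.append(link.get("href", "") if link is not None else "")
    return len(rows), links
```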