Mirror of https://github.com/probberechts/soccerdata.git (synced 2026-04-26 02:25:51 +03:00)
[GH-ISSUE #828] ClubElo: 404 Error Fetching History for Brighton due to URL Construction from Custom Alias #179
Originally created by @makaraduman on GitHub (May 3, 2025).
Original GitHub issue: https://github.com/probberechts/soccerdata/issues/828
Affected scrapers
This affects the following scrapers: ClubElo
Environment:
soccerdata version: [1.8.7]
Python version: [3.9.6]
Operating System: [macOS Sequoia]
Problem Description:
When attempting to fetch the ClubElo team history for "Brighton" using ClubElo().read_team_history('Brighton'), the library attempts to access an incorrect URL (http://api.clubelo.com/Brighton&HoveAlbion..), resulting in a 404 HTTPError. The expected URL would be http://api.clubelo.com/Brighton.
Steps to Reproduce:
Ensure a custom team name replacement file exists at ~/soccerdata/config/teamname_replacements.json (or the path configured via SOCCERDATA_DIR).
Ensure this JSON file contains an entry for "Brighton" with an alias that includes special characters and possibly erroneous trailing characters. In my case the alias is "Brighton&HoveAlbion..".
Run ClubElo().read_team_history('Brighton') in Python.
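A minimal reproduction sketch, reconstructed from the call quoted in the problem description (the original snippet is not preserved here). The wrapper function `reproduce` is a hypothetical name added for illustration. Running it requires soccerdata, network access, and the alias entry described in the steps above.

```python
def reproduce():
    """Trigger the ClubElo history lookup described in this issue."""
    # Imported lazily so that defining this function needs neither
    # the soccerdata package nor network access.
    import soccerdata as sd

    elo = sd.ClubElo()
    # With the faulty alias in teamname_replacements.json, the report
    # states that this call fails with a 404 HTTPError.
    return elo.read_team_history("Brighton")
```

Calling `reproduce()` in an environment without the faulty alias should instead return the history as a pandas DataFrame.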
Expected Result:
The code should successfully fetch the historical data for Brighton using the URL http://api.clubelo.com/Brighton and return a pandas DataFrame, or raise a ValueError only if the correctly formed URL truly returns no data.
Actual Result:
The code fails with a requests.exceptions.HTTPError: 404 Client Error: Not Found because it requests http://api.clubelo.com/Brighton&HoveAlbion.. instead of http://api.clubelo.com/Brighton.
Error message
Analysis:
I traced the execution flow:
The root cause appears to be a combination of two things: a faulty alias in the user-provided custom configuration file, and the cleanup step (re.sub) in read_team_history, which does not remove characters such as . or & before the alias is used to construct the URL.
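The failure mode can be illustrated without network access. The sketch below assumes the cleanup step strips only whitespace and apostrophes; that pattern, the constant `CLUB_ELO_API`, and the helper `build_url` are assumptions for illustration and are not quoted from soccerdata's source.

```python
import re

# Base URL as it appears in the URLs quoted in this issue.
CLUB_ELO_API = "http://api.clubelo.com"


def build_url(alias: str) -> str:
    """Hypothetical stand-in for the URL construction in read_team_history."""
    # Assumed cleanup: remove whitespace and apostrophes only.
    # Characters such as "&" and "." pass through untouched.
    cleaned = re.sub(r"[\s']", "", alias)
    return f"{CLUB_ELO_API}/{cleaned}"


good = build_url("Brighton")
bad = build_url("Brighton&HoveAlbion..")
print(good)
print(bad)
```

Under this assumption, the alias reaches the URL unchanged, which matches the 404 reported above.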
Suggested Improvement (Optional):
Perhaps the URL construction logic in read_team_history could be made more robust against unexpected characters potentially originating from user configuration files? Alternatively, documenting clearly which characters are safe/unsafe in custom aliases might help users avoid this.
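One possible shape for the more robust construction, as a sketch only: skip any alias that still contains unexpected characters after cleanup, so that only plausible slugs are requested. The helper `candidate_urls`, the pattern `SAFE_SLUG`, and the assumption that ClubElo slugs are plain ASCII letters and digits are all hypothetical and would need to be checked against the real API.

```python
import re
from typing import List

CLUB_ELO_API = "http://api.clubelo.com"

# Assumption: a valid ClubElo slug contains only ASCII letters and digits.
SAFE_SLUG = re.compile(r"[A-Za-z0-9]+")


def candidate_urls(team: str, aliases: List[str]) -> List[str]:
    """Return URLs to try for a team, ignoring aliases with unsafe characters."""
    urls: List[str] = []
    for name in [team, *aliases]:
        # Same assumed cleanup as the existing logic: drop whitespace and apostrophes.
        slug = re.sub(r"[\s']", "", name)
        # Skip anything that would produce a malformed URL path.
        if not SAFE_SLUG.fullmatch(slug):
            continue
        url = f"{CLUB_ELO_API}/{slug}"
        if url not in urls:
            urls.append(url)
    return urls


urls = candidate_urls("Brighton", ["Brighton&HoveAlbion.."])
print(urls)
```

Skipping unsafe aliases rather than rewriting them avoids guessing at what the user meant. Logging a warning for each skipped alias would also point users to the faulty configuration entry.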
Thank you for looking into this!
Contributor Action Plan