[GH-ISSUE #828] ClubElo: 404 Error Fetching History for Brighton due to URL Construction from Custom Alias #179

Closed
opened 2026-03-02 15:56:26 +03:00 by kerem · 0 comments
Owner

Originally created by @makaraduman on GitHub (May 3, 2025).
Original GitHub issue: https://github.com/probberechts/soccerdata/issues/828

Affected scrapers
This affects the following scrapers:

  • [x] ClubElo
  • [ ] ESPN
  • [ ] FBref
  • [ ] FiveThirtyEight
  • [ ] FotMob
  • [ ] Match History
  • [ ] SoFIFA
  • [ ] Understat
  • [ ] WhoScored

Environment:
soccerdata version: [1.8.7]
Python version: [3.9.6]
Operating System: [macOS Sequoia]

Problem Description:
When attempting to fetch the ClubElo team history for "Brighton" using ClubElo().read_team_history('Brighton'), the library attempts to access an incorrect URL (http://api.clubelo.com/Brighton&HoveAlbion..), resulting in a 404 HTTPError. The expected URL would be http://api.clubelo.com/Brighton.

Steps to Reproduce:
Ensure a custom team name replacement file exists at ~/soccerdata/config/teamname_replacements.json (or the path configured via SOCCERDATA_DIR).
Ensure this JSON file contains an entry for "Brighton" with an alias that includes special characters and potentially erroneous trailing characters, specifically "Brighton&HoveAlbion.." in my case:

{
  "Brighton": [
    "Brighton & Hove Albion",
    // ... other aliases ...
    "Brighton&HoveAlbion",
    "Brighton&HoveAlbion.",
    "Brighton&HoveAlbion.."  // Erroneous alias causing the issue
  ],
  // ... other teams ...
}

Run the following Python code:

import logging  # Optional: only needed for the debug logging below
import traceback

import soccerdata as sd

# Optional: set the log level to DEBUG to see more details
# logging.getLogger("root").setLevel(logging.DEBUG)

print("Initializing ClubElo...")
elo = sd.ClubElo()

team_name = "Brighton"
print(f"Attempting to fetch history for: '{team_name}'")

try:
    history_df = elo.read_team_history(team_name)
    # Check for both None and an empty DataFrame so the message below is accurate
    if history_df is not None and not history_df.empty:
        print(f"Successfully fetched history for {team_name}.")
        print(history_df.head())
    else:
        print(f"Fetched history for {team_name}, but DataFrame is None or empty.")

except Exception as e:
    print("\n--- ERROR ---")
    print(f"An error occurred fetching history for '{team_name}':")
    # Print the exception and traceback
    print(f"Exception Type: {type(e).__name__}")
    print(f"Exception Message: {e}")
    print("\nTraceback:")
    traceback.print_exc()
    print("--- END ERROR ---")

Expected Result:
The code should successfully fetch the historical data for Brighton using the URL http://api.clubelo.com/Brighton and return a pandas DataFrame, or raise a ValueError only if the correctly formed URL truly returns no data.

Actual Result:
The code fails with a requests.exceptions.HTTPError: 404 Client Error: Not Found because it tries to access the URL http://api.clubelo.com/Brighton&HoveAlbion.. (the two trailing dots are part of the URL).

Error message

[... time ...] ERROR    Error while scraping http://api.clubelo.com/Brighton&HoveAlbion..       _common.py:545
                             Retrying... (attempt 1 of 5).
                             Traceback (most recent call last):
                               File
                             ".../soccerdata/_common.py", line 525, in _download_and_save
                                 response.raise_for_status()
                               File
                             ".../requests/models.py", line 1024, in raise_for_status
                                 raise HTTPError(http_error_msg, response=self)
                             requests.exceptions.HTTPError: 404 Client Error: Not Found for url:
                             http://api.clubelo.com/Brighton&HoveAlbion..

Analysis:
I traced the execution flow:

  1. ClubElo().read_team_history('Brighton') is called.
  2. Inside clubelo.py, add_alt_team_names('Brighton') reads the aliases from my custom teamname_replacements.json file via the TEAMNAME_REPLACEMENTS dictionary loaded in _config.py. This includes the problematic "Brighton&HoveAlbion..".
  3. The code then processes these aliases using teams_to_check = {re.sub(r"[\s']", "", unidecode(team)) for team in teams_to_check}. This processing removes spaces and apostrophes but leaves the & and . characters untouched.
  4. The loop for _team in teams_to_check: iterates through the processed aliases.
  5. When it processes the problematic alias, it constructs the URL as url = f"{CLUB_ELO_API}/{_team}", resulting in http://api.clubelo.com/Brighton&HoveAlbion.. (including the two trailing dots).
  6. This malformed URL is passed to self.get(), which leads to the 404 error when the request is made in _common.py.
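
Steps 3 to 5 can be reproduced without network access. The sketch below is not the library's code: it reuses the re.sub pattern quoted in step 3 and the base URL seen in the error log, and it substitutes a rough stdlib stand-in (the hypothetical to_ascii helper) for unidecode, which is an assumption made only to keep the example dependency-free.

```python
import re
import unicodedata

CLUB_ELO_API = "http://api.clubelo.com"  # base URL as it appears in the error log


def to_ascii(text: str) -> str:
    """Rough stdlib stand-in for unidecode(): strip accents, keep ASCII only."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def normalize_alias(alias: str) -> str:
    """Apply the pattern from step 3: removes whitespace and apostrophes only."""
    return re.sub(r"[\s']", "", to_ascii(alias))


aliases = ["Brighton", "Brighton & Hove Albion", "Brighton&HoveAlbion.."]
urls = [f"{CLUB_ELO_API}/{normalize_alias(alias)}" for alias in aliases]

for url in urls:
    # The "&" and "." characters survive normalization and end up in the path.
    print(url)
```

Only the first of the three URLs matches the expected http://api.clubelo.com/Brighton; the other two still carry the & and . characters from the aliases.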

The root cause appears to be the combination of potentially erroneous aliases present in the user-provided custom configuration file and the specific processing logic (re.sub) in read_team_history which doesn't sanitize characters like . or & before using the alias to construct the URL.

Suggested Improvement (Optional):
Perhaps the URL construction logic in read_team_history could be made more robust against unexpected characters potentially originating from user configuration files? Alternatively, documenting clearly which characters are safe/unsafe in custom aliases might help users avoid this.
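
One possible shape for the first suggestion is sketched below. It is an illustration only, not a proposed patch: candidate_urls and SAFE_NAME are hypothetical names, and the assumption that ClubElo team paths consist solely of ASCII letters and digits has not been confirmed against the API. The accent stripping again uses a stdlib stand-in for unidecode.

```python
import re
import unicodedata
from typing import Iterable, List

CLUB_ELO_API = "http://api.clubelo.com"  # base URL as it appears in the error log

# Assumption: a valid ClubElo team path contains only ASCII letters and digits.
SAFE_NAME = re.compile(r"[A-Za-z0-9]+")


def candidate_urls(aliases: Iterable[str]) -> List[str]:
    """Build URLs only for aliases that normalize to a safe-looking name."""
    seen = set()
    urls = []
    for alias in aliases:
        # Stand-in for unidecode(): strip accents, keep ASCII only.
        ascii_alias = (
            unicodedata.normalize("NFKD", alias)
            .encode("ascii", "ignore")
            .decode("ascii")
        )
        # Same normalization as quoted in the analysis: drop spaces and apostrophes.
        name = re.sub(r"[\s']", "", ascii_alias)
        # Skip names with leftover punctuation (e.g. "&" or ".") and duplicates.
        if not SAFE_NAME.fullmatch(name) or name in seen:
            continue
        seen.add(name)
        urls.append(f"{CLUB_ELO_API}/{name}")
    return urls


print(candidate_urls(["Brighton", "Brighton & Hove Albion", "Brighton&HoveAlbion.."]))
```

With the aliases from this report, only http://api.clubelo.com/Brighton remains as a candidate. Skipping unsafe names rather than stripping the offending characters avoids inventing a new name such as BrightonHoveAlbion, which might not exist on ClubElo either.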

Thank you for looking into this!

Contributor Action Plan

  • [ ] I can fix this issue and will submit a pull request.
  • [x] I’m unsure how to fix this, but I'm willing to work on it with guidance.
  • [ ] I’m not able to fix this issue.
kerem 2026-03-02 15:56:26 +03:00
  • closed this issue
  • added the bug label
Reference
starred/soccerdata#179