[GH-ISSUE #104] In-depth tutorial on how to add new leagues? #23

Closed
opened 2026-03-02 15:55:06 +03:00 by kerem · 15 comments

Originally created by @cj0121 on GitHub (Nov 25, 2022).
Original GitHub issue: https://github.com/probberechts/soccerdata/issues/104

Hi,

This is an amazing package! I think the docs are mostly very clear. However, would it be possible to have a more in-depth tutorial on how to add new leagues to FBref? I'm trying to add the English Championship, which is available on FBref, but haven't been able to. I added a league_dict.json file (with what I assume is the correct config) to the "SOCCERDATA_DIR/config/" path, but the code does not seem to pick it up when I call fbref = sd.FBref(leagues="EFL Championship", seasons=2019). It raised a ValueError noting "Invalid League". Thank you so much!


@philbywalsh commented on GitHub (Nov 25, 2022):

Hi @cj0121 - try pasting the entry you made in the league_dict.json file here. Perhaps there's a syntax error?


@probberechts commented on GitHub (Nov 25, 2022):

I think it is indeed a good idea to extend the documentation for adding additional leagues. Multiple people seem to be struggling with that.

Now, to resolve your problem, you should:

  1. Make sure to reload the soccerdata module after you modify the league_dict.json file. This file is parsed during the module's import.
  2. Check whether your league_dict.json file is at the correct location. If so, you should see this appear in the log messages.
$ python
>>> import soccerdata as sd
[11/25/22 11:49:12] INFO     Custom team name replacements loaded from <path>/teamname_replacements.json.    _config.py:83
                    INFO     Custom league dict loaded from <path>/league_dict.json.    _config.py:153
  3. Check whether it is added to available leagues by running the command below.
>>> sd.FBref.available_leagues()
['Big 5 European Leagues Combined', 'ENG-Premier League', 'ESP-La Liga', 'FRA-Ligue 1', 'GER-Bundesliga', 'INT-World Cup', 'ITA-Serie A']

If that doesn't work, you probably made a mistake in the syntax of your league_dict.json file. Paste it here and we'll try to help you.
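
The check in step 3 can be sketched as a small helper. This is only an illustration: `check_league` is a hypothetical name, not part of soccerdata, and the list is copied from the output above rather than fetched with `sd.FBref.available_leagues()`.

```python
import difflib


def check_league(name, available):
    """Return (found, suggestions) for a league key against the available keys."""
    if name in available:
        return True, []
    # Suggest close matches, e.g. to catch a typo or a missing country prefix.
    return False, difflib.get_close_matches(name, available, n=3, cutoff=0.4)


# Keys copied from the output shown above; in a live session this list
# would come from sd.FBref.available_leagues().
available = [
    "Big 5 European Leagues Combined", "ENG-Premier League", "ESP-La Liga",
    "FRA-Ligue 1", "GER-Bundesliga", "INT-World Cup", "ITA-Serie A",
]

found, suggestions = check_league("EFL Championship", available)
print(found, suggestions)
```

If the key is not found, the custom league_dict.json was most likely not loaded, either because of its location or because of a syntax error.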


@cj0121 commented on GitHub (Nov 26, 2022):

Thanks for getting back! Below is my league_dict.json file:

{
    "ENG-Premier League": {
        "ClubElo": "ENG_1",
        "MatchHistory": "E0",
        "FiveThirtyEight": "premier-league",
        "FBref": "Premier League",
        "ESPN": "eng.1",
        "SoFIFA": "English Premier League (1)",
        "WhoScored": "England - Premier League",
        "season_start": "Aug",
        "season_end": "May",
    },
    "ESP-La Liga": {
        "ClubElo": "ESP_1",
        "MatchHistory": "SP1",
        "FiveThirtyEight": "la-liga",
        "FBref": "La Liga",
        "ESPN": "esp.1",
        "SoFIFA": "Spain Primera Division (1)",
        "WhoScored": "Spain - LaLiga",
        "season_start": "Aug",
        "season_end": "May",
    },
    "ITA-Serie A": {
        "ClubElo": "ITA_1",
        "MatchHistory": "I1",
        "FiveThirtyEight": "serie-a",
        "FBref": "Serie A",
        "ESPN": "ita.1",
        "SoFIFA": " Italian Serie A (1)",
        "WhoScored": "Italy - Serie A",
        "season_start": "Aug",
        "season_end": "May",
    },
    "GER-Bundesliga": {
        "ClubElo": "GER_1",
        "MatchHistory": "D1",
        "FiveThirtyEight": "bundesliga",
        "FBref": "Fußball-Bundesliga",
        "ESPN": "ger.1",
        "SoFIFA": "German 1. Bundesliga (1)",
        "WhoScored": "Germany - Bundesliga",
        "season_start": "Aug",
        "season_end": "May",
    },
    "FRA-Ligue 1": {
        "ClubElo": "FRA_1",
        "MatchHistory": "F1",
        "FiveThirtyEight": "ligue-1",
        "FBref": "Ligue 1",
        "ESPN": "fra.1",
        "SoFIFA": "French Ligue 1 (1)",
        "WhoScored": "France - Ligue 1",
        "season_start": "Aug",
        "season_end": "May",
    },
    "EFL Championship": {
        "FBref": "EFL Championship",
        "season_start": "Aug",
        "season_end": "May",
    },
    "NED-Eredivisie": {
        "ClubElo": "NED_1",
        "MatchHistory": "N1",
        "SoFIFA": "Holland Eredivisie (1)",
        "FBref": "Dutch Eredivisie",
        "ESPN": "ned.1",
        "FiveThirtyEight": "eredivisie",
        "WhoScored": "Netherlands - Eredivisie",
        "season_start": "Aug",
        "season_end": "May",
    },
}

As you can see I added EFL Championship and NED-Eredivisie. The NED-Eredivisie entry is a straight copy from the docs. Additional question: for each additional league, is it required to include all five data sources as properties? If yes, the values need to match whatever ID is used on the original sites, correct?

Currently I have the JSON file here on my Mac: User/soccerdata/config/league_dict.json. I think the file location might be the problem. I wasn't able to locate the SOCCERDATA_DIR mentioned in the docs.

Much appreciated!


@probberechts commented on GitHub (Nov 26, 2022):

No, you do not have to include all five data sources, nor the "season_start" and "season_end" fields.

There is one error in your json file: you should remove the comma at the end of the second to last line to have a valid json file.

You can see where it looks for the json file in the log messages that are printed when importing the library.

$ python
>>> import soccerdata as sd
[11/25/22 11:49:12] INFO     Custom team name replacements loaded from <path>/teamname_replacements.json.    _config.py:83
                    INFO     Custom league dict loaded from <path>/league_dict.json.    _config.py:153
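
A syntax problem such as a trailing comma can be located with Python's standard `json` module before importing soccerdata. The sketch below is a generic check, not something soccerdata provides; `find_json_error` is a hypothetical helper name and the sample text is a shortened stand-in for the file above.

```python
import json


def find_json_error(text):
    """Return None if text is valid JSON, else a description of the first error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


# Shortened stand-in for the posted file: note the comma after "May".
broken = '''{
    "EFL Championship": {
        "FBref": "EFL Championship",
        "season_end": "May",
    }
}'''
fixed = broken.replace('"May",', '"May"')

print(find_json_error(broken))
print(find_json_error(fixed))
```

The parser stops at the first error it meets, so a file with a trailing comma in several entries needs to be re-checked after each fix. To check a real file, read it first, for example with `Path("<path>/league_dict.json").read_text(encoding="utf-8")`.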

@cj0121 commented on GitHub (Nov 27, 2022):

I finally got it to work! Thanks so much for the help! It turned out to be much easier than I thought. I just needed to make sure that league_dict.json has valid syntax and is in the right place.


@andrzej-konczyk commented on GitHub (Apr 19, 2023):

Hi, I have a similar case, but only with Eredivisie. I've created a league_dict.json which includes:
{ "NED-Eredivisie": { "ClubElo": "NED_1", "MatchHistory": "N1", "SoFIFA": "Holland Eredivisie (1)", "FBref": "Dutch Eredivisie", "ESPN": "ned.1", "FiveThirtyEight": "eredivisie", "WhoScored": "Netherlands - Eredivisie", "season_start": "Aug", "season_end": "May" } } and after run _config.py I see comment that league is added, but when I run sd.FBref.available_leagues() the new league is not listed. I do not know why.


@probberechts commented on GitHub (Apr 20, 2023):

@andrzej-konczyk Your json seems correct. I do not really get what you mean by "after run _config.py I see comment that league is added" though. The file _config.py is not an executable.

One hint I can think of: make sure to reload all imported soccerdata modules after modifying the league_dict.json file. The most straightforward way to do this is to restart your notebook or python interpreter.
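
When a restart is inconvenient, one general Python technique is to drop a package and its submodules from `sys.modules` so that the next import executes the module code again. The sketch below demonstrates this on a small standard-library module; `purge_modules` is a hypothetical helper, and applying it to soccerdata is an assumption that has not been verified here. Restarting remains the most reliable option.

```python
import importlib
import sys


def purge_modules(prefix):
    """Remove a package and its submodules from sys.modules; return the removed names."""
    removed = [
        name for name in list(sys.modules)
        if name == prefix or name.startswith(prefix + ".")
    ]
    for name in removed:
        del sys.modules[name]
    return removed


# Demonstrated on a small stdlib module so the snippet runs anywhere.
import colorsys

removed = purge_modules("colorsys")
colorsys = importlib.import_module("colorsys")  # the module code runs again
print(removed)

# For this thread the equivalent would be (assumption, not verified here):
#   purge_modules("soccerdata")
#   import soccerdata as sd
```

Objects created before the purge still refer to the old module state, so any reader instances would have to be created again after the fresh import.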


@andrzej-konczyk commented on GitHub (Apr 21, 2023):

Yeah, restart helped. Thanks!


@Lushin415 commented on GitHub (May 15, 2023):

And where can I find the correct names for the leagues? For example, where on the ESPN website could I find the name used for the Dutch league?
"ESPN": "ned.1",


@lorenzodb1 commented on GitHub (Jun 29, 2023):

It appears that the docs are wrong regarding how to add additional leagues for FBref. For instance, it suggests adding to league_dict.json

{
  "NED-Eredivisie": {
    "FBref": "Dutch Eredivisie"
  }
}

when one should actually add

{
  "NED-Eredivisie": {
    "FBref": "Eredivisie"
  }
}

for it to actually work. I'm not sure whether the name used on FBref changed after the example was written, but I thought I'd point it out as it's quite confusing.
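
One way to avoid hand-editing mistakes is to generate the file from a Python dict. This is a minimal sketch: `write_league_dict` is a hypothetical helper, the entry is the one reported as working above, and a temporary directory stands in for the real `SOCCERDATA_DIR/config/` folder mentioned earlier in the thread.

```python
import json
import tempfile
from pathlib import Path


def write_league_dict(config_dir, leagues):
    """Write leagues to <config_dir>/league_dict.json and return the file path."""
    config_dir = Path(config_dir)
    config_dir.mkdir(parents=True, exist_ok=True)
    path = config_dir / "league_dict.json"
    # json.dump never emits trailing commas, so the result is valid JSON.
    with path.open("w", encoding="utf-8") as fh:
        json.dump(leagues, fh, indent=2, ensure_ascii=False)
    return path


leagues = {"NED-Eredivisie": {"FBref": "Eredivisie"}}

# A temporary directory stands in for SOCCERDATA_DIR/config/ here.
with tempfile.TemporaryDirectory() as tmp:
    path = write_league_dict(Path(tmp) / "config", leagues)
    print(json.loads(path.read_text(encoding="utf-8")))
```

Note that this overwrites any existing league_dict.json in the target folder, so existing custom entries should be included in the dict that is passed in.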


@WillT23 commented on GitHub (Jul 20, 2023):

Hi, firstly thanks for creating this, I've found it so useful.

I'm having some trouble customising the code to include the Women's World Cup. I've followed the same process that worked for other leagues, but I'm getting the following error. I've also pasted the relevant part of my league_dict.json.

KeyError: "None of [Index(['WWC-WWC'], dtype='object', name='league')] are in the [index]"

"WWC-WWC": {
"WhoScored": "International - FIFA Women s World Cup",
"season_start": "Jul",
"season_end": "Sep"
}
Thanks,

Will


@probberechts commented on GitHub (Jul 20, 2023):

@WillT23 I believe it should be "International - FIFA Women's World Cup". You forgot the apostrophe.


@WillT23 commented on GitHub (Jul 20, 2023):

Thanks for the reply, although unfortunately I'm still having the same issue when the apostrophe is in the right place.


@probberechts commented on GitHub (Jul 20, 2023):

@WillT23 It seems to work fine. See #299. Make sure to reload the soccerdata module after you modify the league_dict.json file and disable caching after adding a new league.


@WillT23 commented on GitHub (Jul 20, 2023):

Perfect, works now. Thanks so much for your help!
