mirror of
https://github.com/probberechts/soccerdata.git
synced 2026-04-26 02:25:51 +03:00
[GH-ISSUE #619] 'read_events' Function Ignoring 'live=False' Parameter and Issues with Group Stage vs Knockout Stage HTML Structure #118
Originally created by @ds-oliver on GitHub (Jun 27, 2024).
Original GitHub issue: https://github.com/probberechts/soccerdata/issues/619
While using the soccerdata library to scrape event data from WhoScored, I've encountered an issue where the read_events function seems to ignore the live=False parameter. Despite explicitly setting live=False, the function attempts to scrape the live URL, resulting in repeated errors. (Please ignore the "priority game" aspects of the script; they are carried over from another project, and I did not remove them from this function call.)
Here are some relevant details:
Script Parameters and Logs:
live=False for the read_events function.
https://www.whoscored.com/Matches/1787316/Live
HTML Structure for Group Stage vs Knockouts:
Steps to Reproduce:
soccerdata library.
read_events function has live=False.
Expected Behavior:
The read_events function should not attempt to access the live URL when live=False is set.
Actual Behavior:
The function tries to scrape the live URL, leading to repeated errors.
Logs:
Code:
Additional Context:
The HTML structure for group stage games versus knockout stage games might be contributing to the issue. The difference in structure could potentially impact the scraping process.
Environment:
soccerdata version: [please specify]
Potential Fix:
Please investigate why the live=False parameter is not being respected by the read_events function. Additionally, consider any differences in HTML structure between group stage and knockout stage games that might affect scraping.
Thank you for your attention to this issue. Let me know if you need any additional information.
@ds-oliver commented on GitHub (Jul 3, 2024):
@probberechts any help here?
@probberechts commented on GitHub (Jul 3, 2024):
First, you may misunderstand the purpose of the live parameter. Setting live=False doesn't really do anything. It corresponds to the default behaviour, where the events will be scraped if they are not in the cache. Setting live=True disables the cache, which is mainly useful when retrieving event data during the game (because you want to ignore what is in the cache).
Otherwise, I do not see what the issue could be. Basically, it reduces to this block of code, where no_cache takes the value of live:
github.com/probberechts/soccerdata@64a4fa0261/soccerdata/_common.py (L304-L306)
So, if you set live=False, no_cache is False, which implies that it will scrape the data only if you've set no_cache=True in the constructor or if the game has not been cached before.
@ds-oliver commented on GitHub (Jul 4, 2024):
Thank you for looking.
Well, whatever the issue was, it appears to have been patched in the most recent version. Updated the package and my script works fine now!
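The caching rule from the maintainer's reply above can be illustrated with a small truth table. This is a minimal sketch that assumes the check in soccerdata/_common.py has the shape "per-call no_cache, or reader-level no_cache, or not cached". The function should_scrape and its parameter names are hypothetical and not part of soccerdata's API, and the real implementation may account for details that are not modelled here.

```python
from itertools import product


def should_scrape(live: bool, reader_no_cache: bool, is_cached: bool) -> bool:
    """Return True if the page is downloaded, False if it is read from cache.

    Hypothetical helper: the name and parameters are illustrative only and
    are not part of soccerdata's API.
    """
    # Assumption (per the reply): read_events forwards `live` as the
    # per-call no_cache flag.
    no_cache = live
    # Scrape if caching is disabled for this call, disabled for the whole
    # reader (no_cache=True in the constructor), or the page is not cached yet.
    return no_cache or reader_no_cache or not is_cached


# Print every combination of the three flags and the resulting action.
for live, reader_no_cache, is_cached in product([False, True], repeat=3):
    action = "scrape" if should_scrape(live, reader_no_cache, is_cached) else "read cache"
    print(f"live={live}, reader no_cache={reader_no_cache}, cached={is_cached}: {action}")
```

With live=False, the only combinations that scrape are those where the reader-level no_cache is True or the page is not yet cached, which matches the point that live=False leaves the default behaviour unchanged.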