mirror of
https://github.com/probberechts/soccerdata.git
synced 2026-04-25 18:15:58 +03:00
[GH-ISSUE #126] [WhoScored] Date format problem #27
Originally created by @CBatatinha on GitHub (Dec 19, 2022).
Original GitHub issue: https://github.com/probberechts/soccerdata/issues/126
Hello,
I'm trying to pull the schedule for any league, but I keep getting an error about the date format. Even when I pass a match ID, reading the data still fails because of the date format. How can I solve it?
ValueError:
time data 'Jumatatu, Des 26 2022 12:30' does not match format '%A, %b %d %Y %H:%M'
@probberechts commented on GitHub (Dec 19, 2022):
"Jumatatu" is apparently Swahili for "Monday". Swahili isn't even a supported language on WhoScored, so it is probably a bug in the website (which will resolve itself automatically) or a plugin in your browser which translates the dates automatically.
Which league / season are you trying to scrape?
@CBatatinha commented on GitHub (Dec 19, 2022):
Premier League 2022
@LuccaStochiero commented on GitHub (Feb 13, 2023):
Hello,
I've got the same problem. When I try to pull any match from any league, the page is automatically translated to Swahili and the date format doesn't match. I even turned on my VPN to see whether the problem is specific to Brazil, but nothing changed.
@probberechts commented on GitHub (Feb 13, 2023):
I have no issues on the main domain, but experience the same problem on the 1xbet subdomain. For example on https://1xbet.whoscored.com/Regions/252/Tournaments/2/England-Premier-League. It seems that WhoScored uses Swahili as the default locale, but I haven't managed to figure out how to force WhoScored to set the English locale.
One workaround I see is to create a fallback function that attempts to parse dates as Swahili if parsing as an English date fails. One thing to keep in mind here is that most people will not have the Swahili ("sw_KE") locale installed on their system, so I think it is best to just create a dict that maps the Swahili days of the week and months to their English equivalents. If someone would like to implement this, please go ahead.
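A minimal sketch of what such a fallback could look like, not the fix that ended up in soccerdata. The function name `parse_date` and the two dict names are made up for illustration. Only "Jumatatu" and "Des" come from the error message above; the other Swahili day names and month abbreviations are assumptions and should be checked against what WhoScored actually returns. The sketch also assumes that Python's own time locale is English (or the default "C" locale), so that `%A` and `%b` match English names.

```python
from datetime import datetime

DATE_FORMAT = "%A, %b %d %Y %H:%M"

# Swahili -> English day names. Only "Jumatatu" is confirmed by the error
# message in this issue; the rest are assumed.
SW_TO_EN_DAYS = {
    "Jumapili": "Sunday",
    "Jumatatu": "Monday",
    "Jumanne": "Tuesday",
    "Jumatano": "Wednesday",
    "Alhamisi": "Thursday",
    "Ijumaa": "Friday",
    "Jumamosi": "Saturday",
}

# Swahili -> English month abbreviations. Only "Des" is confirmed; months
# whose abbreviation is assumed to equal the English one are omitted.
SW_TO_EN_MONTHS = {
    "Mac": "Mar",
    "Mei": "May",
    "Ago": "Aug",
    "Okt": "Oct",
    "Des": "Dec",
}


def parse_date(text):
    """Parse an English date string, falling back to Swahili names."""
    try:
        return datetime.strptime(text, DATE_FORMAT)
    except ValueError:
        pass  # not an English date, try the Swahili mapping below

    # Split "Jumatatu, Des 26 2022 12:30" into day name, month, and the rest.
    day, rest = text.split(", ", 1)
    month, tail = rest.split(" ", 1)
    translated = "{}, {} {}".format(
        SW_TO_EN_DAYS.get(day, day),
        SW_TO_EN_MONTHS.get(month, month),
        tail,
    )
    # Still raises ValueError if the string is in neither language.
    return datetime.strptime(translated, DATE_FORMAT)


print(parse_date("Jumatatu, Des 26 2022 12:30"))
```

Because the mapping is a plain dict lookup, it does not depend on the "sw_KE" locale being installed, which is the reason for preferring it over `locale.setlocale`.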
@LuccaStochiero commented on GitHub (Feb 14, 2023):
Sorry to bother you again, but I'm really a newbie in Python and more accustomed to R. Do you know where I can find a tutorial on how to make that dict?
@probberechts commented on GitHub (Feb 14, 2023):
I'll see if I can implement this during the weekend. Currently not sure how to do it best either. I do not have experience with parsing non-English dates.
@guilherme-95 commented on GitHub (Mar 1, 2023):
One possible workaround is routing traffic through a country in which 1xbet is not allowed to operate, as that will keep you on the main domain.
@probberechts commented on GitHub (Mar 1, 2023):
Ah, interesting. Such as Belgium apparently 😃 I can browse directly to 1xbet.whoscored.com, but did not know that it gets redirected in other countries.
Anyway, I think the fix that I implemented in github.com/probberechts/soccerdata@a3bf31b977 is more straightforward. I only re-opened this issue because it looks like I made a small mistake (see https://github.com/ML-KULeuven/socceraction/issues/474).