[GH-ISSUE #219] [SoFIFA] Scraper gets blocked by bot protection service #47

Closed
opened 2026-03-02 15:55:19 +03:00 by kerem · 2 comments
Owner

Originally created by @andrzej-konczyk on GitHub (Apr 21, 2023).
Original GitHub issue: https://github.com/probberechts/soccerdata/issues/219

Hi! How can I set up a proxy to get data? I would like to use the read_players() function, but downloading the data fails. I am not sure how to set up the proxy properly, and I assume that may be the cause. The current error is ConnectionError: Could not download https://sofifa.com/.

kerem 2026-03-02 15:55:19 +03:00
  • closed this issue
  • added the bug label
Author
Owner

@probberechts commented on GitHub (Apr 22, 2023):

It looks like SoFIFA has put stronger protection against scraping in place through Cloudflare, so setting up a proxy will not help. I do not have a quick solution for this. We will probably have to either keep track of some cookies and add them to the request headers, or switch to a Selenium-based scraper to bypass the block.
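As a rough illustration of the cookie idea, here is a minimal sketch that builds request headers from a set of cookies and a User-Agent string. The helper names `build_headers` and `fetch` are hypothetical and not part of soccerdata. It assumes the relevant cookie is Cloudflare's `cf_clearance`, copied by hand from a browser that already passed the check, and that the User-Agent has to match that browser. This has not been tested against SoFIFA and may well not be enough to get past the block.

```python
from typing import Dict


def build_headers(cookies: Dict[str, str], user_agent: str) -> Dict[str, str]:
    """Build request headers carrying the given cookies and User-Agent."""
    # Cookies are sent as a single "name=value; name=value" header.
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return {"User-Agent": user_agent, "Cookie": cookie_header}


def fetch(url: str, headers: Dict[str, str]):
    """Send a GET request with the prepared headers (needs the requests package)."""
    import requests  # imported lazily so build_headers works without it

    return requests.get(url, headers=headers, timeout=30)


# Placeholder values (assumptions): replace them with the real cookie value
# and User-Agent of a browser session that passed the Cloudflare check.
headers = build_headers(
    {"cf_clearance": "<value copied from the browser>"},
    "<User-Agent string of that same browser>",
)
```

Calling `fetch("https://sofifa.com/", headers)` would then send the request with those headers. Such cookies normally expire, so this would need to be refreshed by hand or automated, which is where a Selenium-based approach might come in.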

Author
Owner

@probberechts commented on GitHub (Apr 28, 2023):

Based on some limited initial tests, it seems to work with cfscrape (https://github.com/Anorov/cloudflare-scrape).

>>> # Using requests fails
>>> import requests
>>> requests.get("https://sofifa.com/")
<Response [403]>

>>> # Using cfscrape works
>>> import cfscrape
>>> scraper = cfscrape.create_scraper()
>>> scraper.get("https://sofifa.com/")
<Response [200]>

>>> # However, it fails when a session is used
>>> session = requests.Session()
>>> scraper = cfscrape.create_scraper(sess=session)
>>> scraper.get("https://sofifa.com/")
<Response [403]>
