[GH-ISSUE #219] [SoFIFA] Scraper gets blocked by bot protection service #47


Closed

opened 2026-03-02 15:55:19 +03:00 by kerem · 2 comments

kerem commented

2026-03-02 15:55:19 +03:00

Owner

Originally created by @andrzej-konczyk on GitHub (Apr 21, 2023).
Original GitHub issue: https://github.com/probberechts/soccerdata/issues/219

Hi! How can I set up a proxy to get data? I would like to use the read_players() function, but downloading the data fails. I am not sure how to set up the proxy properly, and I assume that could be the issue. The current error is: ConnectionError: Could not download https://sofifa.com/.


kerem

2026-03-02 15:55:19 +03:00

closed this issue
added the bug label

kerem commented

2026-03-02 15:55:20 +03:00

Author

Owner

@probberechts commented on GitHub (Apr 22, 2023):

It looks like SoFIFA has installed stronger protection against scraping through Cloudflare. Setting up a proxy will not help. I do not have a quick solution for this. We will probably have to keep track of some cookies and add them to the request header, or switch to a Selenium-based scraper to bypass the block.
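The cookie idea could look roughly like the stdlib-only sketch below. It is an assumption-laden illustration, not soccerdata code: the helper names build_request and fetch_status are hypothetical, and it assumes you copy cookies by hand from a browser session that already passed the Cloudflare check (Cloudflare's cf_clearance cookie is the usual candidate). Such a cookie may expire and is typically only honored together with the same User-Agent, and often the same IP, that obtained it.

```python
import urllib.error
import urllib.request
from typing import Mapping


def build_request(url: str, cookies: Mapping[str, str], user_agent: str) -> urllib.request.Request:
    """Build a GET request that replays browser cookies and the browser's User-Agent.

    `cookies` is assumed to be copied manually from a browser that already
    passed the bot check (e.g. a cf_clearance value); nothing here obtains it.
    """
    # Serialize the cookies into a single Cookie header: "a=1; b=2".
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return urllib.request.Request(
        url,
        headers={"Cookie": cookie_header, "User-Agent": user_agent},
    )


def fetch_status(req: urllib.request.Request, timeout: float = 10.0) -> int:
    """Perform the request and return the HTTP status code (e.g. 200 or 403).

    4xx/5xx responses are returned as codes instead of raising HTTPError.
    """
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

A 403 from fetch_status(build_request("https://sofifa.com/", cookies, ua)) would suggest the copied cookie was rejected or has expired.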


kerem commented

2026-03-02 15:55:20 +03:00

Author

Owner

@probberechts commented on GitHub (Apr 28, 2023):

Based on some limited initial tests, it seems to work with cfscrape (https://github.com/Anorov/cloudflare-scrape):

>>> # Using requests fails
>>> import requests
>>> requests.get("https://sofifa.com/")
<Response [403]>

>>> # Using cfscrape works
>>> import cfscrape
>>> scraper = cfscrape.create_scraper()
>>> scraper.get("https://sofifa.com/")
<Response [200]>

>>> # However, it fails when a session is used
>>> session = requests.Session()
>>> scraper = cfscrape.create_scraper(sess=session)
>>> scraper.get("https://sofifa.com/")
<Response [403]>
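One way to act on these results is a fallback chain: try plain requests first, then cfscrape. The sketch below rests on assumptions. first_successful and default_fetchers are hypothetical helpers, not part of soccerdata. The scraper is deliberately created without a custom session, since that variant returned 403 in the tests above. The sketch bypasses nothing itself; it only returns whichever backend currently answers with HTTP 200.

```python
from typing import Any, Callable, Iterable, List, Optional

# A "fetcher" is any callable taking a URL and returning a response object
# that exposes an integer `status_code` (as requests.Response does).
Fetcher = Callable[[str], Any]


def first_successful(url: str, fetchers: Iterable[Fetcher]) -> Optional[Any]:
    """Try each fetcher in order; return the first HTTP 200 response, else None."""
    for fetch in fetchers:
        try:
            resp = fetch(url)
        except Exception:
            # Network error or a broken optional backend: try the next fetcher.
            continue
        if getattr(resp, "status_code", None) == 200:
            return resp
    return None


def default_fetchers() -> List[Fetcher]:
    """Plain requests first, then cfscrape; both are optional dependencies."""
    fetchers: List[Fetcher] = []
    try:
        import requests
        fetchers.append(requests.get)
    except ImportError:
        pass
    try:
        import cfscrape
        # No `sess=` argument on purpose: passing an existing
        # requests.Session brought the 403 back in the tests above.
        fetchers.append(cfscrape.create_scraper().get)
    except ImportError:
        pass
    return fetchers
```

For example, first_successful("https://sofifa.com/", default_fetchers()) returns None when every available backend is blocked, which a caller could turn into the ConnectionError reported in this issue.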


kerem referenced this issue

2026-03-02 15:57:02 +03:00

[PR #47] [CLOSED] Update dependency Sphinx to v5 #243