[PR #420] [MERGED] [WhoScored] Ignore cached events file if empty #542

Closed
opened 2026-03-02 15:58:26 +03:00 by kerem · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/probberechts/soccerdata/pull/420
Author: @shufinskiy
Created: 10/26/2023
Status: Merged
Merged: 11/9/2023
Merged by: @probberechts

Base: master ← Head: size_cache


📝 Commits (6)

  • c1a8c2a add _size_file method to BaseReader class
  • c43395d Merge branch 'master' of github.com:probberechts/soccerdata into size_cache
  • aeb1229 fix description for flake8 and black
  • c8930f5 Merge branch 'master' of github.com:probberechts/soccerdata into size_cache
  • 3eb9c36 rm _size_file method from BaseReader
  • 7975d00 check that Whoscored reader is not null

📊 Changes

1 file changed (+8 additions, -0 deletions)

📝 soccerdata/whoscored.py (+8 -0)

📄 Description

Hello, @probberechts.

I propose a solution to the problem of empty files in the WhoScored cache.

In issue 98 (https://github.com/probberechts/soccerdata/issues/98), you suggested deleting empty files by size with a bash command.

I wrote a method, _size_file, which performs the same check with Path.stat().st_size. If the file is not larger than the threshold, it is treated as not cached.

    # Requires: from pathlib import Path; from typing import Optional
    def _size_file(
        self,
        filepath: Optional[Path] = None,
        filter_size: int = 60,
    ) -> bool:
        """Check whether `filepath` holds a cached file of valid size.

        Parameters
        ----------
        filepath : Path, optional
            Path where file should be cached. If None, return False.
        filter_size : int
            File size threshold in bytes. If the file is not larger than
            this, return False.

        Raises
        ------
        TypeError
            If filter_size is not an integer.

        Returns
        -------
        bool
            True in case of a cache hit, otherwise False.
        """
        if filepath is None:
            return False
        if not isinstance(filter_size, int):
            raise TypeError("filter_size must be of type int")
        try:
            file_size = filepath.stat().st_size
        except FileNotFoundError:
            return False
        # A successful stat() already proves the file exists.
        return file_size > filter_size
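To illustrate how a size check like this could gate cache reads, here is a minimal standalone sketch. It is not the code that was merged: `is_cache_hit` and `load_or_download` are hypothetical names introduced for this example, and the 60-byte default simply mirrors the `filter_size` default above.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional


def is_cache_hit(filepath: Optional[Path], min_size: int = 60) -> bool:
    """Return True if `filepath` exists and is larger than `min_size` bytes.

    Hypothetical helper: a function-style version of the size check.
    """
    if filepath is None:
        return False
    try:
        return filepath.stat().st_size > min_size
    except FileNotFoundError:
        return False


def load_or_download(filepath: Path, download: Callable[[], bytes]) -> bytes:
    """Read from the cache on a hit; otherwise download and overwrite the cache.

    Hypothetical helper: `download` stands in for whatever fetches the data.
    """
    if is_cache_hit(filepath):
        return filepath.read_bytes()
    data = download()
    filepath.write_bytes(data)
    return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cached = Path(tmp) / "events.json"
        cached.write_bytes(b"")  # simulate an empty cached file
        # The empty file is ignored, so the download callable is used.
        data = load_or_download(cached, lambda: b"x" * 100)
        print(len(data), is_cache_hit(cached))
```

With this approach an empty or truncated cache file is overwritten on the next request rather than deleted in a separate cleanup step.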

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
