[PR #1467] [MERGED] Fixed flushing dirty data and compressed the cache size #1997


📋 Pull Request Information

Original PR: https://github.com/s3fs-fuse/s3fs-fuse/pull/1467
Author: @ggtakec
Created: 11/3/2020
Status: Merged
Merged: 11/14/2020
Merged by: @gaul

Base: master ← Head: holes_for_nodata_area


📝 Commits (1)

  • d329155 Fixed flushing dirty data and compressed the cache size

📊 Changes

6 files changed (+115 additions, -3 deletions)


📝 configure.ac (+1 -0)
📝 src/fdcache_entity.cpp (+65 -0)
📝 src/fdcache_entity.h (+1 -0)
📝 src/fdcache_page.cpp (+39 -0)
📝 src/fdcache_page.h (+1 -0)
📝 src/s3fs.cpp (+8 -3)

📄 Description

Relevant Issue (if applicable)

#1448

Details

When the max_dirty_data option is specified (that is, when s3fs uploads parts of a file while it is still being written), the following two changes have been made.

Fixed a bug

During the processing of s3fs_write, the variable that stores the return value of the upload (flush) has been changed.
s3fs_write must return the number of bytes written, but the flush result overwrote that value, so the wrong value (an error code) was returned instead.
Because of this, there were cases where uploading could not proceed normally.
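
A minimal, self-contained sketch of the bug pattern. The names (write_handler_buggy, write_handler_fixed, fake_flush) are hypothetical simplifications, not the actual code in src/s3fs.cpp:

```cpp
#include <cstdio>

// Pretend flush that fails with an error code (e.g. -EIO).
static long fake_flush() { return -5; }

// BUG pattern: the flush result is stored into the same variable that holds
// the byte count, so the handler may return an error code instead of bytes.
static long write_handler_buggy(long bytes_written, bool need_flush)
{
    long result = bytes_written;
    if(need_flush){
        result = fake_flush();      // clobbers the byte count
    }
    return result;
}

// FIX pattern: keep the flush result in a separate variable and always
// return the number of bytes written.
static long write_handler_fixed(long bytes_written, bool need_flush)
{
    if(need_flush){
        long flush_result = fake_flush();
        if(0 != flush_result){
            std::fprintf(stderr, "flush failed: %ld\n", flush_result);
        }
    }
    return bytes_written;
}

int main()
{
    std::printf("buggy: %ld, fixed: %ld\n",
                write_handler_buggy(4096, true),
                write_handler_fixed(4096, true));
    return 0;
}
```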

Cache file compression

When uploading while writing a file under the max_dirty_data option, the already-uploaded contents remained in the cache file as-is.
Data that remains there keeps occupying disk space.
After uploading, the new code punches a HOLE in the cache file so that the disk blocks backing the uploaded data are released.
This minimizes disk space pressure when uploading with the max_dirty_data option.
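
On Linux, punching a hole is done with fallocate(2) and the FALLOC_FL_PUNCH_HOLE flag. A minimal sketch, with a hypothetical punch_hole helper rather than the actual fdcache_entity.cpp code, assuming a HAVE_FALLOCATE macro from the configure check described in the Notes below:

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE           // fallocate() and FALLOC_FL_* are glibc/Linux
#endif
#include <fcntl.h>
#include <cerrno>

// Release the disk blocks backing an already-uploaded byte range of the
// cache file, keeping the apparent file size unchanged.
static bool punch_hole(int fd, off_t offset, off_t length)
{
#ifdef HAVE_FALLOCATE
    if(-1 == fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                       offset, length)){
        return false;         // e.g. EOPNOTSUPP on filesystems without holes
    }
    return true;
#else
    // Dummy on platforms without fallocate (e.g. macOS): always fail, so
    // the cache file is simply not compressed there.
    (void)fd; (void)offset; (void)length;
    errno = ENOSYS;
    return false;
#endif
}
```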

Cache file state when max_dirty_data option is not specified
  • Cache file status (uploading 660000044 bytes):
        644532 -rw------- 1 guest users 660000044 Nov  3 05:20 big.txt
  • Cache file stat information (.<Bucketname>/big.txt):
        1351054:660000044
        0:660000044:1:0

Cache file state when max_dirty_data option is specified
  • Cache file status (uploading 660000044 bytes):
        25012 -rw------- 1 guest users 660000044 Nov  3 05:20 big.txt
  • Cache file stat information (.<Bucketname>/big.txt):
        1351054:660000044
        0:634388480:0:0
        634388480:25611564:1:0

In this way, disk space is used only for the last uploaded area. (Each page line in the stat information above appears to be offset:bytes:loaded:modified, so the first 634388480 bytes have been punched out and only the trailing 25611564 bytes remain backed by disk blocks.)
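
The first column of the ls listings above is presumably the allocated size in 1K blocks (as printed by ls -s): it shrinks from 644532 to 25012 while the apparent size stays 660000044 bytes. A small hypothetical checker, not part of this PR, that makes the same comparison via stat(2):

```cpp
#include <sys/stat.h>
#include <cstdio>

// Print apparent size (st_size) versus allocated size (st_blocks * 512).
// After hole punching, the former is unchanged while the latter shrinks.
int main(int argc, char* argv[])
{
    struct stat st{};
    if(argc < 2 || 0 != stat(argv[1], &st)){
        std::perror("stat");
        return 1;
    }
    std::printf("%s: apparent %lld bytes, allocated %lld bytes\n", argv[1],
                (long long)st.st_size, (long long)st.st_blocks * 512);
    return 0;
}
```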

Notes

This process uses the fallocate() function, which is a non-portable, Linux-specific system call.
It therefore does not exist on non-Linux platforms (e.g., macOS).
To handle this, configure checks for the fallocate function, and a dummy function is compiled in if it does not exist.
The dummy function always fails and never compresses the cache file.
This is a limitation on macOS and similar platforms.
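
Based on the +1 line diff to configure.ac, the check is most likely the standard autoconf idiom, which defines the HAVE_FALLOCATE macro used in the guard sketched above (an assumption, not a quote of the actual diff):

```
AC_CHECK_FUNCS([fallocate])
```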


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
