[GH-ISSUE #2020] Local disk usage spikes while using s3 via s3fs #1014

Open
opened 2026-03-04 01:50:38 +03:00 by kerem · 3 comments

Originally created by @infrasmeworld on GitHub (Aug 10, 2022).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/2020

Additional Information

The following information is very important in order to help us help you. Omitting these details may delay your support request or cause it to receive no attention at all.
Keep in mind that the commands we provide to retrieve information are oriented toward GNU/Linux distributions, so you may need to use others if you run s3fs on macOS or BSD.

Version of s3fs being used (s3fs --version)

Amazon Simple Storage Service File System V1.91 (commit:unknown) with OpenSSL

Version of fuse being used (pkg-config --modversion fuse, rpm -qi fuse, dpkg -s fuse)

Version : 2.9.7
Release : 15.el8

Kernel information (uname -r)

4.18.0-147.5.1.el8_1.x86_64

GNU/Linux Distribution, if applicable (cat /etc/os-release)

Red Hat Enterprise Linux release 8.1 (Ootpa)

s3fs command line used, if applicable

NA

/etc/fstab entry, if applicable

s3fs#<bucketname> <Mount point> fuse _netdev,umask=0022,allow_other,iam_role=auto,dbglevel=debug 0 0

s3fs syslog messages (grep s3fs /var/log/syslog, journalctl | grep s3fs, or s3fs outputs)

NA

Details about issue

We have mounted an S3 bucket using s3fs on our EC2 instances. This bucket is used as a staging area for the data pipelines we run.

We download zip files to this staging area, unzip them before proceeding with the next action, and perform additional read/write operations against these downloaded files in S3.

We see a spike in local filesystem usage while executing the above pipeline. Even though we mounted S3 using s3fs and performed operations against S3 files, will they still be downloaded to the local file system? The disk spikes are temporary and occur only while performing operations against S3 objects. Any thoughts on this behavior? We ran multiple tests, such as tracing system calls to see if anything was writing to the local FS, but couldn't find any leads. I understand that S3 is an object storage solution and that it is not possible to edit objects in place and save them. If objects must be downloaded before they can be modified, what is the default location of that temporary storage?

+++++++++++++
s3fs[1414171]: s3fs.cpp:check_parent_object_access(627): [path=/.hrishi_tes.swpx]
s3fs[1414171]: s3fs.cpp:check_object_access(519): [path=/]
s3fs[1414171]: s3fs.cpp:check_object_access(524): [pid=1421095,uid=0,gid=0]
s3fs[1414171]: s3fs.cpp:get_object_attribute(350): [path=/]
s3fs[1414171]: s3fs.cpp:check_object_access(519): [path=/.hrishi_tes.swpx]
s3fs[1414171]: s3fs.cpp:check_object_access(524): [pid=1421095,uid=0,gid=0]
s3fs[1414171]: s3fs.cpp:get_object_attribute(350): [path=/.hrishi_tes.swpx]
s3fs[1414171]: cache.cpp:GetStat(266): stat cache hit [path=/.hrishi_tes.swpx][time=1213909.366475541][hit count=1]
s3fs[1414171]: fdcache.cpp:GetExistFdEntity(629): [path=/.hrishi_tes.swpx][pseudo_fd=4]
s3fs[1414171]: [tpath=][path=/.hrishi_tes.swpx][pseudo_fd=4][physical_fd=26]
s3fs[1414171]: [tpath=][path=/.hrishi_tes.swpx][pseudo_fd=4][physical_fd=26]
s3fs[1414171]: [tpath=][path=/.hrishi_tes.swpx][pseudo_fd=4][physical_fd=26]
s3fs[1414171]: [tpath=][path=/.hrishi_tes.swpx][pseudo_fd=4][physical_fd=26]
s3fs[1414171]: fdcache_entity.cpp:Load(1015): [path=/.hrishi_tes.swpx][physical_fd=26][offset=0][size=0]
s3fs[1414171]: [tpath=/.hrishi_tes.swpx]
s3fs[1414171]: [tpath=/.hrishi_tes.swpx]
s3fs[1414171]: curl_handlerpool.cpp:GetHandler(81): Get handler from pool: rest = 30
s3fs[1414171]: URL is https://s3-eu-west-1.amazonaws.com/s3-dev-data-staging/.hrishi_tes.swpx
s3fs[1414171]: URL is https://s3-eu-west-1.amazonaws.com/s3-dev-data-staging/.hrishi_tes.swpx
s3fs[1414171]: URL changed is https://s3-dev-data-staging.s3-eu-west-1.amazonaws.com/.hrishi_tes.swpx
s3fs[1414171]: URL changed is https://s3-dev-data-staging.s3-eu-west-1.amazonaws.com/.hrishi_tes.swpx
s3fs[1414171]: uploading... [path=/.hrishi_tes.swpx][fd=26][size=0]
s3fs[1414171]: uploading... [path=/.hrishi_tes.swpx][fd=26][size=0]
s3fs[1414171]: curl.cpp:RequestPerform(2289): connecting to URL https://s3-dev-data-staging.s3-eu-west-1.amazonaws.com/.hrishi_tes.swpx
s3fs[1414171]: computing signature [PUT] [/.hrishi_tes.swpx] [] [e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855]
s3fs[1414171]: computing signature [PUT] [/.hrishi_tes.swpx] [] [e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855]
s3fs[1414171]: url is https://s3-eu-west-1.amazonaws.com
s3fs[1414171]: url is https://s3-eu-west-1.amazonaws.com
s3fs[1414171]: HTTP response code 200
s3fs[1414171]: HTTP response code 200
s3fs[1414171]: curl_handlerpool.cpp:ReturnHandler(103): Return handler to pool
s3fs[1414171]: delete stat cache entry[path=/.hrishi_tes.swpx]
s3fs[1414171]: delete stat cache entry[path=/.hrishi_tes.swpx]
s3fs[1414171]: [path=/.hrishi_tes.swpx][pseudo_fd=4]
s3fs[1414171]: [path=/.hrishi_tes.swpx][pseudo_fd=4]
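To narrow down where the spikes land, one option is to poll free space on the suspected temporary directory while the pipeline runs and correlate the readings with pipeline steps. A minimal sketch (the path `/tmp`, interval, and sample count are assumptions; point it at whatever `tmpdir` or `use_cache` directory is in effect):

```python
import shutil
import time


def poll_free_space(path="/tmp", interval=5.0, samples=3):
    """Print free space on the filesystem backing `path` a few times.

    A drop in free space during a pipeline step suggests that step is
    staging data on this filesystem.
    """
    readings = []
    for _ in range(samples):
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        readings.append(usage.free)
        print(f"{path}: {usage.free / 1024**2:.1f} MiB free")
        time.sleep(interval)
    return readings


if __name__ == "__main__":
    poll_free_space(interval=0.5)
```

Running this in a second terminal while the unzip step executes should show whether the spike tracks the mount's temporary directory.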

@ggtakec commented on GitHub (Aug 27, 2022):

When a user operates on a file through s3fs, s3fs will at least download the object from S3.
If the user updates the file, an upload will occur.
The upload can be partial if the copy API is supported (AWS S3 supports it).

Therefore, the following is not entirely true:

I understand that S3 is an object storage solution and that it is not possible to edit objects in place and save them.

If multipart upload and the copy API are supported, a partial upload is performed when a flush system call occurs while the file is being edited, without downloading or uploading the whole file.

The temporary file for an upload is created in the following location:
If the use_cache option is specified, a temporary file is created under the specified directory.
If the use_cache option is not specified, it is created under the directory specified by the tmpdir option (default /tmp).
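For example, the location can be steered with these mount options (bucket name and paths below are placeholders, not values from this issue):

```shell
# Keep s3fs temporary files on a dedicated scratch volume instead of /tmp:
s3fs mybucket /mnt/s3 -o iam_role=auto -o tmpdir=/mnt/scratch

# Or enable a persistent local cache under a directory you size and monitor:
s3fs mybucket /mnt/s3 -o iam_role=auto -o use_cache=/var/cache/s3fs
```

Pointing `tmpdir` or `use_cache` at a volume sized for the largest objects in the pipeline avoids filling the root filesystem.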

@paulmueller commented on GitHub (Jan 17, 2024):

👋 We ran into a similar issue with a read-only mount. I wonder whether using the direct_io mount option makes things work as expected?

Our use case example:

s3fs bucket_name ~/s3fs_mount -o url=https://custom.provider.example.com/ -o use_path_request_style -o mp_umask=0222 -o umask=0222 -o allow_other -o direct_io
@ggtakec commented on GitHub (Feb 12, 2024):

When reading a file, the object is also temporarily downloaded, so a local file is created.
The file is created in the location specified by the use_cache option, or as a temporary file if that option is not specified.
In either case, local disk will be used.
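One pattern that matches "disk usage spikes but tracing finds no leads" is an unlinked-but-open temporary file: once a process removes the file's name, `ls` and `du` no longer see it, but `df` still counts its blocks until the last descriptor is closed. This is generic POSIX behavior, shown below as a sketch; it is not a claim about s3fs internals:

```python
import os
import tempfile


def unlinked_file_still_uses_space(nbytes=1024 * 1024):
    """Show that deleting a file's name does not free its blocks while a
    descriptor stays open: ls/du miss the file, but df counts the space."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * nbytes)
        os.unlink(path)                  # the name is gone...
        assert not os.path.exists(path)  # ...so directory listings miss it
        return os.fstat(fd).st_size      # ...but the data is still allocated
    finally:
        os.close(fd)                     # space is reclaimed only here


if __name__ == "__main__":
    print(unlinked_file_still_uses_space())  # prints 1048576
```

If the spikes come from such files, `df` will show the usage while `du` on the same directory will not, and tools like `lsof +L1` can list the deleted-but-open files.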
