mirror of
https://github.com/s3fs-fuse/s3fs-fuse.git
synced 2026-04-25 13:26:00 +03:00
[GH-ISSUE #2020] Local disk usage spikes while using s3 via s3fs #1014
Originally created by @infrasmeworld on GitHub (Aug 10, 2022).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/2020
Additional Information
The following information is very important in helping us help you. Omitting these details may delay your support request or cause it to receive no attention at all.
Keep in mind that the commands we suggest for retrieving this information are oriented toward GNU/Linux distributions, so you may need to use different commands if you run s3fs on macOS or BSD.
Version of s3fs being used (s3fs --version)
Amazon Simple Storage Service File System V1.91 (commit:unknown) with OpenSSL
Version of fuse being used (pkg-config --modversion fuse, rpm -qi fuse, dpkg -s fuse)
Version : 2.9.7
Release : 15.el8
Kernel information (uname -r)
4.18.0-147.5.1.el8_1.x86_64
GNU/Linux Distribution, if applicable (cat /etc/os-release)
Red Hat Enterprise Linux release 8.1 (Ootpa)
s3fs command line used, if applicable
/etc/fstab entry, if applicable
s3fs syslog messages (grep s3fs /var/log/syslog, journalctl | grep s3fs, or s3fs outputs)
Details about issue
We have mounted an S3 bucket using s3fs in our EC2 instances. This bucket is used as a staging area for the data pipelines we run.
We download zip files to this staging area, unzip them before proceeding with the next action, and perform additional read/write operations against these downloaded files in S3.
We see a spike in local filesystem usage while executing the above pipeline. Even though we mounted S3 using s3fs and performed operations against the S3 files, are they still downloaded to the local filesystem? The disk spikes are temporary and occur only while we are operating on S3 objects. Any thoughts on this behavior? We ran multiple tests, such as tracing system calls to see whether anything was writing to the local FS, but could not find any leads. I understand that S3 is an object storage solution and that it is not possible to edit objects in place and save them. If objects must be downloaded before they can be modified, what is the default location of this temporary storage?
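One way to narrow down where the spike lands (a hedged sketch: `/tmp` is only s3fs's default staging location, and the actual directory depends on the use_cache/tmpdir mount options) is to sample local disk usage on the suspected filesystem while the pipeline runs:

```shell
#!/bin/sh
# Sample used space (in 1K blocks) on the filesystem backing /tmp.
# Run this in a loop while the pipeline executes to observe the temporary spike.
used_kb=$(df --output=used /tmp | tail -n 1 | tr -d ' ')
echo "used_kb=$used_kb"
```

Comparing samples taken before, during, and after an unzip step should show whether the staged copies are what consume the space.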
+++++++++++++
s3fs[1414171]: s3fs.cpp:check_parent_object_access(627): [path=/.hrishi_tes.swpx]
s3fs[1414171]: s3fs.cpp:check_object_access(519): [path=/]
s3fs[1414171]: s3fs.cpp:check_object_access(524): [pid=1421095,uid=0,gid=0]
s3fs[1414171]: s3fs.cpp:get_object_attribute(350): [path=/]
s3fs[1414171]: s3fs.cpp:check_object_access(519): [path=/.hrishi_tes.swpx]
s3fs[1414171]: s3fs.cpp:check_object_access(524): [pid=1421095,uid=0,gid=0]
s3fs[1414171]: s3fs.cpp:get_object_attribute(350): [path=/.hrishi_tes.swpx]
s3fs[1414171]: cache.cpp:GetStat(266): stat cache hit [path=/.hrishi_tes.swpx][time=1213909.366475541][hit count=1]
s3fs[1414171]: fdcache.cpp:GetExistFdEntity(629): [path=/.hrishi_tes.swpx][pseudo_fd=4]
s3fs[1414171]: [tpath=][path=/.hrishi_tes.swpx][pseudo_fd=4][physical_fd=26]
s3fs[1414171]: fdcache_entity.cpp:Load(1015): [path=/.hrishi_tes.swpx][physical_fd=26][offset=0][size=0]
s3fs[1414171]: [tpath=/.hrishi_tes.swpx]
s3fs[1414171]: curl_handlerpool.cpp:GetHandler(81): Get handler from pool: rest = 30
s3fs[1414171]: URL is https://s3-eu-west-1.amazonaws.com/s3-dev-data-staging/.hrishi_tes.swpx
s3fs[1414171]: URL changed is https://s3-dev-data-staging.s3-eu-west-1.amazonaws.com/.hrishi_tes.swpx
s3fs[1414171]: uploading... [path=/.hrishi_tes.swpx][fd=26][size=0]
s3fs[1414171]: curl.cpp:RequestPerform(2289): connecting to URL https://s3-dev-data-staging.s3-eu-west-1.amazonaws.com/.hrishi_tes.swpx
s3fs[1414171]: computing signature [PUT] [/.hrishi_tes.swpx] [] [e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855]
s3fs[1414171]: url is https://s3-eu-west-1.amazonaws.com
s3fs[1414171]: HTTP response code 200
s3fs[1414171]: curl_handlerpool.cpp:ReturnHandler(103): Return handler to pool
s3fs[1414171]: delete stat cache entry[path=/.hrishi_tes.swpx]
s3fs[1414171]: [path=/.hrishi_tes.swpx][pseudo_fd=4]
@ggtakec commented on GitHub (Aug 27, 2022):
When a user operates on a file through s3fs, the object is at least downloaded from S3. If the user updates the file, an upload occurs.
The upload can be partial if the copy API is supported (AWS S3 supports it). Therefore, the following is not completely impossible: if multipart upload and the copy API are supported, a partial upload is performed when a flush system call occurs while the file is being edited, without downloading or uploading the entire file.
The temporary file for uploads is created in the following location: if the use_cache option is specified, a temporary file is created under the specified directory. If use_cache is not specified, it is created under the directory specified by the tmpdir option (default /tmp).
@paulmueller commented on GitHub (Jan 17, 2024):
👋 We ran into a similar issue with a read-only mount. I wonder whether using the direct_io mount option makes things work as expected? Our use case example:
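A minimal sketch of such a read-only mount with direct_io (hedged: the bucket name, mount point, and credentials path below are hypothetical placeholders, and whether direct_io changes the local-copy behavior depends on the s3fs version in use):

```shell
# Hypothetical read-only s3fs mount with direct I/O.
# "my-bucket", /mnt/s3, and the passwd file path are placeholders.
s3fs my-bucket /mnt/s3 \
    -o ro \
    -o direct_io \
    -o passwd_file=/etc/passwd-s3fs
```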
@ggtakec commented on GitHub (Feb 12, 2024):
When reading a file, the object is also temporarily downloaded, so a local file is created.
The file is created in the location specified by the use_cache option, or as a temporary file (under tmpdir) if use_cache is not specified.
In either case, local disk space is used.
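Given the behavior described above, the temporary files can at least be directed to a volume with enough headroom. A hedged sketch of the two mount-time choices (bucket name, mount point, and directories are hypothetical):

```shell
# Hypothetical mounts directing s3fs staging files to a large /data volume.

# Option 1: keep a persistent local cache of downloaded objects.
s3fs my-bucket /mnt/s3 -o use_cache=/data/s3fs-cache

# Option 2: no persistent cache; put the temporary files under /data
# instead of the default /tmp.
s3fs my-bucket /mnt/s3 -o tmpdir=/data/s3fs-tmp
```

With use_cache the staged copies persist between operations (and the cache directory must be sized and pruned accordingly); with tmpdir alone they are transient, which matches the temporary spikes reported in this issue.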