Mirror of https://github.com/s3fs-fuse/s3fs-fuse.git (synced 2026-04-25 05:16:00 +03:00)
[GH-ISSUE #632] Prevent s3fs from downloading files from bucket? #358
Originally created by @gavinrobertszeco on GitHub (Aug 17, 2017).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/632
Additional Information
The following information is very important to help us help you. Omitting these details may delay your support request or cause it to receive no attention at all.
Version of s3fs being used (s3fs --version)
Amazon Simple Storage Service File System V1.82(commit:fa8c417) with OpenSSL
Version of fuse being used (pkg-config --modversion fuse)
2.9.4
System information (uname -a)
Linux zecoenergy.com 3.2.34-55.46.amzn1.x86_64 #1 SMP Tue Nov 20 10:06:15 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
Distro (cat /etc/issue)
Amazon Linux AMI release 2014.09
s3fs command line used (if applicable)
Details about issue
Everything works great as is, however I am keen on using s3fs to literally act as a proxy between my FTP clients and my S3 Bucket(s).
Because of the VAST amount of data we receive and push to S3 currently, it's not possible or even needed to download any of the existing files.
Is it possible to configure s3fs to simply upload files that are dropped into the mounted folder?
Thanks :)
@gavinrobertszeco commented on GitHub (Aug 29, 2017):
I've tried to create an IAM User role that only has write access but it fails to mount due to the 403 error when attempting to get the list of available files.
Any suggestions?
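Editor's note: the 403 on mount happens because s3fs needs bucket-listing permission even when you only intend to upload. A hedged sketch of an IAM policy granting listing plus uploads while withholding object reads (the user name, policy name, and bucket name are placeholders, not from this thread). One caveat: S3 authorizes HEAD Object requests with `s3:GetObject`, so a policy that denies all reads may still break s3fs's per-file stat lookups.

```shell
# Hypothetical write-plus-list policy. "ftp-proxy-user" and
# "my-ftp-bucket" are placeholder names, not taken from the thread.
# ListBucket on the bucket ARN lets s3fs mount and list; PutObject /
# DeleteObject on objects allow uploads; s3:GetObject is deliberately
# omitted so object bodies cannot be downloaded (see HEAD caveat above).
aws iam put-user-policy \
  --user-name ftp-proxy-user \
  --policy-name s3fs-upload-only \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      {"Effect": "Allow",
       "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
       "Resource": "arn:aws:s3:::my-ftp-bucket"},
      {"Effect": "Allow",
       "Action": ["s3:PutObject", "s3:DeleteObject"],
       "Resource": "arn:aws:s3:::my-ftp-bucket/*"}
    ]
  }'
```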
@ggtakec commented on GitHub (Sep 17, 2017):
If the use_cache option is not specified, s3fs does not use the local disk as a file cache.
And I think the IAM user role probably needs list-object permission so that s3fs can list files.
Regards,
@gavinrobertszeco commented on GitHub (Sep 17, 2017):
Hey @ggtakec
It's more preventing S3FS from downloading the contents of the bucket when mounting.
If S3FS simply caches the file listings for LS until you request the file in some form, then it may not be an issue. Otherwise, I'll have to use a long polling method, i.e. cron + awscli.
Any suggestions?
@ggtakec commented on GitHub (Sep 17, 2017):
@gavinrobertszeco
s3fs keeps a stat (file attribute/permission) cache.
However, s3fs behaves as part of the file system, so it does not suppress file-listing behavior.
This cache is used to check the permissions of individual files, and increasing its size improves performance.
You can improve the performance of s3fs using options such as max_stat_cache_size, enable_noobj_cache, notsup_compat_dir, etc.
Please try it and see man page(and wiki page).
Regards,
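Editor's note: a hedged sketch of a mount invocation combining the options named above. The bucket name, mount point, and option values are illustrative placeholders, not recommendations from the thread; check the s3fs man page for current defaults.

```shell
# Placeholder bucket and mount point; values are illustrative.
# max_stat_cache_size : number of stat-cache entries to keep in memory
# enable_noobj_cache  : also cache "object does not exist" lookups
# notsup_compat_dir   : skip legacy directory-object compatibility probes
s3fs my-ftp-bucket /mnt/s3 \
  -o max_stat_cache_size=100000 \
  -o enable_noobj_cache \
  -o notsup_compat_dir
```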
@sqlbot commented on GitHub (Sep 18, 2017):
@gavinrobertszeco define "contents of the bucket." S3FS doesn't download any objects unless you open the corresponding files for the objects. Directory listings iterate the objects and send HEAD requests to get the permissions, mode (and timestamps?)... but the "contents" (to me, sounds like you mean the object body payloads) are not actually downloaded.
@gavinrobertszeco commented on GitHub (Sep 18, 2017):
Hey all,
@sqlbot - My S3 bucket contains GB's of CSV files across hundreds/thousands of folders.
I am using S3FS to simply connect our FTP server directly to S3 so that Lambda can start processing files received. If it doesn't work, the only alternative would be to use aws s3 mv to move files received to S3. The OS "should" never need to list or access the mounted folder or its contents other than to facilitate the upload of the content, however I wasn't sure and I didn't want to try it only to find it's downloaded GB's of data 😦
Based on what you've said, it should all be OK.
Cheers 😄
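Editor's note: the "cron + awscli" fallback mentioned earlier in the thread might look roughly like the sketch below. The schedule, local directory, and bucket name are placeholders, not details from this thread.

```shell
# Hypothetical crontab entry: every minute, hand newly received FTP
# files to a script that moves them into S3 (deleting local copies).
#   * * * * *  /usr/local/bin/ftp-to-s3.sh
#
# /usr/local/bin/ftp-to-s3.sh (placeholder paths and bucket):
#!/bin/sh
set -eu
aws s3 mv /srv/ftp/incoming/ s3://my-ftp-bucket/incoming/ \
  --recursive --only-show-errors
```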
@machty commented on GitHub (Jan 9, 2018):
@sqlbot that point about deferring downloads until a file is actually opened is very interesting (sounds FAQ-worthy).
I still think there's some value in there being a unidirectional sync where SFTP uploads go to S3 but don't detect new files on S3, since even if you defer downloads there's still the potential of a lot of metadata coming back when lots of objects are created/copied/etc on S3, and unfortunately this effect is multiplied when you try and provide an elastic cluster of SFTP "proxies" (they'll all receive the metadata sync from all the changes made by other nodes in the cluster). Of course, a unidirectional sync might be beyond the scope of this project, but I think a lot of people are using s3fs for this similar use case.
@sqlbot commented on GitHub (Jan 10, 2018):
s3fs mimics filesystem semantics, so -- just as when you mount a physical disk on a server, there's nothing that causes the data in the files to be read from the disk because there is no reason for the files to be read from the disk unless you actually open them -- the actual objects are never downloaded from S3 unless you actually do something that would cause the content of a file to be read.
There's no metadata syncing that occurs -- s3fs never learns anything spontaneously from S3, it only asks for what it needs to know, right now, based on some user action on the server -- not activity occurring against the bucket. File metadata is fetched when you list a directory (and the information isn't in the stat cache, if configured, as a result of a previous listing), and object payload is fetched when you read from a file, but not before that.
This can be confirmed via the bucket logs.
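Editor's note: one hedged way to run that confirmation is to enable S3 server access logging and then check the delivered logs for object GETs after mounting and listing. Both bucket names below are placeholders; the `REST.GET.OBJECT` operation string is what S3 access logs record for object downloads.

```shell
# Placeholder buckets. Server access logging records every request, so
# after mounting and listing you should see LIST/HEAD entries but no
# object GETs until a file is actually read.
aws s3api put-bucket-logging --bucket my-ftp-bucket \
  --bucket-logging-status '{
    "LoggingEnabled": {
      "TargetBucket": "my-log-bucket",
      "TargetPrefix": "s3fs-access/"
    }
  }'

# Later (log delivery is delayed), inspect the log objects:
aws s3 cp s3://my-log-bucket/s3fs-access/ /tmp/s3logs/ --recursive
grep -h 'REST.GET.OBJECT' /tmp/s3logs/* || echo "no object downloads logged"
```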
@gaul commented on GitHub (Jul 11, 2019):
s3fs does not download files unless an application opens and reads from them. However,
updatedb or similar background tasks can unexpectedly read files. Closing since the reporter understands this behavior in https://github.com/s3fs-fuse/s3fs-fuse/issues/632#issuecomment-330343636.