Mirror of https://github.com/s3fs-fuse/s3fs-fuse.git (synced 2026-04-25 05:16:00 +03:00)
[GH-ISSUE #632] Prevent s3fs from downloading files from bucket? #358
Originally created by @gavinrobertszeco on GitHub (Aug 17, 2017).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/632
Additional Information
The following information is very important to help us help you. Omitting these details may delay your support request or cause it to receive no attention at all.
Version of s3fs being used (s3fs --version)
Amazon Simple Storage Service File System V1.82(commit:fa8c417) with OpenSSL
Version of fuse being used (pkg-config --modversion fuse)
2.9.4
System information (uname -a)
Linux zecoenergy.com 3.2.34-55.46.amzn1.x86_64 #1 SMP Tue Nov 20 10:06:15 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
Distro (cat /etc/issue)
Amazon Linux AMI release 2014.09
s3fs command line used (if applicable)
Details about issue
Everything works great as is, however I am keen on using s3fs to literally act as a proxy between my FTP clients and my S3 Bucket(s).
Because of the VAST amount of data we receive and push to S3 currently, it's not possible or even needed to download any of the existing files.
Is it possible to configure s3fs to simply upload files that are dropped into the mounted folder?
Thanks :)
@gavinrobertszeco commented on GitHub (Aug 29, 2017):
I've tried to create an IAM User role that only has write access but it fails to mount due to the 403 error when attempting to get the list of available files.
Any suggestions?
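Editor's note: the 403 on mount happens because s3fs needs bucket-listing permission even when you only intend to upload. A hedged sketch of an IAM policy granting listing plus uploads while withholding object reads (the user name, policy name, and bucket name are placeholders, not from this thread). One caveat: S3 authorizes HEAD Object requests with `s3:GetObject`, so a policy that denies all reads may still break s3fs's per-file stat lookups.

```shell
# Hypothetical write-plus-list policy. "ftp-proxy-user" and
# "my-ftp-bucket" are placeholder names, not taken from the thread.
# ListBucket on the bucket ARN lets s3fs mount and list; PutObject /
# DeleteObject on objects allow uploads; s3:GetObject is deliberately
# omitted so object bodies cannot be downloaded (see HEAD caveat above).
aws iam put-user-policy \
  --user-name ftp-proxy-user \
  --policy-name s3fs-upload-only \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      {"Effect": "Allow",
       "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
       "Resource": "arn:aws:s3:::my-ftp-bucket"},
      {"Effect": "Allow",
       "Action": ["s3:PutObject", "s3:DeleteObject"],
       "Resource": "arn:aws:s3:::my-ftp-bucket/*"}
    ]
  }'
```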
@ggtakec commented on GitHub (Sep 17, 2017):
If the use_cache option is not specified, s3fs does not use the local disk as a file cache.
And I think the IAM user role probably needs list-object permission so that s3fs can list files.
Regards,
@gavinrobertszeco commented on GitHub (Sep 17, 2017):
Hey @ggtakec
It's more preventing S3FS from downloading the contents of the bucket when mounting.
If S3FS simply caches the file listings for LS until you request the file in some form, then it may not be an issue. Otherwise, I'll have to use a long polling method, i.e. cron + awscli.
Any suggestions?
@ggtakec commented on GitHub (Sep 17, 2017):
@gavinrobertszeco
s3fs keeps a stat (file attribute/permission) cache.
However, s3fs behaves as part of the file system, so it does not suppress file-listing behavior.
This cache is used to check the permissions of individual files, and increasing its size improves performance.
You can improve the performance of s3fs using options such as max_stat_cache_size, enable_noobj_cache, notsup_compat_dir, etc.
Please try it and see man page(and wiki page).
Regards,
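Editor's note: a hedged sketch of a mount invocation combining the options named above. The bucket name, mount point, and option values are illustrative placeholders, not recommendations from the thread; check the s3fs man page for current defaults.

```shell
# Placeholder bucket and mount point; values are illustrative.
# max_stat_cache_size : number of stat-cache entries to keep in memory
# enable_noobj_cache  : also cache "object does not exist" lookups
# notsup_compat_dir   : skip legacy directory-object compatibility probes
s3fs my-ftp-bucket /mnt/s3 \
  -o max_stat_cache_size=100000 \
  -o enable_noobj_cache \
  -o notsup_compat_dir
```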
@sqlbot commented on GitHub (Sep 18, 2017):
@gavinrobertszeco define "contents of the bucket." S3FS doesn't download any objects unless you open the corresponding files for the objects. Directory listings iterate the objects and send HEAD requests to get the permissions, mode (and timestamps?)... but the "contents" (to me, sounds like you mean the object body payloads) are not actually downloaded.
@gavinrobertszeco commented on GitHub (Sep 18, 2017):
Hey all,
@sqlbot - My S3 bucket contains GB's of CSV files across hundreds/thousands of folders.
I am using S3FS to simply connect our FTP server directly to S3 so that Lambda can start processing files received. If it doesn't work, the only alternative would be to use aws s3 mv to move files received to S3. The OS "should" never need to list or access the mounted folder or its contents other than to facilitate the upload of the content, however I wasn't sure and I didn't want to try it only to find it's downloaded GB's of data 😦
Based on what you've said, it should all be OK.
Cheers 😄
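Editor's note: the "cron + awscli" fallback mentioned earlier in the thread might look roughly like the sketch below. The schedule, local directory, and bucket name are placeholders, not details from this thread.

```shell
# Hypothetical crontab entry: every minute, hand newly received FTP
# files to a script that moves them into S3 (deleting local copies).
#   * * * * *  /usr/local/bin/ftp-to-s3.sh
#
# /usr/local/bin/ftp-to-s3.sh (placeholder paths and bucket):
#!/bin/sh
set -eu
aws s3 mv /srv/ftp/incoming/ s3://my-ftp-bucket/incoming/ \
  --recursive --only-show-errors
```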
@machty commented on GitHub (Jan 9, 2018):
@sqlbot that point about deferring downloads until a file is actually opened is very interesting (sounds FAQ-worthy).
I still think there's some value in there being a unidirectional sync where SFTP uploads go to S3 but don't detect new files on S3, since even if you defer downloads there's still the potential of a lot of metadata coming back when lots of objects are created/copied/etc on S3, and unfortunately this effect is multiplied when you try and provide an elastic cluster of SFTP "proxies" (they'll all receive the metadata sync from all the changes made by other nodes in the cluster). Of course, a unidirectional sync might be beyond the scope of this project, but I think a lot of people are using s3fs for this similar use case.
@sqlbot commented on GitHub (Jan 10, 2018):
s3fs mimics filesystem semantics, so -- just as when you mount a physical disk on a server, there's nothing that causes the data in the files to be read from the disk because there is no reason for the files to be read from the disk unless you actually open them -- the actual objects are never downloaded from S3 unless you actually do something that would cause the content of a file to be read.
There's no metadata syncing that occurs -- s3fs never learns anything spontaneously from S3, it only asks for what it needs to know, right now, based on some user action on the server -- not activity occurring against the bucket. File metadata is fetched when you list a directory (and the information isn't in the stat cache, if configured, as a result of a previous listing), and object payload is fetched when you read from a file, but not before that.
This can be confirmed via the bucket logs.
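Editor's note: one hedged way to run that confirmation is to enable S3 server access logging and then check the delivered logs for object GETs after mounting and listing. Both bucket names below are placeholders; the `REST.GET.OBJECT` operation string is what S3 access logs record for object downloads.

```shell
# Placeholder buckets. Server access logging records every request, so
# after mounting and listing you should see LIST/HEAD entries but no
# object GETs until a file is actually read.
aws s3api put-bucket-logging --bucket my-ftp-bucket \
  --bucket-logging-status '{
    "LoggingEnabled": {
      "TargetBucket": "my-log-bucket",
      "TargetPrefix": "s3fs-access/"
    }
  }'

# Later (log delivery is delayed), inspect the log objects:
aws s3 cp s3://my-log-bucket/s3fs-access/ /tmp/s3logs/ --recursive
grep -h 'REST.GET.OBJECT' /tmp/s3logs/* || echo "no object downloads logged"
```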
@gaul commented on GitHub (Jul 11, 2019):
s3fs does not download files unless an application opens and reads from them. However,
updatedb or similar background tasks can unexpectedly read files. Closing since the reporter understands this behavior in https://github.com/s3fs-fuse/s3fs-fuse/issues/632#issuecomment-330343636.