[GH-ISSUE #1578] Server side encrypting leads to auto overwriting #826

Closed
opened 2026-03-04 01:49:07 +03:00 by kerem · 11 comments
Owner

Originally created by @DasMagischeToastbrot on GitHub (Feb 17, 2021).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/1578

Additional Information

The following information is very important in helping us help you. Omitting these details may delay your support request or cause it to receive no attention at all.
Keep in mind that the commands we suggest for retrieving this information are oriented toward GNU/Linux distributions, so you may need to use alternatives if you run s3fs on macOS or BSD.

Version of s3fs being used (s3fs --version)

Using commit: c692093

Version of fuse being used (pkg-config --modversion fuse, rpm -qi fuse, dpkg -s fuse)

Version: 2.9.4

Kernel information (uname -r)

Linux

GNU/Linux Distribution, if applicable (cat /etc/os-release)

NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"

s3fs command line used, if applicable

```
s3fs#$PATH /mnt/home/ fuse rw,_netdev,allow_other,endpoint=eu-central-1,iam_role=auto,use_cache=/tmp/s3fs-cache/,ensure_diskfree=5000,uid=myuid,gid=sshusers,umask=002 0 0
```

/etc/fstab entry, if applicable

s3fs syslog messages (grep s3fs /var/log/syslog, journalctl | grep s3fs, or s3fs outputs)

If you run s3fs with the dbglevel and curldbg options, you can get detailed debug messages.
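For reference, such an invocation might look like the following sketch (the bucket name and mountpoint are placeholders, not taken from this issue):

```shell
# Run in the foreground (-f) with verbose s3fs logging and libcurl debug output.
# "mybucket" and /mnt/mybucket are illustrative placeholders.
s3fs mybucket /mnt/mybucket -o dbglevel=info -o curldbg -f
```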

Details about issue

If I use two S3 datasets with KMS server-side encryption and both contain a file with the same name (for example, app.R), caching causes the first script to always overwrite the second one. With "normal" S3 datasets, i.e. without KMS server-side encryption, this does not occur. Can you please help here? This is quite confusing.

kerem 2026-03-04 01:49:07 +03:00
Author
Owner

@gaul commented on GitHub (Feb 19, 2021):

Could you provide concrete steps on how to trigger these symptoms, e.g., which operation fails with SSE that succeeds without it?


@DasMagischeToastbrot commented on GitHub (Feb 19, 2021):

Step 1: Start s3fs with two S3 datasets that use KMS server-side encryption.
Step 2: Add a file with the same name to both datasets, let's call it test_file.R, and make sure the two files are different.
Step 3: Go to the first dataset, open test_file.R (for example with nano, so `nano test_file.R`), and change anything in the file.
Step 4: Go to the second dataset and open test_file.R (again with `nano test_file.R`); you will see that the changes you made in the first file appear in the second file as well and were automatically written to S3, which is not what we intend.

I can avoid this "bug" by disabling caching, but that isn't what I would like to do. Furthermore, this behaviour happens only with S3 datasets that use server-side encryption; without the encryption everything works fine.
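Based on the mount line quoted earlier in this issue, the problematic setup can be sketched as two fstab entries that point at one shared use_cache directory (the bucket and prefix names here are hypothetical, not from the reporter's environment):

```shell
# /etc/fstab sketch -- both mounts share /tmp/s3fs-cache/, which this
# thread later identifies as unsupported. All names are illustrative only.
# s3fs#bucket/dataset-a  /mnt/dataset-a  fuse  _netdev,use_cache=/tmp/s3fs-cache/  0 0
# s3fs#bucket/dataset-b  /mnt/dataset-b  fuse  _netdev,use_cache=/tmp/s3fs-cache/  0 0
```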


@DasMagischeToastbrot commented on GitHub (Feb 22, 2021):

@gaul did you have time to check whether this is enough information?


@gaul commented on GitHub (Feb 23, 2021):

@DasMagischeToastbrot Could you test with the latest master which includes #1579 that may address your symptoms? If this does not help, can you try to further minimize a test case, perhaps using the AWS CLI to do the modification and s3fs to read it, or vice versa? Narrowing down the symptoms makes it much easier to diagnose any underlying issue.


@DasMagischeToastbrot commented on GitHub (Feb 23, 2021):

@gaul I tried it with the latest commit, but the problem still exists. To be honest, I don't know how to minimize the test case further. The problem already appears with a basic `nano` command when s3fs is running with caching enabled.
I don't really know how s3fs is currently implemented, but with S3 buckets using KMS-based server-side encryption it looks like it forgets the path? It caches the file name and puts it into the same bucket, but forgets that these files are stored in different datasets. Do you know anything about this?


@gaul commented on GitHub (Feb 23, 2021):

`nano` saving a file is a bad test case since it issues several operations. If you can reproduce the symptoms with simpler operations, e.g., rename via `mv` or append via `echo foo >> filename`, this can help isolate the issue.


@DasMagischeToastbrot commented on GitHub (Feb 24, 2021):

Hello, I could also reproduce the problem with appending. Note that before I did anything, the app.R file within YCcM-testconfidentialzwei had the content 'confidential test', and at the end it got overwritten by the app.R file from sTHs-awesometest421. Please have a look at the following commands:

```
[my_user@ip datasets]$ cd sTHs-awesometest421/
[my_user@ip sTHs-awesometest421]$ cat app.R
my awesome test
[my_user@ip sTHs-awesometest421]$ echo 'ADD THIS THING' >> app.R
[my_user@ip sTHs-awesometest421]$ cat app.R
my awesome test
ADD THIS THING
[my_user@ip sTHs-awesometest421]$ cd ..
[my_user@ip datasets]$ cd YCcM-testconfidentialzwei/
[my_user@ip YCcM-testconfidentialzwei]$ cat app.R
my awesome test
ADD THIS
```

@gaul is this sufficient for you?


@gaul commented on GitHub (Feb 24, 2021):

Do both of these mountpoints have the same `use_cache` directory? If so, this is not supported and will cause data corruption. Sorry, I'm not following your example. You should provide the exact steps to reproduce this symptom, including the mount command and how you created the initial file.


@DasMagischeToastbrot commented on GitHub (Feb 25, 2021):

OK, I understand now that it is not supported to store the cache of multiple mountpoints in one cache directory.

Let's say PATH='bucket-name/path-to-anything/suffix' and under suffix there is a file called app.R. If I run the command as above

```
s3fs#$PATH /mnt/home/ fuse rw,_netdev,allow_other,endpoint=eu-central-1,iam_role=auto,use_cache=/tmp/s3fs-cache/path-to-anything/,ensure_diskfree=5000,uid=myuid,gid=sshusers,umask=002 0 0
```

why does it save the cache in `/tmp/s3fs-cache/path-to-anything/bucket-name/suffix/`? I would have expected the cache to be saved in `/tmp/s3fs-cache/path-to-anything/bucket-name/path-to-anything/suffix/`.
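The layout described above can be sketched with a tiny shell helper (an illustration of the behaviour reported in this thread, not s3fs's actual code): the cache location is built from the bucket name plus the path *relative to the mountpoint*, so the mounted prefix drops out, and two mounts of different prefixes in the same bucket collide on the same cache file.

```shell
# Illustrative only: mimics the cache layout reported in this thread,
# where the object path is taken relative to the mounted prefix.
cache_path() {
  # $1 = use_cache directory, $2 = bucket, $3 = path relative to mountpoint
  printf '%s/%s/%s\n' "$1" "$2" "$3"
}

# Two mounts of different prefixes in the same bucket, sharing one cache dir:
a=$(cache_path /tmp/s3fs-cache bucket-name app.R)  # mount of bucket-name/prefix-a
b=$(cache_path /tmp/s3fs-cache bucket-name app.R)  # mount of bucket-name/prefix-b
echo "$a"
[ "$a" = "$b" ] && echo "collision: both mounts cache app.R at the same path"
```

Because the prefix never appears in the cache path, whichever mount writes last wins, which matches the overwriting seen earlier in this thread.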


@gaul commented on GitHub (Feb 25, 2021):

I understand the confusion but this is what the original author chose in 2008. Maybe you can contribute a change to the man page and help strings to clarify this behavior?

Does specifying separate use_cache directories per bucket/path resolve your symptoms?
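The suggested workaround can be sketched as giving each mount its own cache root (bucket, prefix, and path names here are hypothetical):

```shell
# /etc/fstab sketch -- one use_cache directory per mount, as suggested above,
# so the two mounts can no longer collide on a shared cache file.
# s3fs#bucket/dataset-a  /mnt/dataset-a  fuse  _netdev,use_cache=/tmp/s3fs-cache-a/  0 0
# s3fs#bucket/dataset-b  /mnt/dataset-b  fuse  _netdev,use_cache=/tmp/s3fs-cache-b/  0 0
```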


@DasMagischeToastbrot commented on GitHub (Mar 5, 2021):

Yes, specifying separate directories solved this problem. Thanks!
