mirror of
https://github.com/s3fs-fuse/s3fs-fuse.git
synced 2026-04-25 13:26:00 +03:00
[GH-ISSUE #1578] Server side encrypting leads to auto overwriting #826
Originally created by @DasMagischeToastbrot on GitHub (Feb 17, 2021).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/1578
Additional Information
The following information is very important in order to help us help you. Omitting these details may delay your support request or cause it to receive no attention at all.
Keep in mind that the commands we provide to retrieve information are oriented to GNU/Linux distributions, so you may need to use different ones if you use s3fs on macOS or BSD.
Version of s3fs being used (s3fs --version)
Using commit: c692093
Version of fuse being used (pkg-config --modversion fuse, rpm -qi fuse, dpkg -s fuse)
Version: 2.9.4
Kernel information (uname -r)
Linux
GNU/Linux Distribution, if applicable (cat /etc/os-release)
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
s3fs command line used, if applicable
/etc/fstab entry, if applicable
s3fs syslog messages (grep s3fs /var/log/syslog, journalctl | grep s3fs, or s3fs outputs)
If you execute s3fs with the dbglevel or curldbg option, you can get detailed debug messages.
Details about issue
If I use two S3 datasets with KMS server-side encryption and I have a file with the same name in both datasets (for example: app.R), the caching leads to the first script always overwriting the second one. With "normal" S3 datasets, meaning without KMS server-side encryption, this doesn't occur. Can you please help here? This is quite confusing.
@gaul commented on GitHub (Feb 19, 2021):
Could you provide concrete steps on how to trigger these symptoms, e.g., which operation fails with SSE that succeeds without it?
@DasMagischeToastbrot commented on GitHub (Feb 19, 2021):
Step 1: Start s3fs with two S3 datasets with KMS server-side encryption.
Step 2: Add a file with the same name to both of these datasets, let's call it test_file.R, and make sure the two files are different.
Step 3: Go to the first dataset and open test_file.R, for example with nano, so nano test_file.R, and change anything in the file.
Step 4: Go to the second dataset and open test_file.R, for example with nano, so nano test_file.R, and you will see that the changes you made in the first file are in the second file as well and got automatically written to S3, but this isn't intended on our side.
I can avoid this "bug" by disabling the caching, but that isn't what I would like to do. Furthermore, this behaviour happens only with S3 datasets with server-side encryption. Without this encryption everything works fine.
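The symptom described above amounts to a cache keyed too coarsely: two different paths holding a file with the same name end up sharing one cache entry. Here is a minimal, hypothetical model of that failure mode in plain Python (this is not s3fs source code; SharedCacheDir and its key scheme are assumptions purely for illustration):

```python
import os

class SharedCacheDir:
    """Toy model of one use_cache directory shared by two mounts.

    The hypothesized bug: the cache key keeps the bucket and the file
    name but drops the per-mount path prefix, so datasetA/test_file.R
    and datasetB/test_file.R collide on the same entry.
    """

    def __init__(self):
        self._store = {}

    def _key(self, bucket, rel_path):
        # The directory prefix of rel_path is lost here -- this line is
        # the modeled collision, not actual s3fs behavior.
        return (bucket, os.path.basename(rel_path))

    def put(self, bucket, rel_path, data):
        self._store[self._key(bucket, rel_path)] = data

    def get(self, bucket, rel_path):
        return self._store.get(self._key(bucket, rel_path))


cache = SharedCacheDir()
cache.put("bucket", "datasetA/test_file.R", "edited in dataset A\n")
# Reading the same file name from dataset B returns dataset A's data:
print(cache.get("bucket", "datasetB/test_file.R"))
```

Under this model, an edit cached through one dataset is served back for the other dataset's identically named file, which matches the overwriting the reporter observes.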
@DasMagischeToastbrot commented on GitHub (Feb 22, 2021):
@gaul did you have time to check whether this is enough information?
@gaul commented on GitHub (Feb 23, 2021):
@DasMagischeToastbrot Could you test with the latest master which includes #1579 that may address your symptoms? If this does not help, can you try to further minimize a test case, perhaps using the AWS CLI to do the modification and s3fs to read it, or vice versa? Narrowing down the symptoms makes it much easier to diagnose any underlying issue.
@DasMagischeToastbrot commented on GitHub (Feb 23, 2021):
@gaul I tried it with the latest commit, but the problem still exists. To be honest, I don't know how to further minimize a test case. The problem already appears with a basic nano command if s3fs is enabled with cache.
I don't really know how the current implementation of s3fs works, but it looks like with S3 buckets with server-side encryption based on KMS, it forgets the path? So it caches the name and puts it into the same bucket, but it forgets that these files are stored in different datasets. Do you know anything about it?
@gaul commented on GitHub (Feb 23, 2021):
nano saving a file is a bad test case since this issues several operations. If you can reproduce the symptoms with simpler operations, e.g., rename via mv, append via echo foo >> filename, this can help isolate the issue.
@DasMagischeToastbrot commented on GitHub (Feb 24, 2021):
Hello, I could reproduce the problem with appending as well. Something I have to say is that before I did anything, the app.R file within YCcM-testconfidentialzwei had the content 'confidential test', and in the end it got overwritten by the app.R file from sTHs-awesometest421. Please have a look at the following commands:
[my_user@ip datasets]$ cd sTHs-awesometest421/
[my_user@ip sTHs-awesometest421]$ cat app.R
my awesome test
[my_user@ip sTHs-awesometest421]$ echo 'ADD THIS THING' >> app.R
[my_user@ip sTHs-awesometest421]$ cat app.R
my awesome test
ADD THIS THING
[my_user@ip sTHs-awesometest421]$ cd ..
[my_user@ip datasets]$ cd YCcM-testconfidentialzwei/
[my_user@ip YCcM-testconfidentialzwei]$ cat app.R
my awesome test
ADD THIS
@gaul is this sufficient for you?
@gaul commented on GitHub (Feb 24, 2021):
Do both of these mountpoints have the same use_cache directory? Because if so, this is not supported and will cause data corruption. Sorry, I'm not following your example. You should provide the exact steps to reproduce this symptom, including the mount command and however you created the initial file.
@DasMagischeToastbrot commented on GitHub (Feb 25, 2021):
Ok, alright, I understand that it's not supported to save the cache of multiple mountpoints within one cache directory.
Let's say PATH='bucket-name/path-to-anything/suffix' and under suffix there is a file called app.R. If I run the commands as above, why does it save the cache within /tmp/s3fs-cache/path-to-anything/bucket-name/suffix/? I personally would have expected that the cache gets saved in /tmp/s3fs-cache/path-to-anything/bucket-name/path-to-anything/suffix/.
@gaul commented on GitHub (Feb 25, 2021):
I understand the confusion but this is what the original author chose in 2008. Maybe you can contribute a change to the man page and help strings to clarify this behavior?
Does specifying separate
use_cachedirectories per bucket/path resolve your symptoms?@DasMagischeToastbrot commented on GitHub (Mar 5, 2021):
Yes, specifying the different directories solved this problem. Thanks.
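For reference, the resolution the thread converges on is one use_cache directory per mount. A sketch of what that looks like on the command line (bucket names, mountpoints, and the KMS key id are placeholders; use_cache and use_sse=kmsid are real s3fs options, but check your s3fs version's man page for the exact syntax):

    s3fs dataset-one /mnt/dataset-one -o use_sse=kmsid:your-kms-key-id -o use_cache=/tmp/s3fs-cache-one
    s3fs dataset-two /mnt/dataset-two -o use_sse=kmsid:your-kms-key-id -o use_cache=/tmp/s3fs-cache-two

Because each mount now has a private cache tree, identically named files in the two datasets can no longer collide on a shared cache entry.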