Mirror of https://github.com/s3fs-fuse/s3fs-fuse.git
Synced 2026-04-25 21:35:58 +03:00
[GH-ISSUE #2025] doesn't work properly under versioning enabled #1020
Originally created by @garenchan on GitHub (Aug 26, 2022).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/2025
Additional Information
The following information is very important in order to help us help you. Omitting these details may delay your support request or cause it to receive no attention at all.
Keep in mind that the commands we provide to retrieve information are oriented toward GNU/Linux distributions, so you may need to use other commands if you run s3fs on macOS or BSD.
Version of s3fs being used (s3fs --version)
Amazon Simple Storage Service File System V1.91 (commit:unknown) with OpenSSL
Copyright (C) 2010 Randy Rizun rrizun@gmail.com
License GPL2: GNU GPL version 2 https://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Version of fuse being used (pkg-config --modversion fuse, rpm -qi fuse, dpkg -s fuse)
Name : fuse
Version : 2.9.2
Release : 11.el7
Architecture: x86_64
Install Date: Tue 12 Jul 2022 09:39:58
Group : System Environment/Base
Size : 223297
License : GPL+
Signature : RSA/SHA256, Mon 12 Nov 2018 22:25:34, Key ID 24c6a8a7f4a80eb5
Source RPM : fuse-2.9.2-11.el7.src.rpm
Build Date : Wed 31 Oct 2018 05:32:35
Build Host : x86-01.bsys.centos.org
Relocations : (not relocatable)
Packager : CentOS BuildSystem http://bugs.centos.org
Vendor : CentOS
URL : https://github.com/libfuse/libfuse
Summary : File System in Userspace (FUSE) utilities
Description :
With FUSE it is possible to implement a fully functional filesystem in a
userspace program. This package contains the FUSE userspace tools to
mount a FUSE filesystem.
Kernel information (uname -r)
3.10.0-1127.el7.x86_64
GNU/Linux Distribution, if applicable (cat /etc/os-release)
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
s3fs command line used, if applicable
/etc/fstab entry, if applicable
s3fs syslog messages (grep s3fs /var/log/syslog, journalctl | grep s3fs, or s3fs outputs)
If you execute s3fs with the dbglevel and curldbg options, you can get detailed debug messages.
Details about issue
I deployed a minio cluster at http://10.134.80.223:9001 and created a bucket `test-bucket` with versioning enabled. I then mounted the bucket on the directory `/mnt/test2/` through s3fs.
I created a directory `parent-dir/sub-dir` in `/mnt/test2/`. Finally, I failed to force-delete the directory `parent-dir`; s3fs reported that the directory is not empty.
I also wonder why we can see deleted directories even though we can't access them.
I look forward to your help.
@ggtakec commented on GitHub (Aug 27, 2022):
It seems that the reason is that `sub-dir` cannot be deleted when `parent-dir` is deleted.
After the failed delete, the referenced `sub-dir` does not appear to be an object named `sub-dir/` that s3fs recognizes. At this point:
- Which object exists: `sub-dir` or `sub-dir/`?
- What `x-amz-****` headers does it have?
- If you run s3fs with the `compat_dir` option, does anything change?
(Although this is not a definite answer) I think the cause is that the object `sub-dir` already existed before `parent-dir/sub-dir` was created.
Please delete the target object once from the minio side, not from s3fs, and try the same thing again.
I hope you don't get the same error.
@garenchan commented on GitHub (Aug 27, 2022):
I think the problem might be caused by the `directory_empty` function.
When bucket versioning is enabled, the `ListBucket` API returns deleted subdirectories. As shown below, `parent/sub/` has actually been deleted, so the `directory_empty` function mistakenly determines that the `parent` directory is not empty.
If we try to get the attributes of the `parent/sub/` directory, we get a 404 error.
@ggtakec commented on GitHub (Aug 27, 2022):
I think this seems like an issue with minio's versioning and ListBucket API.
We should check whether minio's ListBucket returns deleted objects.
If the ListBucket result contains deleted objects, then this issue will occur and we will need to find a way around it.
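The failure mode under discussion can be illustrated with a small, self-contained Python sketch (all names here are hypothetical; this is not the s3fs C++ code). A versioned store whose listing still returns delete-marked keys fools an emptiness check based only on the listing, while re-checking each listed key with a HEAD-style lookup gives the right answer:

```python
# Sketch of the directory_empty problem (hypothetical model, not s3fs code).
# A versioned store keeps delete markers, and its listing still reports
# the deleted key -- which fools a listing-only emptiness check.

class VersionedStore:
    def __init__(self):
        self.versions = {}          # key -> list of "data" / "delete-marker"

    def put(self, key):
        self.versions.setdefault(key, []).append("data")

    def delete(self, key):
        # Versioned delete: add a delete marker instead of removing the key.
        self.versions.setdefault(key, []).append("delete-marker")

    def list_prefix(self, prefix):
        # Behavior reported here for minio: deleted keys are still listed.
        return [k for k in self.versions if k.startswith(prefix) and k != prefix]

    def head(self, key):
        # HEAD sees only the latest version: 404 if it is a delete marker.
        latest = self.versions.get(key, ["delete-marker"])[-1]
        return latest != "delete-marker"

def directory_empty_naive(store, prefix):
    # Trusts the listing alone.
    return not store.list_prefix(prefix)

def directory_empty_strict(store, prefix):
    # HEAD every listed key, treating 404 as "object does not exist".
    return not any(store.head(k) for k in store.list_prefix(prefix))

store = VersionedStore()
store.put("parent-dir/sub-dir/")
store.delete("parent-dir/sub-dir/")

print(directory_empty_naive(store, "parent-dir/"))   # False: rmdir would fail
print(directory_empty_strict(store, "parent-dir/"))  # True: directory is empty
```

The per-key HEAD requests are exactly why the thread later flags this approach as a performance concern.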
@garenchan commented on GitHub (Aug 27, 2022):
I think it might make sense to return deleted subdirectories. If the `ListBucket` API did not return deleted subdirectories, we might not be able to retrieve or restore the deleted objects in them.
I have tested `rclone mount` and it worked well.
Do you have any plans to deal with this problem?
@ggtakec commented on GitHub (Aug 28, 2022):
Getting a `ListBucket` result that includes deleted (past-versioned) objects is problematic for s3fs.
If the `ListBucket` response behaves as if an object exists even though it has been deleted, s3fs currently cannot handle it.
Deletion of files/directories (objects) via FUSE is split into sequential system calls into s3fs.
The directory deletion is called last, after the objects under that directory have been deleted.
At that point, the deletion of the directory fails because s3fs finds that the objects it deleted just before still appear to exist.
If minio attached a deletion mark or similar information to each object in the response, s3fs might be able to deal with this by excluding those objects.
However, that processing would be quite special for us.
@garenchan commented on GitHub (Aug 28, 2022):
@ggtakec.
I tried to fix the problem and it worked fine after testing.
If you have time, you can take a look at the code. Although it is rough, I hope it can help you.
https://github.com/garenchan/s3fs-fuse/pull/1
@creeew commented on GitHub (Aug 30, 2022):
Hi @ggtakec, we have the same issue. A deleted directory still exists when using minio with versioning, which really bothers us. We would really appreciate you solving this problem.
@ggtakec commented on GitHub (Aug 30, 2022):
@garenchan
I checked your modified code and understood it.
(Please correct me if I'm wrong)
I understood that minio lists objects that should have been deleted in the `ListBucket` response, whereas HEAD requests for those objects result in an error. (Is this spec correct?)
About your modified code:
I think that the stat caches for deleted directory objects should not need to be deleted again.
This behavior seems unnecessary, as s3fs removes the information about an object from the stat cache immediately after deleting it.
Also, I think not calling `filler` on directory objects will have other problems.
You work around this problem by doing almost the same thing as `readdir` within `directory_empty`, but I think this will create new performance problems.
I believe this issue's problem is that the `ListBucket` response lists objects that have been deleted.
Therefore, s3fs cannot distinguish whether an object that should have been deleted remains or actually exists.
(Because the HEAD request returns `ENOENT` by an overbearing judgment, the method used to solve this problem may affect compatibility with other distributed object storages and older versions of s3fs.)
Is there a way, according to the minio specification, to prevent deleted objects from being included in the `ListBucket` response when versioning is enabled? (e.g. options, parameters, etc.)
Without resolving this root cause, fixing the s3fs side will create new problems in terms of compatibility with other distributed object stores.
@garenchan commented on GitHub (Aug 31, 2022):
@ggtakec
Yes.
My code changes include the following:
- In the `directory_empty` function, we need to check whether `CommonPrefixes` exists. In general, we may only need to call the `get_object_attribute` function once more, so the impact on performance should be small.
- In the `s3fs_readdir` function, we do not use the stat cache for `CommonPrefixes`, because those entries might have been deleted. In addition, `filler` should not be applied to objects that are not found.
I may have missed something else.
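The readdir-side change described here can be sketched as follows (a simplified Python illustration with hypothetical names, not the actual patch): entries from `CommonPrefixes` are re-verified with a HEAD-style lookup, and `filler` is not called for entries that come back 404.

```python
# Sketch of the described readdir change (hypothetical; not the real patch).
# head(key) stands in for a HEAD request; filler() collects the entries
# that would actually be reported to FUSE.

def readdir_filtered(common_prefixes, head):
    entries = []

    def filler(name):
        entries.append(name)

    for prefix in common_prefixes:
        # Do not trust the stat cache for CommonPrefixes entries, and do
        # not call filler for objects whose HEAD check returns 404.
        if head(prefix):
            filler(prefix.rstrip("/").split("/")[-1])
    return entries

# "a/live/" still exists; "a/deleted/" only has a delete marker (HEAD -> 404).
existing = {"a/live/"}
print(readdir_filtered(["a/live/", "a/deleted/"], lambda k: k in existing))
# prints ['live']
```

This mirrors the trade-off discussed below: correctness under versioning at the cost of one extra request per listed entry.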
@ggtakec commented on GitHub (Aug 31, 2022):
Missing objects may also exist.
This is the case where s3fs treats an object that lives flat in storage as a path containing directories, based on its object name.
There are cases where a directory that appears to exist in between does not actually exist.
(This case occurs, for example, when objects are uploaded using only the API.)
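The implicit-directory case described above can be sketched like this (illustrative Python, not s3fs code): directory names are derived purely from the flat object keys, with no corresponding directory objects in storage.

```python
# Deriving implicit directory names from flat object keys (illustration
# only, not s3fs code). An object "data/2022/report.txt" uploaded via the
# raw API may exist without any "data/" or "data/2022/" directory objects.

def implicit_dirs(keys):
    dirs = set()
    for key in keys:
        parts = key.split("/")[:-1]          # drop the leaf (file) name
        for i in range(1, len(parts) + 1):
            dirs.add("/".join(parts[:i]) + "/")
    return sorted(dirs)

print(implicit_dirs(["data/2022/report.txt"]))
# prints ['data/', 'data/2022/']
```

A HEAD request for such a derived directory name also returns 404, even though the directory must still be shown, which is why treating every 404 as "does not exist" is not safe here.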
s3fs will have to call `filler` to handle these objects as well.
@garenchan commented on GitHub (Aug 31, 2022):
Yes, you are right. This is a difficult problem.
@marcinkuk commented on GitHub (Sep 1, 2022):
I can help with tests.
@ggtakec commented on GitHub (Sep 2, 2022):
I found the following Issue:
https://github.com/minio/minio/issues/10914
After all, it seems that directories are listed as if they were not deleted from the bucket when versioning is enabled.
The only way around this problem seems to be to issue a HEAD (or GET) request for every object listed in `directory_empty` (similar to garenchan's piece of code).
However, that would be very slow.
Also, the fact that `ListBucket` lists a deleted directory will make it appear that the directory has been revived by another listing call, even if you have modified `directory_empty`.
The only way around this is to not call `filler` (the same approach as garenchan's modified code).
filler(that way is as same as garenchan's modified code).But I think that would cause another problem.
I need to think about this issue a little more.
@marcinkuk commented on GitHub (Sep 2, 2022):
I think it is better to have functionality with poor performance than no functionality at all.
In the future, better performance can be achieved with some other solution.
In order not to affect current performance, it could be made accessible with a mount option, for example "lazy_delete".
@ggtakec commented on GitHub (Sep 3, 2022):
I have prepared fixed code for testing. (It is similar to garenchan's code.)
https://github.com/ggtakec/s3fs-fuse/tree/minio_baseof_no_dir_obj_listing
The explanation is long, but please read it.
I have posted PR #2023, which fixes a bug in cases where the directory object does not exist.
It has to do with the `compat_dir` option.
The processing of `filler` etc. that is discussed in this issue has changed since v1.91.
The test code which I created is based on #2023.
You should be aware of the differences between the `v1.91` and `master` code.
The current master code differs from `v1.91` in the `alternative directory names` option (and its default value).
The `nosup_compat_dir` option has been deprecated in favor of `compat_dir`. (The `alternative directory names` default has been changed from enabled to disabled.)
The current `master` code disables `alternative directory names` by default.
At the time of this issue (the inappropriate behavior of s3fs found by garenchan), s3fs was probably running with `alternative directory names` enabled.
However, what garenchan did was create and delete directories via s3fs.
In other words, this works even if `alternative directory names` is disabled.
The code for testing adds a new option called `strict_dir_empty`.
Start with this option when testing.
The `strict_dir_empty` option causes the list of objects received in `ListBucket` to be checked with separate `HEAD` requests, similar to `readdir`, and any object that receives a `404` is treated as a non-existent object.
I believe this will allow the directory deletion described in this issue to work even with MinIO's versioned buckets.
In addition, please do not start by adding `compat_dir` (specifying both options together is deprecated).
I tested the same situation manually (not using MinIO), and the delete worked fine.
Could you build and test this test code for MinIO?
@creeew commented on GitHub (Sep 5, 2022):
@ggtakec Thank you for your good work. There is no longer a deleted dir in s3fs when minio has versioning enabled.
But there are remaining issues that we may have to bother you to fix as well.
In an s3fs cached situation, with multiple servers mounting the same bucket, a mount point will cache the dir stat even after the dir has been deleted via another mount.
To reproduce (with two mount points named mnt1 and mnt2):
1. In mnt1, create a dir with a sub dir: mkdir -p /mnt1/a/b
2. In mnt2, delete the subdir b: rm -fr /mnt2/a/b
3. Back in mnt1, try to create sub dir a/b again. Since b was deleted, we would expect the mkdir to succeed, but the result is: mkdir: cannot create directory 'a/b': File exists (this issue was not caused by your patch)
Also, a directory uploaded via the minio web console cannot be found in the s3fs mount point.
@garenchan commented on GitHub (Sep 5, 2022):
Yes, my test results were the same as @creeew's.
We may not be able to use the stat cache for directories here. The directories do not have an ETag at this point and may have been deleted.
github.com/ggtakec/s3fs-fuse@000140ae4e/src/s3fs.cpp (L2779-L2786)
@ggtakec commented on GitHub (Sep 6, 2022):
@creeew @garenchan
The stat cache of s3fs has its size and expiration time.
I think this problem occurs because the file stat is not re-acquired from the S3 server during the expiration period.
If you can accept the change, you can adjust the expiration with the `stat_cache_expire` option, etc.
(This should be a separate issue, unrelated to this issue and MinIO.)
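The stale-cache behavior described here can be sketched with a minimal TTL cache (a hypothetical illustration; s3fs's real cache is C++ and is configured via `stat_cache_expire`):

```python
# Minimal TTL stat cache sketch (hypothetical, not s3fs's implementation).
# A deletion made through a second mount is not seen by the first mount
# until the first mount's cached entry expires.

import time

class StatCache:
    def __init__(self, expire_seconds):
        self.expire = expire_seconds
        self.entries = {}                     # path -> (stat, cached_at)

    def get(self, path, fetch):
        entry = self.entries.get(path)
        if entry is not None and time.monotonic() - entry[1] < self.expire:
            return entry[0]                   # possibly stale until expiry
        stat = fetch(path)                    # go to the S3 server
        self.entries[path] = (stat, time.monotonic())
        return stat

cache = StatCache(expire_seconds=900)
cache.get("/a/b", lambda p: "exists")         # cached while the dir existed
# Another mount deletes /a/b; this mount still sees the stale entry:
print(cache.get("/a/b", lambda p: "ENOENT"))  # prints "exists" until expiry
```

Lowering the expiry (as `stat_cache_expire` allows) shortens the window of staleness at the cost of more requests to the server.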
This is probably solved by the `nosup_compat_dir` option (or the `compat_dir` option in the master branch).
Note that those option defaults are reversed between v1.91 and master.
I think that you won't see that directory when `compat_dir` is disabled.
@tmfksoft commented on GitHub (Jul 17, 2023):
Is there any news on this?
I'm running into similar issues where I'm seeing directories that have been deleted along with their contents when versioning is enabled.
@marcinkuk commented on GitHub (Jul 18, 2023):
Me too.
Did you try v1.92?
@adamqqqplay commented on GitHub (Aug 17, 2023):
Maybe using the `-o listobjectsv2` option could solve this versioning-related problem.