Mirror of https://github.com/s3fs-fuse/s3fs-fuse.git (synced 2026-04-25 13:26:00 +03:00)
[GH-ISSUE #2099] Massively add S3 Bucket Objects needed Metadata for s3fs-fuse #1067
Originally created by @aek on GitHub (Jan 26, 2023).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/2099
Version of s3fs being used (`s3fs --version`): Amazon Simple Storage Service File System V1.91 (commit:14eb1a7) with OpenSSL
Version of fuse being used (`pkg-config --modversion fuse`, `rpm -qi fuse`, or `dpkg -s fuse`): 2.9.4-1ubuntu3.1
Kernel information (`uname -r`): 5.7.11-mainline-rev1
GNU/Linux Distribution, if applicable (`cat /etc/os-release`):
NAME="Ubuntu"
VERSION="16.04.3 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.3 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
How to run s3fs, if applicable: using a systemd mount unit like:
Details about issue
I had been using goofys to mount the S3 bucket as a local folder, and I already have several GBs of files and folders for different projects. s3fs works better for me because it is more stable than goofys, so I switched back to s3fs.
Everything works fine for the applications, since they store internally which paths they need in order to retrieve the files locally. Specifically, I'm using Odoo to talk directly to the s3fs-mounted folder, with its filestore kept in the S3 bucket.
The problem appears when I try to create a backup of the filestore: I get an empty filestore. The files were originally created and uploaded by goofys, so they don't have the s3fs metadata properly set, and a normal listing of the filestore in the server's s3fs folder shows everything as empty, even though the files and folders are there in the S3 bucket and I can use them fine if I target a file directly, e.g. with `nano`.
I need a way to massively set the s3fs metadata on the bucket's objects, to avoid re-uploading all the files just so s3fs can see them. I already have many GBs stored in the S3 bucket that need the x-amz-meta headers properly set.
Is there any path I could follow to accomplish this task? Perhaps a Python script using boto3 that I could build to sync the metadata onto the bucket objects.
I will need to know which metadata headers are required and where to get their values.
Has anyone gotten something like this working? Maybe there is a script already built for this.
I'm already looking at the s3fs code that produces the metadata, in order to replicate it in Python and sync the S3 bucket objects so they can be used with s3fs.
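For reference, a minimal sketch of the kind of boto3 script described here. The metadata key names below (`mode`, `mtime`, `uid`, `gid`, which S3 stores as `x-amz-meta-*` headers) are my reading of what s3fs expects and should be double-checked against the s3fs version in use; the bucket, key, and path names are placeholders:

```python
import os

def s3fs_metadata(path):
    """Build the user metadata s3fs keeps on an object, from a local
    file's stat(2) info.  boto3 adds the x-amz-meta- prefix itself, so
    the keys here are bare.  The key set is an assumption -- verify it
    against the s3fs version in use."""
    st = os.stat(path)
    return {
        "mode": str(st.st_mode),         # decimal st_mode (type + permission bits)
        "mtime": str(int(st.st_mtime)),  # epoch seconds
        "uid": str(st.st_uid),
        "gid": str(st.st_gid),
    }

# Applying it without re-uploading the data (untested sketch; note that
# copy_object only handles objects up to 5 GB -- larger objects need a
# multipart copy):
#
#   import boto3
#   s3 = boto3.client("s3")
#   s3.copy_object(
#       Bucket="my-bucket", Key="some/key",              # placeholders
#       CopySource={"Bucket": "my-bucket", "Key": "some/key"},
#       Metadata=s3fs_metadata("/local/mirror/some/key"),
#       MetadataDirective="REPLACE",  # replace metadata instead of copying it
#   )
```

The server-side copy updates only the headers, so no object data crosses the wire.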
@michaelsmoody commented on GitHub (Jan 27, 2023):
I suspect that the fix for your issue is contained in some commits that are waiting for a new release (1.92). I'm waiting on the same fixes myself.
@ggtakec and @gaul? Any opportunity for a release of 1.92 with the fixes for https://github.com/s3fs-fuse/s3fs-fuse/issues/2033 and https://github.com/s3fs-fuse/s3fs-fuse/issues/2032, which were addressed by https://github.com/s3fs-fuse/s3fs-fuse/pull/1964 and https://github.com/s3fs-fuse/s3fs-fuse/pull/2039?
I hate to ask again, but it's been ~3 months, and I have a major blocking issue: I simply can't use s3fs until this is pushed out. I'm 99% certain it would also help with this individual's request; failing that, I'll assist them once the other fixes are committed, since then I'll have what they need.
@aek commented on GitHub (Jan 27, 2023):
@michaelsmoody thanks for pointing out the Issues and PRs.

I have reviewed those and they don't seem related to my issue here. I'm also building s3fs from source in order to try the latest changes merged into the master branch of the repository (it seems that is the only branch).
My issue is pretty much that all the files and folders that were not created through s3fs-fuse are invisible to several commands, because they don't have the metadata headers from s3fs like:
If I upload files directly through the AWS console, or with goofys (like almost all my files), boto3, or any other method that doesn't attach the metadata s3fs needs, the folders and files end up somewhat hidden from s3fs. If you know the full file key paths you can use them without issues, but something like Python's `os.walk` or the bash `ls` command won't see the files and folders, since they don't have the metadata attached. I have also seen that running the bash `tree` command discovers the files and folders and makes `ls` work, but that doesn't survive a restart of s3fs.
@ggtakec commented on GitHub (Jan 29, 2023):
@aek
Would it be possible to solve this problem by assigning the attributes handled by s3fs to the objects that lack them?
Try starting s3fs with the `complement_stat` and `compat_dir` options (also `umask`, `mp_umask`, etc.) using the code from the current master branch. First of all, I think this will allow you to see the objects (files/dirs) that are hidden.
Then, you can give the attributes by updating the objects that do not have them (their timestamp is probably Unixtime 0, i.e. 1970...). For example, update a file's timestamp with the `touch` command.
If you're using a server that supports multipart upload and the Copy API, this operation can update only the attributes (`x-amz-meta-*` headers) instead of uploading the entire file object. (If the number of files is large, a certain amount of cost (time/requests) will be required to update all of them.)
Is it possible to try this method?
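The metadata-only update described above could be sketched with boto3 roughly as follows. The default `mode`/`uid`/`gid` values and the metadata key names are illustrative assumptions, not something the thread specifies:

```python
import time

def complete_s3fs_metadata(existing, mode=0o100644, uid=1000, gid=1000):
    """Fill in the s3fs attribute keys (mode/mtime/uid/gid, stored on
    the object as x-amz-meta-* headers) only where they are missing,
    preserving any metadata the object already carries.  The defaults
    (regular file rw-r--r--, uid/gid 1000) are illustrative."""
    merged = dict(existing)
    merged.setdefault("mode", str(mode))
    merged.setdefault("mtime", str(int(time.time())))
    merged.setdefault("uid", str(uid))
    merged.setdefault("gid", str(gid))
    return merged

# Bulk pass over a bucket using the Copy API (untested sketch; copy_object
# handles objects up to 5 GB, larger ones need a multipart copy):
#
#   import boto3
#   s3 = boto3.client("s3")
#   for page in s3.get_paginator("list_objects_v2").paginate(Bucket="my-bucket"):
#       for obj in page.get("Contents", []):
#           head = s3.head_object(Bucket="my-bucket", Key=obj["Key"])
#           meta = complete_s3fs_metadata(head["Metadata"])
#           if meta != head["Metadata"]:
#               s3.copy_object(Bucket="my-bucket", Key=obj["Key"],
#                              CopySource={"Bucket": "my-bucket", "Key": obj["Key"]},
#                              Metadata=meta, MetadataDirective="REPLACE")
```

As noted above, each object still costs one HEAD and (if incomplete) one copy request, so a large bucket takes time.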
@ggtakec commented on GitHub (Jan 29, 2023):
@michaelsmoody
Sorry to keep you waiting.
We are still reviewing uninvestigated bug reports and fixing them, which is taking time.
However, we also maintain the desire to release early.
(I will create Issue #2102 and adjust it there.)
@aek commented on GitHub (Jan 29, 2023):
@ggtakec thank you so much. The `compat_dir` config option indeed helps solve the issue. Here is my systemd unit that allows me to see the hidden folders.
I had seen and tried `complement_stat` before, but wasn't aware of the `compat_dir` option and its behavior. It seems the combination of both options does the trick for me. I had started to think it was something related to the metadata of the folder objects; I couldn't see them in the AWS S3 console, so I made a script to inspect them using `head_object`, only to find that the folder objects were never created by goofys at all:
`An error occurred (404) when calling the HeadObject operation: Not Found`
Even when you can see them in the AWS console, they aren't physically there in the S3 bucket.
So one trick is to run `touch /s3/bucket/local/path/folder`, which creates the folder object in the S3 bucket with the proper metadata. I will look into doing this as well, but for future reference, using only the `compat_dir` option will let s3fs-fuse see the hidden folders and walk the files in them. After the touch is executed on a hidden folder path, I get a success response from `head_object` on the folder.
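The touch trick above can also be applied recursively. A minimal Python equivalent of walking a mounted path and touching every directory (the path is a placeholder, and whether the touch recreates the directory object depends on the s3fs options discussed in this thread):

```python
import os

def touch_all_dirs(root):
    """Update atime/mtime on every directory under root, equivalent to
    `find root -type d -exec touch {} +`.  Run against an s3fs mount
    point, this makes s3fs rewrite each directory object together with
    its x-amz-meta-* attributes."""
    touched = 0
    for dirpath, _dirnames, _filenames in os.walk(root):
        os.utime(dirpath, None)  # None -> set both times to "now", like touch
        touched += 1
    return touched

# e.g. touch_all_dirs("/s3/bucket/data_dir/filestore")  # placeholder mount path
```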
Once again Thank you so much @ggtakec
@aek commented on GitHub (Jan 29, 2023):
I have used this command to recursively touch the directories and create them through s3fs-fuse:
`find /s3/bucket/data_dir/filestore/ -type d -exec touch {} +`
@ggtakec commented on GitHub (Jan 30, 2023):
Thank you for confirmation.
If you have other problems, please open a new issue.
@michaelsmoody commented on GitHub (May 10, 2023):
I would suggest, @ggtakec, that the relevant information in this issue be put into the core documentation for "importing" previous objects, especially `compat_dir` coupled with a mass `find`. Thoughts? I could potentially write it up if agreeable.
@ggtakec commented on GitHub (May 13, 2023):
@michaelsmoody
Thanks for the suggestions.
You're right, maybe we should write it down as a workaround, and I think the best place for that is the wiki.
@gaul What do you think?
I think it can be described as "reference" and "self-responsibility" in an appropriate place on the wiki.