[GH-ISSUE #2099] Massively add S3 Bucket Objects needed Metadata for s3fs-fuse #1067

Closed
opened 2026-03-04 01:51:06 +03:00 by kerem · 9 comments
Owner

Originally created by @aek on GitHub (Jan 26, 2023).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/2099

Version of s3fs being used (s3fs --version)

Amazon Simple Storage Service File System V1.91 (commit:14eb1a7) with OpenSSL

Version of fuse being used (pkg-config --modversion fuse, rpm -qi fuse or dpkg -s fuse)

Version: 2.9.4-1ubuntu3.1

Kernel information (uname -r)

5.7.11-mainline-rev1

GNU/Linux Distribution, if applicable (cat /etc/os-release)

NAME="Ubuntu"
VERSION="16.04.3 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.3 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial

How to run s3fs, if applicable

using a systemd mount unit like:

[Unit]
Description=S3 Storage Bucket
After=network.target

[Mount]
What=bucket-filestore
Where=/s3/bucket
Type=fuse.s3fs
Options=_netdev,allow_other,umask=000

[Install]
WantedBy=multi-user.target

Details about issue

I had been using goofys to mount the S3 bucket as a local folder and already have several GBs of files and folders for different projects. They work better with s3fs because it is more stable than goofys, so I switched back to s3fs.
Everything works fine for applications, since they store internally which paths they need in order to retrieve the files locally. Specifically, I'm using Odoo to talk directly to the s3fs local folder and keep the filestore in the S3 bucket.
The problem appears when I try to create a backup of the filestore: I get an empty filestore. The files were originally created and uploaded by goofys, so they don't have the s3fs metadata properly set, and a normal listing of the filestore in the server's s3fs folder shows everything as empty even though the files and folders are there in the S3 bucket. I can still use them fine if I target a file directly, e.g. with `nano`.
I need a way to massively set the s3fs metadata on the S3 bucket objects, to avoid re-uploading all the files just so s3fs can see them. I already have many GBs stored in the bucket that need the `x-amz-meta-*` headers properly set.
Is there any path I could follow to accomplish this? Perhaps a Python script using boto3 that I could build to sync the metadata onto the bucket objects.
I would need to know which metadata keys are required and where to get their values.
Has anyone gotten something like this working? Maybe there is a script already built for this.
I'm already looking at the s3fs code that produces the metadata, in order to replicate it in Python and sync the bucket objects so they can be used with s3fs.
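A hedged sketch toward the boto3 approach the question asks about. The key names below (`mode`, `uid`, `gid`, `mtime`, which S3 stores as `x-amz-meta-*` headers) match what s3fs commonly reads, but the exact set varies by s3fs version, so verify against the s3fs source before running anything like this on a real bucket. Bucket and key names are illustrative.

```python
# Sketch only: build the user metadata s3fs-style and rewrite it on an
# existing object via an in-place CopyObject (no body re-upload).
# The exact metadata key set is an assumption -- check your s3fs version.

def build_s3fs_meta(mode, uid, gid, mtime):
    """Return x-amz-meta-* values; boto3 takes them as plain Metadata keys."""
    return {
        "mode": str(mode),    # decimal st_mode, e.g. 33188 for a 0o644 file
        "uid": str(uid),
        "gid": str(gid),
        "mtime": str(mtime),  # epoch seconds
    }

def apply_meta(s3_client, bucket, key, meta):
    """Rewrite an object's user metadata without re-uploading the body.
    Objects over 5 GB would need a multipart copy instead."""
    s3_client.copy_object(
        Bucket=bucket, Key=key,
        CopySource={"Bucket": bucket, "Key": key},
        Metadata=meta, MetadataDirective="REPLACE",
    )

# Usage (requires boto3 and AWS credentials; names are hypothetical):
#   import boto3
#   apply_meta(boto3.client("s3"), "bucket-filestore",
#              "data_dir/filestore/example.bin",
#              build_s3fs_meta(33188, 1000, 1000, 1674777600))
```

Passing the client in as a parameter makes the helper easy to dry-run against a stub before touching real data.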

kerem closed this issue 2026-03-04 01:51:06 +03:00

@michaelsmoody commented on GitHub (Jan 27, 2023):

I suspect that the fix for your issue is contained in some commits that are waiting for a new version (1.92). I'm similarly waiting for the same.

@ggtakec and @gaul ? Any opportunity for a release of 1.92 with the fixes for https://github.com/s3fs-fuse/s3fs-fuse/issues/2033 and https://github.com/s3fs-fuse/s3fs-fuse/issues/2032 which was https://github.com/s3fs-fuse/s3fs-fuse/pull/1964 and https://github.com/s3fs-fuse/s3fs-fuse/pull/2039

I hate to ask again, but it's been ~3 months and I have a major blocking issue; I simply can't use this until a release is pushed out (and I'm 99% certain it would address this individual's request as well. Failing that, I'll assist them once the other fixes are committed, since then I'll have what they need).


@aek commented on GitHub (Jan 27, 2023):

@michaelsmoody thanks for pointing out the issues and PRs.
I have reviewed those and they don't seem related to my issue here. Also, I'm building s3fs from source in order to try the latest changes merged into the master branch of the repository (it seems that is the only branch).
My issue is essentially that files and folders that were not created through s3fs-fuse are invisible to several commands, because they lack the metadata headers s3fs sets, such as:

![x-amz-meta headers shown in the AWS console](https://user-images.githubusercontent.com/404966/215178294-a89ac682-40eb-4c40-a4b3-39952d808484.png)

If I upload files directly in the AWS console, with goofys (like almost all of my files), with boto3, or in any other way that doesn't attach the metadata s3fs needs, the folders and files end up somewhat hidden to s3fs. If you know the full object key paths you can use them without issues, but something like Python's `os.walk` or a bash `ls` won't see the files and folders, since the metadata is missing. I have also seen that running `tree` discovers the files and folders and makes `ls` work, but that doesn't survive a restart of s3fs.


@ggtakec commented on GitHub (Jan 29, 2023):

@aek
Would it be possible to solve this problem by assigning attributes to the objects that do not have the attributes s3fs handles?

Try starting s3fs with the `complement_stat` and `compat_dir` options (also `umask`, `mp_umask`, etc.) using the code from the current master branch.
First of all, I think this will let you see the objects (files/dirs) that are hidden.

Then, you can give objects the attributes by updating those that do not have them (their Unix time is probably 0, i.e. 1970...).

For example, update a file's timestamp with the `touch` command.
If you're using a server that supports multipart upload and the Copy API, this operation only updates the attributes (`x-amz-meta-*` headers) instead of re-uploading the entire file object.
(If the number of files is large, updating all of them will cost a certain amount of time and requests.)

Is it possible to try this method?
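To gauge that cost before committing to a mass `touch` (or to script the update outside s3fs), one could first enumerate the objects that lack the metadata. A hedged boto3 sketch, with illustrative bucket/prefix names, an injected client, and a required-key set that is an assumption about what s3fs reads:

```python
# Sketch only: walk every key under a prefix and yield the ones whose HEAD
# shows missing s3fs-style user metadata. One HEAD request per object, so
# large trees cost time and requests, as noted above.

def keys_missing_meta(s3_client, bucket, prefix, required=("mode", "mtime")):
    """Yield keys whose user metadata lacks any of the required keys."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            head = s3_client.head_object(Bucket=bucket, Key=obj["Key"])
            meta = head.get("Metadata", {})
            if not all(k in meta for k in required):
                yield obj["Key"]

# Usage (requires boto3 and credentials; names are hypothetical):
#   import boto3
#   for key in keys_missing_meta(boto3.client("s3"),
#                                "bucket-filestore", "data_dir/filestore/"):
#       print(key)
```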


@ggtakec commented on GitHub (Jan 29, 2023):

@michaelsmoody
Sorry to keep you waiting.
We are still reviewing and fixing bug reports that have not yet been investigated, which is taking time.
However, we still intend to release early.
(I will create issue #2102 and coordinate the release there.)


@aek commented on GitHub (Jan 29, 2023):

@ggtakec thank you so much. The `compat_dir` config option indeed helps solve the issue. Here is my systemd unit that lets me see the hidden folders:

[Unit]
Description=S3 Storage Bucket
After=network.target

[Mount]
What=bucket-filestore
Where=/s3/bucket
Type=fuse.s3fs
Options=_netdev,allow_other,umask=000,complement_stat,compat_dir

[Install]
WantedBy=multi-user.target

I had seen and tried `complement_stat` before, but wasn't aware of the `compat_dir` option and its behavior. The combination of both options seems to do the trick for me. I had started to think it was something related to the metadata of the folder objects, which I couldn't see in the AWS S3 console, so I made a script to inspect them using `head_object`, only to find that the folder objects were never created by goofys at all:

`An error occurred (404) when calling the HeadObject operation: Not Found`

Even though you can see them in the AWS console, they aren't physically there in the S3 bucket.
So one trick is to run `touch /s3/bucket/local/path/folder`, which creates the folder object in the S3 bucket with the proper metadata. I will look into doing that as well, but for future reference, the `compat_dir` option alone allows s3fs-fuse to see the hidden folders and walk the files in them.
After the `touch` is executed on a hidden folder path, `head_object` on the folder returns a success response.
Once again, thank you so much @ggtakec.
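The `head_object` probe described above can be sketched as a small boto3 helper. Note this checks only the trailing-slash `dir/` marker form; the marker key convention is an assumption here (s3fs's `compat_dir` also recognizes other legacy marker styles), and the client is injected so the helper can be exercised without credentials:

```python
# Sketch only: probe whether a "directory" marker object exists for a given
# folder path. goofys typically does not create these zero-byte markers, so
# the HEAD request 404s until a touch through s3fs creates one.

def dir_marker_exists(s3_client, bucket, dir_key):
    """Return True if HEAD on the 'dir/' marker object succeeds."""
    try:
        s3_client.head_object(Bucket=bucket, Key=dir_key.rstrip("/") + "/")
        return True
    except Exception:
        # boto3 raises botocore.exceptions.ClientError with a 404 code here;
        # any failure is treated as "no marker" in this sketch.
        return False

# Usage (requires boto3 and credentials; names are hypothetical):
#   import boto3
#   dir_marker_exists(boto3.client("s3"), "bucket-filestore",
#                     "data_dir/filestore")
```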


@aek commented on GitHub (Jan 29, 2023):

I used this command to recursively `touch` the directories, which creates them through s3fs-fuse:

`find /s3/bucket/data_dir/filestore/ -type d -exec touch {} +`


@ggtakec commented on GitHub (Jan 30, 2023):

Thank you for confirmation.
If you have other problems, please open a new issue.


@michaelsmoody commented on GitHub (May 10, 2023):

> I have used this command to recursively touch the directories and being able to create them using s3fs-fuse `find /s3/bucket/data_dir/filestore/ -type d -exec touch {} +`

I would suggest, @ggtakec, that the relevant information in this issue be added to the core documentation as a guide for "importing" pre-existing objects, especially `compat_dir` coupled with a mass `find`.

Thoughts? I could potentially write it up if agreeable.


@ggtakec commented on GitHub (May 13, 2023):

@michaelsmoody
Thanks for the suggestion.
You're right; maybe we should write it down as a workaround, and I think the best place for that is the wiki.

@gaul What do you think?
I think it could be described in an appropriate place on the wiki, marked as a "reference" to be used at your own risk.
