mirror of
https://github.com/s3fs-fuse/s3fs-fuse.git
synced 2026-04-25 21:35:58 +03:00
[GH-ISSUE #2107] s3fs adds characters to file #1073
Originally created by @mabi08 on GitHub (Feb 2, 2023).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/2107
Additional Information
Version of s3fs being used (s3fs --version)
Amazon Simple Storage Service File System V1.91 (commit:unknown) with OpenSSL
Version of fuse being used (pkg-config --modversion fuse, rpm -qi fuse or dpkg -s fuse)
Name : fuse
Version : 2.9.2
Release : 11.amzn2
Architecture: x86_64
Install Date: Thu 26 Jan 2023 10:49:48 PM UTC
Group : System Environment/Base
Size : 222809
License : GPL+
Signature : RSA/SHA256, Thu 06 Dec 2018 07:31:53 PM UTC, Key ID 11cf1f95c87f5b1a
Source RPM : fuse-2.9.2-11.amzn2.src.rpm
Build Date : Fri 16 Nov 2018 08:35:39 PM UTC
Build Host : build.amazon.com
Relocations : (not relocatable)
Packager : Amazon Linux
Vendor : Amazon Linux
URL : https://github.com/libfuse/libfuse
Summary : File System in Userspace (FUSE) utilities
Kernel information (uname -r)
4.14.301-224.520.amzn2.x86_64
GNU/Linux Distribution, if applicable (cat /etc/os-release)
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
How to run s3fs, if applicable
[] command line
[x] /etc/fstab
Details about issue
Hi,
we are using s3fs to mount S3 buckets on AWS appliances (Amazon Linux 2).
However, we are currently facing a bug where removing characters from a file seems to add weird characters.
In the JupyterLab editor these are shown as red dots in the file, and afterwards the file is no longer usable. This is also reproducible with other types of files. We tried building s3fs from source, but this did not fix the issue for us. I attached a gif showing the problem.
Kind regards
Maik
@ggtakec commented on GitHub (Feb 8, 2023):
@mabi08
Please tell me the line in fstab used when starting s3fs.
And if you can use the s3fs options (dbglevel=info, curldbg) and get the log file, please share it with us.
This phenomenon looks like what happens when s3fs uploads (updates) a file without changing the size of the original file (or is not instructed to do so).
However, I am not familiar with the JupyterLab editor, so analyzing this may be difficult.
@iptizer commented on GitHub (Feb 20, 2023):
Having the same problem at the moment and added the debug logs as requested. I will try to add some details from our troubleshooting and see whether this already helps to debug or gives us a hint. The logs are a bit difficult to anonymize, so I am reducing them to what I think could be relevant.
What we did:
After saving, opening the file with vim shows that 85 null characters are added:
Below a part of the logs after saving (there is much more, but this seems to be interesting):
20230220_s3fs_save.log
The interesting thing is that in the end there is a "partial upload" of 85 characters.
Maybe you have an idea where this comes from?
Thanks in advance for the help here!
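The padding described above can also be checked without an editor by counting the NUL bytes at the end of the file. The following is only a minimal sketch: the function name `count_trailing_nuls` and the demo file are hypothetical, and the demo runs against a local temporary file rather than an s3fs mount.

```python
import os
import tempfile


def count_trailing_nuls(path: str) -> int:
    """Return the number of consecutive NUL (0x00) bytes at the end of the file."""
    with open(path, "rb") as f:
        data = f.read()
    # Strip NUL bytes from the right and compare lengths.
    return len(data) - len(data.rstrip(b"\x00"))


if __name__ == "__main__":
    # Self-contained demo: a small text file padded with 85 NUL bytes.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"print('hello')\n" + b"\x00" * 85)
    print(count_trailing_nuls(tmp.name))
    os.unlink(tmp.name)
```

Pointing the function at the affected file on the mount (instead of the temporary file) would show whether the number of trailing NUL bytes matches the size of the "partial upload" seen in the log.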
@ggtakec commented on GitHub (Mar 12, 2023):
This phenomenon is very similar to the bug that occurred with s3fs in the past.
Internally, it seems to be caused by the file size not being changed, but we could not identify the trigger from your log.
By the way, I would like to know the result when the same operation is performed with vim -n (without creating a swap file).
I would like to know if this is a problem that only occurs in JupyterNotebook.
The s3fs master branch has recently had a fix that may be related to this. (Maybe #2122 is involved)
If you can, try the code on the master branch.
@iptizer commented on GitHub (Mar 12, 2023):
Haven't tried with vim -n, but vim itself does not produce this result. I will try the -n flag and report back.
But what I can say is that we observe the same result with RStudio, another web-based IDE.
What are your thoughts on the difference between vim and other editors? Is there anything we could trace? Would it help to provide the exact parameters we use? Would it help to provide a larger volume of logs? Maybe what is not obvious to me might be obvious to you :).
We will also see whether a new build fixes the problem.
@ggtakec commented on GitHub (Mar 12, 2023):
First of all, the purpose of using vim's -n option is to operate on the target file directly (through the file descriptor) without using a temporary file.
If the flow is to open, edit, and close files with s3fs, I don't think there will be any problems.
Regarding this issue, I would like to know if it is possible that JupyterNotebook is flushing the file while keeping it open.
I would expect s3fs to work even when it is operated that way.
However, there may be some hidden problem.
I'm sorry to trouble you, but it would be helpful if you could try it with the new code.
@ggtakec commented on GitHub (Mar 12, 2023):
I may have been able to reproduce a similar phenomenon.
It seems that it can be reproduced by closing the target file without flushing it after truncating the file size. (JupyterNotebook may be doing the same)
I'll do some more research and check.
@ggtakec commented on GitHub (Mar 16, 2023):
I was able to identify the cause of the bug and posted PR #2131 to fix it.
The bug was that if a user shrinks a file while it is open and reads the file before flushing, the file size reverts to the original size.
If you can, please try the code from the PR and let us know if you find a problem.
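The sequence described above (shrink an open file, read it before any flush, then close) can be sketched as a small script. This is an illustrative sketch only, not part of the s3fs test suite: the function name `shrink_read_close` and the file name are hypothetical, and the demo uses a local temporary directory. To exercise s3fs, the path would have to point to a file under the s3fs mount point.

```python
import os
import tempfile


def shrink_read_close(path: str, original: bytes, new_size: int):
    """Create `path` with `original`, reopen it, shrink it to `new_size`,
    read it back before any flush, close it, and return (size, data)."""
    # Step 1: create the file and close it so the original content is stored.
    with open(path, "wb") as f:
        f.write(original)

    # Step 2: reopen, shrink while open, and read before flushing.
    fd = os.open(path, os.O_RDWR)
    try:
        os.ftruncate(fd, new_size)          # shrink the open file
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, len(original))   # read before any fsync/flush
    finally:
        os.close(fd)                        # close without an explicit fsync

    # Step 3: report the size seen after closing.
    return os.stat(path).st_size, data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        size, data = shrink_read_close(os.path.join(d, "sample.txt"), b"x" * 100, 15)
        print(size, len(data))
```

On a local file system the reported size equals `new_size`. A size equal to the original length after the close would correspond to the behavior described in this comment.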
@ggtakec commented on GitHub (Mar 18, 2023):
@iptizer
#2131 did not completely fix the issue, so #2133 was newly created.
(#2131 is closed, #2133 will move forward to resolve this issue)
If you are still able to test, please try #2133.
Thanks in advance for your assistance.
@ggtakec commented on GitHub (Mar 26, 2023):
@mabi08
I merged #2133, but I understand that you are still seeing the issue after the merge, so we will continue to investigate.
Currently, I have installed Jupyter Notebook on Ubuntu 22.04 and modified files in the browser, but I have not been able to reproduce the same phenomenon.
It is not reproducible with either v1.91 or the latest code (with #2133 merged).
Again, could you tell us the exact options used when starting s3fs?
I've tried with multipart-upload, stream-upload, and with/without cache-directory, but I can't reproduce it, so we'll need to try with exactly the same options as yours.
@mabi08 commented on GitHub (Mar 30, 2023):
@ggtakec
We are running s3fs on Amazon Linux 2 and mounting multiple S3 buckets on those appliances using fstab.
Here is an extract of /etc/fstab:

What I did for testing was uninstall the old s3fs version via yum + install the latest commit and restart the machine. Could there be some caching mechanism involved?
We really appreciate your effort!
@ggtakec commented on GitHub (May 30, 2023):
@mabi08 We have made some progress that may be relevant to this issue, so I am contacting you.
PR #2152 (by @eryugey) has been merged.
This change includes a fix for correct exclusive control when file operations are performed from multiple processes.
I hope it has addressed the remaining bugs in this issue.
It didn't make it into the new version v1.92 in time, but if you can try the code in the master branch, please do so.
@senfbrot commented on GitHub (May 30, 2023):
Hi @ggtakec,
I coincidentally stumbled upon this update. After trying the latest version from source, it still does not seem to work.
Additional Information
Version of s3fs being used (s3fs --version)
Version of fuse being used (pkg-config --modversion fuse, rpm -qi fuse or dpkg -s fuse)
Kernel information (uname -r)
GNU/Linux Distribution, if applicable (cat /etc/os-release)
How to run s3fs, if applicable
[] command line
[x] /etc/fstab
s3fs syslog messages (grep s3fs /var/log/syslog, journalctl | grep s3fs, or s3fs outputs)
@ggtakec commented on GitHub (May 30, 2023):
@senfbrot Thank you for your prompt reply.
It seems that we need to investigate further to see if there are other causes.
However, I haven't been able to reproduce the same phenomenon yet, so it will take some time to investigate.
@senfbrot commented on GitHub (May 30, 2023):
Please find attached the curldbg output for the following actions in JupyterLab:
s3fs.log