[GH-ISSUE #15] Cache files are blank #10
Originally created by @Matan on GitHub (Feb 21, 2014).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/15
Please look at this binary comparison image:
[image: side-by-side binary comparison of the source file and the s3fs cache file]
On the left is the source image file downloaded from the S3 console, and on the right is the cache file created by the s3fs mount.
Is this the expected behaviour? I noticed that the cache doesn't make a speed difference, as the file is downloaded from S3 each time. I tested latency to my web server in a couple of ways, and the results led me to believe that the cache isn't working properly, so I did the binary comparison and found that the bytes of the source and the cache don't match up.
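(For reference, a byte-level check like the one described can be reproduced with a short script; this is a minimal sketch, and the file paths are hypothetical:)

    # Compare the original file with the s3fs cache copy byte-for-byte,
    # and check whether the cache copy is nothing but zero bytes.
    with open("original.png", "rb") as f:
        src = f.read()
    with open("/tmp/s3fs-cache/bucket/original.png", "rb") as f:
        cached = f.read()

    print("same size:", len(src) == len(cached))
    print("identical:", src == cached)
    print("all zeros:", cached == b"\x00" * len(cached))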
Urgent help would be much appreciated.
Thanks,
Matan
@Matan commented on GitHub (Feb 21, 2014):
PS: The version used above is v1.76. I've also tested v1.74 (from Google Code), and it doesn't have this cache behaviour: the cache files created by v1.74 are identical to the source file. However, I'm still experiencing the same latency artifact mentioned above.
@ggtakec commented on GitHub (Feb 23, 2014):
Hi,
Did you compare the binaries of the original file and the local cache file?
s3fs caches objects (files) in the cache directory specified by the "use_cache" option.
The cache file is split into 20MB parts, and each part's area is downloaded by a separate request.
It uses multiple requests with the Range header.
So if you download a file at its full size (ex: 100MB), s3fs sends multiple requests, one per 20MB part, each with a Range header.
After the whole file has been downloaded, the cache file will be the same as the original.
But if you download only a part of the file, the cache file does not contain the complete original.
The cache file exists for performance: it works well for downloads after the first time, or for reading a part of a file.
On the first download, s3fs will not get good performance from this cache system.
But for files over 20-50MB, s3fs tries to use multiple requests for downloading.
So if you have good network performance, these multiple requests may give good performance.
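As a rough illustration of the ranged-download pattern described above: s3fs itself is C++ and uses libcurl, so this Python sketch only shows the HTTP mechanics, and the URL is a made-up placeholder.

    import urllib.request

    PART_SIZE = 20 * 1024 * 1024  # the 20MB part size described above

    def fetch_part(url, part_index):
        # Ask the server for one 20MB slice of the object via a Range header.
        start = part_index * PART_SIZE
        end = start + PART_SIZE - 1  # Range end is inclusive
        req = urllib.request.Request(url, headers={"Range": "bytes=%d-%d" % (start, end)})
        with urllib.request.urlopen(req) as resp:
            # 206 Partial Content confirms the server honoured the Range header.
            assert resp.status == 206
            return resp.read()

    # e.g. part = fetch_part("https://example-bucket.s3.amazonaws.com/big-object", 0)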
If I have misunderstood your problem, please let me know.
Thanks in advance for your help.
@Matan commented on GitHub (Feb 24, 2014):
Hi Takeshi,
The file was a 700KB file, so no multipart download was going on. I compared the original file with the cache file and found that with v1.76 the cache file was "blank". It was the correct size, as it had the same number of bytes, but the bytes were all "00".
I reverted back to v1.74 and the problem went away. I am still getting very high latency when reading a file on an s3fs mount, as it seems the cache files are always re-downloaded. This is my main issue, as it slows everything down quite a lot.
Thanks!
@ggtakec commented on GitHub (Feb 25, 2014):
Hi, Matan
I want to confirm whether s3fs got an error on the GET request.
If s3fs got an error (ex: a timeout or an error response), the cache file is probably all zeros (only truncated).
If you can, I would like to know the real size of the cache file that ls shows as XXXX bytes.
You can use the "du" command to find the real size on disk.
s3fs initializes its cache file with truncate, so it will show as occupying zero bytes on disk if s3fs never updated it because of an error.
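A minimal sketch of that check, comparing the apparent size (what ls shows) with the space actually allocated on disk (what du reports); the cache file path is hypothetical:

    import os

    st = os.stat("/tmp/s3fs-cache/bucket/somefile")  # hypothetical cache path
    apparent = st.st_size             # the size "ls -l" reports
    allocated = st.st_blocks * 512    # st_blocks is in 512-byte units, like "du"
    if allocated < apparent:
        print("sparse cache file: truncated to size but never filled with data")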
And if you can run s3fs manually with the "-d" and "-f" options, s3fs writes an error log to /var/log/messages (sometimes another file, depending on the system).
Those log messages may help us solve this problem.
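For example, a foreground debug run might look like this (bucket name and mountpoint are placeholders):

    s3fs mybucket /mnt/mybucket -o use_cache=/tmp -d -f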
Thanks in advance for your help.
@darrencruse commented on GitHub (Jun 17, 2014):
Hi guys, we believe we're having this same problem: the s3fs cache does not seem to improve our response times, even after a file has been loaded into the cache the first time.
Looking at the output with the -d -f options suggested above seems to confirm that s3fs is going back to S3 for files even when they are in the cache.
We've tried several different versions of s3fs, including 1.77, 1.74, and master, but it's always the same.
We saw one example, like the OP said, of a cache file full of zeros (even though the file size looked correct). But we don't think that's typical; mostly our cache files look OK: the size is correct, and when we view their content (e.g. HTML files) it looks good.
The only funny thing is that the dates/times on the files look strange, e.g. 2011 dates on files that show a recent 2014 date in S3. Could that be a clue?
So is this considered a real problem now, btw? (We tried master partly because we saw changes in the MD5 hash code and wondered whether changes had been made relating to this problem.)
But otherwise, can anybody clarify how the cache handling is intended to work behind the scenes?
I.e., we're trying to add some extra logging to the s3fs code to help debug, but we're very new to the code...
Does it do a HEAD request to S3 each time a file is requested, to get the ETag and compare it to that of the local file? And if that matches the ETag of what's in the cache, it shouldn't go back to S3, right?
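For what it's worth, a check of that kind could look like the sketch below. This is purely illustrative and not taken from the s3fs source; note that for multipart-uploaded objects the S3 ETag is not a plain MD5 (it carries a "-N" suffix), so an MD5 comparison can only work for single-part objects.

    import hashlib

    def md5_matches_etag(local_path, s3_etag):
        # Stream the local file through MD5 and compare with the S3 ETag.
        h = hashlib.md5()
        with open(local_path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest() == s3_etag.strip('"')  # S3 quotes its ETags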
@Matan commented on GitHub (Jun 17, 2014):
I ended up using riofs; I suggest you check it out as well. If you're pushing files to CloudFront, be sure to use mime magic to set the correct content type for JS and CSS files. Otherwise browsers won't understand what they are.
@darrencruse commented on GitHub (Jun 18, 2014):
Thanks for the suggestion Matan - I wasn't even aware of riofs.
From some brief testing I was getting excited about riofs, until I tried something with a symlink and got an error.
So it looks like riofs doesn't support symbolic links(?)
Symlinks are critical for my particular use case (they were one of the reasons we'd selected s3fs to begin with).
@Matan commented on GitHub (Jun 20, 2014):
Oooh, that's a shame. I combine riofs with bindfs. Maybe it would be worthwhile checking whether bindfs supports symlinks.
So to my mind, an optimal solution is to have a single bucket mounted, then use bindfs to 're-locate' certain folders to where you need them. bindfs also allows you to change ownership, etc. Obviously there is a performance cost to consider, but functionality always has one. :)
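For illustration, that setup might look something like this (bucket name, paths, and ownership are made up; -u and -g are bindfs's force-user/force-group options):

    # mount the bucket once with riofs, then re-expose a subfolder elsewhere
    riofs mybucket /mnt/mybucket
    bindfs -u www-data -g www-data /mnt/mybucket/site /var/www/site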
@ggtakec commented on GitHub (Jan 17, 2016):
I'm sorry that this issue was left open for such a long time.
If you still have this problem, please try the latest code.
Thanks in advance for your help.
@Morc001 commented on GitHub (Mar 29, 2016):
Hi Takeshi,
This problem of files containing only zeros (empty sparse files?) is still happening on master as of today. The files are small, a few hundred bytes only, so there is no multi-part involved. I'm just mounting a bucket and viewing files, nothing fancy. I'm using the eu-central-1 endpoint, if that makes any difference. In your last comment you implied it was fixed; or did that refer to some other problem?
With 1.79 the problem does not appear, but that version lacks ensure_diskfree.
@davrodfer commented on GitHub (Apr 4, 2016):
I had the same issue. I write my files from two different machines with the same role. When I read a file from a machine that didn't write it, all I get is a bunch of zeros. Each machine can only read the files it wrote itself, but the files are OK in S3.
I had this configuration in /etc/fstab:
I changed it to:
And now it works; I can read the files from all machines. I have disabled the cache. I don't know if this could be a problem in the future.
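(The exact fstab lines are missing above; as a purely hypothetical illustration of the change described, removing use_cache to disable the local cache, the entries might look like:)

    # hypothetical /etc/fstab entries; bucket name and paths are made up
    # before, with the local cache enabled:
    mybucket /mnt/mybucket fuse.s3fs _netdev,allow_other,use_cache=/tmp 0 0
    # after, with the cache disabled:
    mybucket /mnt/mybucket fuse.s3fs _netdev,allow_other 0 0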
@aggronerd commented on GitHub (Apr 4, 2016):
I am getting the same: files appearing to contain only zero bytes. This has only started happening recently (in the last couple of weeks), so I have got around the issue by rolling back to SHA cf56b35766. I haven't really been able to debug it further and can't afford to lose caching. The volume in my case is mounted from the command line:
s3fs $BUCKET_NAME $MOUNTPOINT -o iam_role=$IAM_ROLE,nosuid,nonempty,nodev,use_cache=/tmp,allow_other,retries=5,url=https://s3-eu-west-1.amazonaws.com/
@gl-lamhnguyen commented on GitHub (Apr 4, 2016):
@davrodfer I have the same issue, so you're saying disabling caching is the solution?
EDIT: @aggronerd yes, using that SHA works for me too.
@ggtakec commented on GitHub (Apr 11, 2016):
@Morc001 @davrodfer @gl-lamhnguyen
I'm sorry for replying late.
As aggronerd mentioned, I introduced this bug.
I will try to fix it; please wait a while.
Regards,
@CloudaYolla commented on GitHub (Apr 12, 2016):
Hi @ggtakec,
We are using s3fs in a project, and suddenly everything stopped working. After a week of investigation (searching for the problem elsewhere), today we found out that existing files read through the s3fs mount appear corrupt. But when a new file is created, it is written and read OK in the S3 mount folder.
Is there any workaround for this? I couldn't tell from the thread whether there is one.
We cannot afford to wait for a fix.
Thanks and best regards,
EP Team
@gl-lamhnguyen commented on GitHub (Apr 12, 2016):
@CloudaYolla I guess your broken s3fs was pulled from the latest commit. I couldn't wait either, so I used this SHA instead:
github.com/s3fs-fuse/s3fs-fuse@cf56b35766. Download the ZIP and run the install commands, and things will be back to normal. Lesson learned: we should keep a copy of a working version on our end at all times.
@ggtakec commented on GitHub (Apr 12, 2016):
@CloudaYolla @gl-lamhnguyen
I reverted #379 on master, so please use d048f38. I will fix the bug (#379) later.
Thanks in advance for your help.
@ggtakec commented on GitHub (Apr 12, 2016):
#395 has been merged on the master branch; please try the latest code.
Thanks in advance for your help.
@CloudaYolla commented on GitHub (Apr 12, 2016):
Hi @ggtakec,
So this means we don't need to revert to the previously mentioned version?
Thank you,
EP Team
@ggtakec commented on GitHub (Apr 15, 2016):
@CloudaYolla
Since this bug was affecting most users, I needed to revert the code until it was fixed.
The bug has now been fixed by #395, which patches #379.
I am sorry; this was my fault.
I have closed this issue; if you find the same bug, please reopen it.
Regards,