mirror of
https://github.com/s3fs-fuse/s3fs-fuse.git
synced 2026-04-25 13:26:00 +03:00
[GH-ISSUE #97] s3fs cache active even with use_cache="" #60
Originally created by @Ideasrefined on GitHub (Dec 13, 2014).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/97
I have a bucket mounted using the following options
sudo s3fs tokbkt /mnt/bucket/ -oahbe_conf=/etc/sample_ahbe.conf
In test 1, sample_ahbe.conf was modified to add "Cache-Control: max-age=0" to .xfu files.
In test 2, sample_ahbe.conf was modified to add "Cache-Control: no-cache" to .xfu files.
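For reference, the two modifications described above would look roughly like this in sample_ahbe.conf. This is a sketch based on the ahbe_conf line format (file suffix, header name, header value); the .xfu suffix comes from the report, and the exact whitespace/format should be checked against the sample file shipped with s3fs:

```
# test 1: ask clients to revalidate .xfu objects on every request
.xfu    Cache-Control    max-age=0

# test 2: forbid caching of .xfu objects
.xfu    Cache-Control    no-cache
```

Note that, as ggtakec points out later in the thread, these headers are stored on the S3 objects for downstream clients; s3fs itself does not consult them.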
Both reading and writing of the abc.xfu file on the S3 bucket are done through s3fs.
The problem is that when we read the file from the S3 bucket via the s3fs mount point using "cat", the file shows the older contents, whereas if we download the file directly from the S3 console it shows the new contents.
This leads us to believe that s3fs writes are working fine, but on reads it is serving us stale data. If we unmount and remount the bucket and read again, the file served contains the new contents.
As seen in the command line at the top, we are using the default option of no caching.
Is there anything we are missing?
@Ideasrefined commented on GitHub (Dec 15, 2014):
The problem turned out to be simultaneous read/write to the same file through s3fs on a single instance.
We solved it by opening the file for read on one system and for write on another.
There is no problem opening it for read multiple times on the same instance.
Is this simultaneous read/write an S3 limitation or an s3fs limitation?
@ggtakec commented on GitHub (Jan 6, 2015):
Hi,
First, s3fs does not check the Cache-Control header specified in ahbe_conf.
s3fs always uses a temporary file for reading/writing, whether or not the use_cache option is specified.
(Do you specify this option?)
When you do not specify use_cache, s3fs uses the temporary file only once.
But when reading and writing occur at the same time, the same file contents are probably returned.
If you specify use_cache, the temporary file is reused until the file's stats are updated.
So I think the current s3fs's exclusive control is incomplete when reading and writing to the same file occur at the same time.
If you tested s3fs with the use_cache option, please try without it. (I want to know the result.)
Thanks in advance for your assistance.
@Ideasrefined commented on GitHub (Jan 7, 2015):
Hi,
We are not using the use_cache option in the mount command line, and the result is the same.
The s3fs man page says use_cache is disabled by default, so we assume that if we haven't specified it, the cache is disabled.
Either way, whatever combination of cache or no cache we have tried, the result remains the same.
@ggtakec commented on GitHub (Jan 13, 2015):
Hi,
When the use_cache option is not specified, s3fs creates a temporary file for reading/writing on the local system.
The temp file is created when the file is opened and removed when it is closed, so the cache (temp file) is kept only while the file is open.
But when one process opens the file and creates (keeps) the temp file, and another process then opens the same file, s3fs uses the same temp file for the other process.
I think this is probably your problem.
Is your file (object) large? (Or do your processes open the file at the same time?)
Regards,
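The shared-temp-file behavior described above can be sketched as a toy model. This is my own illustrative C++, not real s3fs code: it only captures the lifecycle ggtakec describes (first open downloads into a temp file, later opens reuse it, last close discards it), which is enough to reproduce the stale-read symptom from the report:

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model of an s3fs-style mount: the first open() of a key downloads the
// object into a local temp file; subsequent opens reuse that temp file as-is;
// the temp file is removed only when the last opener closes the file.
struct ToyMount {
    std::map<std::string, std::string>& s3;   // the remote object store
    std::map<std::string, std::string> temp;  // local temp files ("cache")
    std::map<std::string, int> open_count;

    explicit ToyMount(std::map<std::string, std::string>& store) : s3(store) {}

    void open(const std::string& key) {
        if (open_count[key]++ == 0) {
            temp[key] = s3.at(key);           // first opener downloads
        }                                     // later openers reuse temp file
    }
    std::string read(const std::string& key) const { return temp.at(key); }
    void close(const std::string& key) {
        if (--open_count[key] == 0) {
            temp.erase(key);                  // temp file removed at last close
        }
    }
};
```

Under this model, a reader that holds the file open keeps seeing the snapshot taken at open time, even after the object changes on S3; only reopening after the last close (or remounting) picks up the new contents, which matches the behavior reported above.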
@aa-jaunt commented on GitHub (Jan 25, 2015):
This is really bad, unusable for anybody working with large S3 files. Looking at the code, it appears that s3fs always loads the entire file to local disk first, regardless of whether use_cache is specified. The only difference is that when use_cache is not specified, the local file is deleted when the file is closed. Any plans to get away from that? If not, I will change it myself.
@aa-jaunt commented on GitHub (Jan 26, 2015):
It seems to me that the code was not intended to work that way, and the problem is in PageList::GetUninitPages(fdpage_list_t& uninit_list, off_t start) in fdcache.cpp.
Basically, instead of doing some read-ahead caching, which is smart and fine, it merges all unread pages into one. So regardless of my read size, FdEntity::Load ends up caching the entire file. That's the immediately visible problem. I will try to fix it tonight or tomorrow.
@aa-jaunt commented on GitHub (Jan 26, 2015):
So commenting out a couple of strange lines in GetUninitPages solves the problem of loading the entire file even when we want to read one byte:

```cpp
//fdpage_list_t::reverse_iterator riter = uninit_list.rbegin();
//if(riter != uninit_list.rend() && (*riter)->next() == (*iter)->offset){
//  // merge to before page
//  (*riter)->bytes += (*iter)->bytes;
//}else{
    fdpage* page = new fdpage((*iter)->offset, (*iter)->bytes, false);
    uninit_list.push_back(page);
//}
```
That should save us some money. But we still have the problem of caching a huge file locally if we read through the entire file. That's the lesser problem, I think, though we could add a reasonably simple garbage collection and clean pages that are before the current offset.
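The effect of the merge being discussed can be illustrated with a toy model. This is my own simplified C++, not the real fdcache.cpp types: it compares the merging walk (every adjacent unloaded page coalesced into one span, so the whole file gets scheduled for download) with the patched walk that keeps pages separate:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-in for s3fs's page bookkeeping.
struct Page {
    int64_t offset;
    int64_t bytes;
    bool loaded;
    int64_t next() const { return offset + bytes; }
};

// Buggy behavior: merge every unloaded page into its predecessor, producing
// one span that covers all unloaded bytes in the file.
std::vector<Page> uninit_merged(const std::vector<Page>& pages) {
    std::vector<Page> out;
    for (const Page& p : pages) {
        if (p.loaded) continue;
        if (!out.empty() && out.back().next() == p.offset) {
            out.back().bytes += p.bytes;      // merge into previous span
        } else {
            out.push_back({p.offset, p.bytes, false});
        }
    }
    return out;
}

// Patched behavior: keep unloaded pages separate so the caller can load
// only the pages the current read actually touches.
std::vector<Page> uninit_separate(const std::vector<Page>& pages) {
    std::vector<Page> out;
    for (const Page& p : pages) {
        if (!p.loaded) out.push_back({p.offset, p.bytes, false});
    }
    return out;
}
```

With three adjacent 4 KiB unloaded pages, the merging version returns a single 12 KiB span (the whole file), while the patched version returns three independent pages.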
@boazrf commented on GitHub (Feb 1, 2015):
Encountered the same issue and implemented the same fix. Thanks, Anatoly.
An alternative fix would be to pass the requested size to GetUninitPages and stop merging pages once that size is reached.
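The alternative suggested above can be sketched as follows. This is my own code with a hypothetical signature, not the real GetUninitPages: the requested start and size are passed in, pages before the request are skipped, and merging stops as soon as the request is covered:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-in for s3fs's page bookkeeping.
struct Page {
    int64_t offset;
    int64_t bytes;
    bool loaded;
    int64_t next() const { return offset + bytes; }
};

// Sketch of the size-limited variant: collect (and merge) unloaded pages,
// but only those overlapping the requested byte range [start, start + size).
std::vector<Page> uninit_limited(const std::vector<Page>& pages,
                                 int64_t start, int64_t size) {
    std::vector<Page> out;
    const int64_t need_end = start + size;
    for (const Page& p : pages) {
        if (p.loaded || p.next() <= start) continue; // before the request
        if (p.offset >= need_end) break;             // request already covered
        if (!out.empty() && out.back().next() == p.offset) {
            out.back().bytes += p.bytes;             // merge while still needed
        } else {
            out.push_back({p.offset, p.bytes, false});
        }
    }
    return out;
}
```

A 100-byte read at the start of a file of 4 KiB pages then schedules a single page, and a read straddling a page boundary schedules only the two pages it touches, instead of the whole file.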
@ggtakec commented on GitHub (Mar 4, 2015):
Hi all, I'm sorry for the late reply.
On this issue, I have confirmed that there are two types of bugs in s3fs.
@Ideasrefined
The first bug is your problem.
@aa-jaunt
I was able to find the second failure in the code you pointed out.
I saw your patch, but it seems to have a problem: with it, s3fs could not use multipart downloads.
I will upload fixed code as soon as possible; please wait a moment.
Thanks in advance for your help.
@ggtakec commented on GitHub (Mar 4, 2015):
I fixed the two bugs from this issue in #138, and I have closed this issue.
Please try the fixed code in the master branch, and if you find other bugs please reopen this issue or post a new one.
Regards,
@crsepulv commented on GitHub (May 12, 2016):
I do not use any cache options, I just used:
s3fs bucket /mnt/s3
I have many video files on S3; I'm using ffmpeg to re-encode them from flv to mp4, and it runs flawlessly.
I deleted the new file (video.mp4) from the AWS web console (not from the command line) to try other encoding params and ran ffmpeg again, but it says "the file already exists", even though it is deleted on S3.
I think there is a cache counting this deleted file; is there a way to clear this cache?
@ggtakec commented on GitHub (May 14, 2016):
@crsepulv
If you can, please try setting the max_stat_cache_size=0 option.
Maybe the stat cache is the cause in your case.
If that does not solve the problem, please post a new issue.
Thanks in advance for your assistance.
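Applied to the mount command from the comment above, the suggestion would look something like this (the bucket name and mount point are taken from crsepulv's report; max_stat_cache_size caps the in-memory stat/metadata cache, and 0 disables it):

```shell
# mount with the stat (metadata) cache disabled
s3fs bucket /mnt/s3 -o max_stat_cache_size=0
```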
@j08lue commented on GitHub (Jul 2, 2019):
This is a very old, closed thread, but just for the record:
I have the same issue with the cache growing too large and no apparent way to disable it. I tried use_cache="" and ensure_diskfree=50 to no avail.
Why would max_stat_cache_size mean anything here, @ggtakec? As far as I can see, that cache is tiny (metadata only) and in memory, not on disk.