[GH-ISSUE #97] s3fs cache active even with use_cache="" #60

Closed
opened 2026-03-04 01:41:39 +03:00 by kerem · 13 comments
Owner

Originally created by @Ideasrefined on GitHub (Dec 13, 2014).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/97

I have a bucket mounted using the following options
sudo s3fs tokbkt /mnt/bucket/ -oahbe_conf=/etc/sample_ahbe.conf

in test 1
sample_ahbe.conf was modified to add "cache-control : max-age=0" for .xfu files.

in test 2
sample_ahbe.conf was modified to add "cache-control : no-cache, max-age=0" for .xfu files.

Both reading and writing to the abc.xfu file on the S3 bucket are done through s3fs.
The problem is that when we read the file from the S3 bucket via the s3fs mount point using "cat", it shows the old contents, whereas if we download the file directly from the S3 console it shows the new contents.

This leads us to believe that s3fs writes are working fine, but on reads it serves us stale files. If we unmount and remount the bucket and read again, the file contains the new contents.

As seen in the command line above, we are using the default option of no caching.

Is there anything we are missing?

kerem 2026-03-04 01:41:39 +03:00
  • closed this issue
  • added the
    bug
    label

@Ideasrefined commented on GitHub (Dec 15, 2014):

The problem turned out to be simultaneous reading and writing of the same file through s3fs on one single instance.
We solved it by opening the file for reading on one system and for writing on another.

There is no problem opening it for reading multiple times on the same instance.

Is this simultaneous read/write behavior an S3 limitation or an s3fs limitation?


@ggtakec commented on GitHub (Jan 6, 2015):

Hi,

First, s3fs does not check the cache-control header specified in ahbe_conf.
And s3fs always uses a temporary file for reading/writing, whether or not the use_cache option is specified.
(Do you specify this option?)

When you do not specify the use_cache option, s3fs uses the temporary file only once.
But when reading and writing happen at the same time, the same file contents are probably returned.
If you specify use_cache, the temporary file is reused until the file's stats are updated.

So I think the latest s3fs's exclusive control is incomplete when reading and writing to the same file occur at the same time.
If you tested s3fs with the use_cache option, please try without it. (I want to know the result.)

Thanks in advance for your assistance.


@Ideasrefined commented on GitHub (Jan 7, 2015):

Hi,
We are not using the use_cache option on the mount command line, and the result is the same.

The s3fs man page says use_cache is disabled by default, so we assume that if we haven't specified it, the cache is disabled.

Either way, whatever combination of cache or no cache we have tried, the result remains the same.

How often is the stat cache updated?


@ggtakec commented on GitHub (Jan 13, 2015):

Hi,
When the use_cache option is not specified, s3fs creates a temporary file for reading/writing on the local system.
The temp file is created when the file is opened and removed when it is closed.
So the cache (temp file) is kept only while the file is open.

But when one process opens the file and creates (keeps) the temp file, and another process tries to open the same file, s3fs uses the same temp file for the other process.
I think this case is probably your problem.
Is your file (object) large? (Or do your processes open the file at the same time?)

Regards,


@aa-jaunt commented on GitHub (Jan 25, 2015):

This is really bad, unusable for anybody working with large S3 files. Looking at the code, it appears that s3fs always loads the entire file to local disk first, regardless of whether use_cache is specified. The only difference is that when use_cache is not specified, the local file is deleted when the file is closed. Any plans to move away from that? If not, I will change it myself.


@aa-jaunt commented on GitHub (Jan 26, 2015):

It seems to me that the code was not intended to work that way, and the problem is in PageList::GetUninitPages(fdpage_list_t& uninit_list, off_t start) in fdcache.cpp.

Basically, instead of doing some read-ahead caching, which would be smart and fine, it merges all unread pages into one. So regardless of my read size, FdEntity::Load ends up caching the entire file. That's the immediately visible problem. I will try to fix it tonight or tomorrow.


@aa-jaunt commented on GitHub (Jan 26, 2015):

So removing a couple of strange lines from GetUninitPages solves the problem of loading the entire file even when we only want to read one byte:
//fdpage_list_t::reverse_iterator riter = uninit_list.rbegin();
//if(riter != uninit_list.rend() && (*riter)->next() == (*iter)->offset){
//  // merge to before page
//  (*riter)->bytes += (*iter)->bytes;
//}else{
fdpage* page = new fdpage((*iter)->offset, (*iter)->bytes, false);
uninit_list.push_back(page);
//}
That should save us some money. But we still have the problem of caching a huge file locally if we read through the entire file. That's less of a problem, I think, although we could introduce reasonably simple garbage collection and clean up pages that are before the current offset.


@boazrf commented on GitHub (Feb 1, 2015):

Encountered the same issue and implemented the same fix. Thanks, Anatoly.
An alternative fix would be to pass the requested size to GetUninitPages and stop merging pages when that size is reached.


@ggtakec commented on GitHub (Mar 4, 2015):

Hi all, I'm sorry for replying late.

For this issue, I have confirmed the following two bugs in s3fs:

  • Even when the use_cache option is off, s3fs tries to use a local file cache that has already been closed.
  • Regardless of the requested size, s3fs tries to load the contents from the specified start position to the end of the file, or to the start position of the next already-loaded range.

@Ideasrefined
The first bug in the list is your problem.

@aa-jaunt
Thanks to what you pointed out, I was able to find the failure in the second case.
I saw your patch, but it seems to have a problem: with it, s3fs could not use multipart downloads.

I will upload fixed code as soon as possible; please wait a moment.
Thanks in advance for your help.


@ggtakec commented on GitHub (Mar 4, 2015):

I fixed the two bugs from this issue in #138
and closed this issue.

Please try the fixed code in the master branch, and if you find another bug, please reopen this issue or post a new one.

Regards,


@crsepulv commented on GitHub (May 12, 2016):

I do not use any cache options; I just used s3fs bucket /mnt/s3

I have many video files on S3. I'm using ffmpeg to re-encode them from flv to mp4, and it runs flawlessly.

I deleted the new file (video.mp4) from the AWS web console (not from the command line) to try other encoding parameters and ran ffmpeg again, but it says "the file already exists", even though it is deleted on S3.

I think there is a cache still counting this deleted file. Is there a way to clear this cache?


@ggtakec commented on GitHub (May 14, 2016):

@crsepulv
If you can, please try setting the max_stat_cache_size=0 option.
Maybe the stat cache is enabled in your case.

If that does not solve the problem, please post a new issue.
Thanks in advance for your assistance.


@j08lue commented on GitHub (Jul 2, 2019):

This is a very old, closed thread, but just for the record:

I have the same issue with the cache growing too large and no apparent way to disable it. I tried use_cache="" and ensure_diskfree=50 to no avail.

Why would max_stat_cache_size mean anything here, @ggtakec? As far as I can see, that cache is tiny (metadata only) and lives in memory, not on disk.
