[GH-ISSUE #2709] first line(cache file stat) is different: "49425:0" != "49425:26214400" #1280

Closed
opened 2026-03-04 01:52:45 +03:00 by kerem · 8 comments
Owner

Originally created by @gaul on GitHub (Aug 22, 2025).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/2709

CI shows this symptom sometimes and I reproduced it locally in Valgrind:

s3fs: [INF]       curl.cpp:MultipartUploadContentPartSetup(3646): [tpath=/testrun-15985/big-file-s3fs-32049.txt][start=0][size=10485760][part=1]
s3fs: [INF]       curl_util.cpp:url_to_host(268): url is http://127.0.0.1:8081
s3fs: [INF]       curl.cpp:RequestPerform(1985): HTTP response code 200
s3fs: [INF]       curl.cpp:CreateCurlHandle(1511): already has handle, so destroyed it or restored it to pool.
s3fs: [INF]       s3fs_threadreqs.cpp:multipart_upload_part_req_threadworker(332): Multipart Upload Part Worker [path=/testrun-15985/big-file-s3fs-32049.txt][upload_id=d8018f10-3689-4929-b912-214ed80d880f][upload_fd=13][start=10485760][size=10485760][is_copy=false][part_num=2]
s3fs: [INF]       curl.cpp:MultipartUploadPartRequest(3846): Multipart Upload Part [tpath=/testrun-15985/big-file-s3fs-32049.txt][upload_fd=13][start=10485760][size=10485760][part_num=2][upload_id=d8018f10-3689-4929-b912-214ed80d880f][is_copy=false]
s3fs: [INF]       curl.cpp:MultipartUploadPartSetup(3400): Upload Part [path=/testrun-15985/big-file-s3fs-32049.txt][start=10485760][size=10485760][part=2]
s3fs: [INF]       curl.cpp:MultipartUploadContentPartSetup(3646): [tpath=/testrun-15985/big-file-s3fs-32049.txt][start=10485760][size=10485760][part=2]
s3fs: [INF]       curl_util.cpp:url_to_host(268): url is http://127.0.0.1:8081
s3fs: [INF] s3fs.cpp:s3fs_flush(3060): [path=/testrun-15985/big-file-s3fs-32049.txt][pseudo_fd=2][pid=1657532,uid=1000,gid=1000]
s3fs: [INF]       fdcache_entity.cpp:RowFlushHasLock(1371): [tpath=][path=/testrun-15985/big-file-s3fs-32049.txt][pseudo_fd=2][physical_fd=19]
s3fs: [INF]       fdcache_entity.cpp:RowFlushStreamMultipart(1792): [tpath=][path=/testrun-15985/big-file-s3fs-32049.txt][pseudo_fd=2][physical_fd=19][mix_upload=true]
s3fs: [INF]       curl.cpp:CreateCurlHandle(1511): already has handle, so destroyed it or restored it to pool.
s3fs: [INF]       s3fs_threadreqs.cpp:multipart_upload_part_req_threadworker(332): Multipart Upload Part Worker [path=/testrun-15985/big-file-s3fs-32049.txt][upload_id=d8018f10-3689-4929-b912-214ed80d880f][upload_fd=13][start=20971520][size=5242880][is_copy=false][part_num=3]
s3fs: [INF]       curl.cpp:MultipartUploadPartRequest(3846): Multipart Upload Part [tpath=/testrun-15985/big-file-s3fs-32049.txt][upload_fd=13][start=20971520][size=5242880][part_num=3][upload_id=d8018f10-3689-4929-b912-214ed80d880f][is_copy=false]
s3fs: [INF]       curl.cpp:MultipartUploadPartSetup(3400): Upload Part [path=/testrun-15985/big-file-s3fs-32049.txt][start=20971520][size=5242880][part=3]
s3fs: [INF]       curl.cpp:MultipartUploadContentPartSetup(3646): [tpath=/testrun-15985/big-file-s3fs-32049.txt][start=20971520][size=5242880][part=3]
s3fs: [INF]       curl_util.cpp:url_to_host(268): url is http://127.0.0.1:8081
s3fs: [INF]       curl.cpp:RequestPerform(1985): HTTP response code 200
s3fs: [INF]       curl.cpp:RequestPerform(1985): HTTP response code 200
s3fs: [INF]       curl.cpp:CreateCurlHandle(1511): already has handle, so destroyed it or restored it to pool.
s3fs: [INF]       s3fs_threadreqs.cpp:complete_multipart_upload_threadworker(373): Complete Multipart Upload Request [path=/testrun-15985/big-file-s3fs-32049.txt][upload id=d8018f10-3689-4929-b912-214ed80d880f][etaglist=3]
s3fs: [INF]       curl.cpp:MultipartUploadComplete(3438): [tpath=/testrun-15985/big-file-s3fs-32049.txt][parts=3]
s3fs: [INF]       curl_util.cpp:url_to_host(268): url is http://127.0.0.1:8081
s3fs: [INF]       curl.cpp:RequestPerform(1985): HTTP response code 200
s3fs: [INF]       cache.cpp:DelStatHasLock(259): delete stat cache entry[path=/testrun-15985/big-file-s3fs-32049.txt]
s3fs: [INF] s3fs.cpp:s3fs_getattr(892): [path=/testrun-15985][pid=1657586,uid=1000,gid=1000]
s3fs: [INF] s3fs.cpp:s3fs_release(3170): [path=/testrun-15985/big-file-s3fs-32049.txt][pseudo_fd=2][pid=0,uid=0,gid=0]
s3fs: [INF]       fdcache.cpp:GetFdEntityHasLock(479): [path=/testrun-15985/big-file-s3fs-32049.txt][pseudo_fd=2]
first line(cache file stat) is different: "49425:0" != "49425:26214400"
test_cache_file_stat failed
Originally created by @gaul on GitHub (Aug 22, 2025). Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/2709 CI shows this symptom sometimes and I reproduced it locally in Valgrind: ``` s3fs: [INF] curl.cpp:MultipartUploadContentPartSetup(3646): [tpath=/testrun-15985/big-file-s3fs-32049.txt][start=0][size=10485760][part=1] s3fs: [INF] curl_util.cpp:url_to_host(268): url is http://127.0.0.1:8081 s3fs: [INF] curl.cpp:RequestPerform(1985): HTTP response code 200 s3fs: [INF] curl.cpp:CreateCurlHandle(1511): already has handle, so destroyed it or restored it to pool. s3fs: [INF] s3fs_threadreqs.cpp:multipart_upload_part_req_threadworker(332): Multipart Upload Part Worker [path=/testrun-15985/big-file-s3fs-32049.txt][upload_id=d8018f10-3689-4929-b912-214ed80d880f][upload_fd=13][start=10485760][size=10485760][is_copy=false][part_num=2] s3fs: [INF] curl.cpp:MultipartUploadPartRequest(3846): Multipart Upload Part [tpath=/testrun-15985/big-file-s3fs-32049.txt][upload_fd=13][start=10485760][size=10485760][part_num=2][upload_id=d8018f10-3689-4929-b912-214ed80d880f][is_copy=false] s3fs: [INF] curl.cpp:MultipartUploadPartSetup(3400): Upload Part [path=/testrun-15985/big-file-s3fs-32049.txt][start=10485760][size=10485760][part=2] s3fs: [INF] curl.cpp:MultipartUploadContentPartSetup(3646): [tpath=/testrun-15985/big-file-s3fs-32049.txt][start=10485760][size=10485760][part=2] s3fs: [INF] curl_util.cpp:url_to_host(268): url is http://127.0.0.1:8081 s3fs: [INF] s3fs.cpp:s3fs_flush(3060): [path=/testrun-15985/big-file-s3fs-32049.txt][pseudo_fd=2][pid=1657532,uid=1000,gid=1000] s3fs: [INF] fdcache_entity.cpp:RowFlushHasLock(1371): [tpath=][path=/testrun-15985/big-file-s3fs-32049.txt][pseudo_fd=2][physical_fd=19] s3fs: [INF] fdcache_entity.cpp:RowFlushStreamMultipart(1792): [tpath=][path=/testrun-15985/big-file-s3fs-32049.txt][pseudo_fd=2][physical_fd=19][mix_upload=true] s3fs: [INF] curl.cpp:CreateCurlHandle(1511): already has handle, so destroyed it or restored it to pool. s3fs: [INF] s3fs_threadreqs.cpp:multipart_upload_part_req_threadworker(332): Multipart Upload Part Worker [path=/testrun-15985/big-file-s3fs-32049.txt][upload_id=d8018f10-3689-4929-b912-214ed80d880f][upload_fd=13][start=20971520][size=5242880][is_copy=false][part_num=3] s3fs: [INF] curl.cpp:MultipartUploadPartRequest(3846): Multipart Upload Part [tpath=/testrun-15985/big-file-s3fs-32049.txt][upload_fd=13][start=20971520][size=5242880][part_num=3][upload_id=d8018f10-3689-4929-b912-214ed80d880f][is_copy=false] s3fs: [INF] curl.cpp:MultipartUploadPartSetup(3400): Upload Part [path=/testrun-15985/big-file-s3fs-32049.txt][start=20971520][size=5242880][part=3] s3fs: [INF] curl.cpp:MultipartUploadContentPartSetup(3646): [tpath=/testrun-15985/big-file-s3fs-32049.txt][start=20971520][size=5242880][part=3] s3fs: [INF] curl_util.cpp:url_to_host(268): url is http://127.0.0.1:8081 s3fs: [INF] curl.cpp:RequestPerform(1985): HTTP response code 200 s3fs: [INF] curl.cpp:RequestPerform(1985): HTTP response code 200 s3fs: [INF] curl.cpp:CreateCurlHandle(1511): already has handle, so destroyed it or restored it to pool. s3fs: [INF] s3fs_threadreqs.cpp:complete_multipart_upload_threadworker(373): Complete Multipart Upload Request [path=/testrun-15985/big-file-s3fs-32049.txt][upload id=d8018f10-3689-4929-b912-214ed80d880f][etaglist=3] s3fs: [INF] curl.cpp:MultipartUploadComplete(3438): [tpath=/testrun-15985/big-file-s3fs-32049.txt][parts=3] s3fs: [INF] curl_util.cpp:url_to_host(268): url is http://127.0.0.1:8081 s3fs: [INF] curl.cpp:RequestPerform(1985): HTTP response code 200 s3fs: [INF] cache.cpp:DelStatHasLock(259): delete stat cache entry[path=/testrun-15985/big-file-s3fs-32049.txt] s3fs: [INF] s3fs.cpp:s3fs_getattr(892): [path=/testrun-15985][pid=1657586,uid=1000,gid=1000] s3fs: [INF] s3fs.cpp:s3fs_release(3170): [path=/testrun-15985/big-file-s3fs-32049.txt][pseudo_fd=2][pid=0,uid=0,gid=0] s3fs: [INF] fdcache.cpp:GetFdEntityHasLock(479): [path=/testrun-15985/big-file-s3fs-32049.txt][pseudo_fd=2] first line(cache file stat) is different: "49425:0" != "49425:26214400" test_cache_file_stat failed ```
kerem closed this issue 2026-03-04 01:52:46 +03:00
Author
Owner

@ggtakec commented on GitHub (Aug 23, 2025):

@gaul Thanks, I was able to reproduce it, so I will investigate.

<!-- gh-comment-id:3216553265 --> @ggtakec commented on GitHub (Aug 23, 2025): @gaul Thanks, I was able to reproduce it, so I will investigate.
Author
Owner

@ggtakec commented on GitHub (Aug 24, 2025):

Although I was able to reproduce this issue, it appears to be a bug with very low reproducibility. (I was able to confirm it less than 0.1%).

The likely cause of this is that when updating the file that manages the stats information for the FileCache file, the update is read before the changes are reflected.
I was unable to find any other reason for this low occurrence.
In fact, since a sync was not performed after updating the stats information for the FileCache file, this seems highly likely.

I have submitted a PR #2710 to fix this, so please check it out.

<!-- gh-comment-id:3217656584 --> @ggtakec commented on GitHub (Aug 24, 2025): Although I was able to reproduce this issue, it appears to be a bug with very low reproducibility. (I was able to confirm it less than 0.1%). The likely cause of this is that when updating the file that manages the stats information for the FileCache file, the update is read before the changes are reflected. I was unable to find any other reason for this low occurrence. In fact, since a sync was not performed after updating the stats information for the FileCache file, this seems highly likely. I have submitted a PR #2710 to fix this, so please check it out.
Author
Owner

@gaul commented on GitHub (Aug 24, 2025):

Example from CI: https://github.com/s3fs-fuse/s3fs-fuse/actions/runs/16602135084/job/47222510366?pr=1867

<!-- gh-comment-id:3217874114 --> @gaul commented on GitHub (Aug 24, 2025): Example from CI: https://github.com/s3fs-fuse/s3fs-fuse/actions/runs/16602135084/job/47222510366?pr=1867
Author
Owner

@ggtakec commented on GitHub (Aug 26, 2025):

I posted new PR #2714 for this issue, and closed #2710.
Please check it.

<!-- gh-comment-id:3222477399 --> @ggtakec commented on GitHub (Aug 26, 2025): I posted new PR #2714 for this issue, and closed #2710. Please check it.
Author
Owner
<!-- gh-comment-id:3226548346 --> @gaul commented on GitHub (Aug 27, 2025): I see a different symptom now: ``` 2025-08-27T02:32:19.7903155Z could not get first or end line from cache file stat: /tmp/s3fs-cache/.s3fs-integration-test.stat/testrun-24844/big-file-s3fs-27679.txt 2025-08-27T02:32:19.7903577Z 2025-08-27T02:32:19.7903661Z test_cache_file_stat failed ``` From: https://productionresultssa7.blob.core.windows.net/actions-results/0e186977-0912-4e07-94d3-b6d76abf919b/workflow-job-run-e4bb639a-e3e5-5382-9d4a-c186d04f0676/logs/job/job-logs.txt?rsct=text%2Fplain&se=2025-08-27T02%3A47%3A26Z&sig=YPA%2B5%2Bfwyg785bCqZWYDnbm4js5J0mw%2F5mJQDU9eApA%3D&ske=2025-08-27T11%3A07%3A45Z&skoid=ca7593d4-ee42-46cd-af88-8b886a2f84eb&sks=b&skt=2025-08-26T23%3A07%3A45Z&sktid=398a6654-997b-47e9-b12b-9515b896b4de&skv=2025-05-05&sp=r&spr=https&sr=b&st=2025-08-27T02%3A37%3A21Z&sv=2025-05-05
Author
Owner

@ggtakec commented on GitHub (Aug 27, 2025):

Thanks for reporting it, I'll look into it.

<!-- gh-comment-id:3230138580 --> @ggtakec commented on GitHub (Aug 27, 2025): Thanks for reporting it, I'll look into it.
Author
Owner

@ggtakec commented on GitHub (Aug 28, 2025):

It was difficult to reproduce the could not get first or end line from cache file stat: error locally, but I reviewed the code again.

The possibility of the above error occurring with test_cache_file_stat is the same as last time: it lies in File upload->flush->release.
PageList::Serialize() is called during flush and release, and it seems that the file was read during the release process.

The problem is that in PageList::Serialize(), the procedure for updating the FileCacheStat file contents is ftruncate(0 byte)->pwrite. If the file is read in between these steps, the file will be 0 bytes.
Since this possibility is not zero, one of the following solutions is required:
(1) rename(guarantees atomic operation) or (2) lock the file.
Since rename would make the processing too complicated, I will modify the code to use method (2) to use flock.
By gaul's advice, I changed my mind and decided to avoid this by using rename instead of flock.

<!-- gh-comment-id:3233942022 --> @ggtakec commented on GitHub (Aug 28, 2025): It was difficult to reproduce the `could not get first or end line from cache file stat:` error locally, but I reviewed the code again. The possibility of the above error occurring with `test_cache_file_stat` is the same as last time: it lies in File upload->flush->release. `PageList::Serialize()` is called during flush and release, and it seems that the file was read during the release process. The problem is that in `PageList::Serialize()`, the procedure for updating the FileCacheStat file contents is `ftruncate`(0 byte)->`pwrite`. If the file is read in between these steps, the file will be 0 bytes. Since this possibility is not zero, one of the following solutions is required: (1) rename(guarantees atomic operation) or (2) lock the file. ~~Since `rename` would make the processing too complicated, I will modify the code to use method (2) to use `flock`.~~ By gaul's advice, I changed my mind and decided to avoid this by **using rename instead of flock.**
Author
Owner

@ggtakec commented on GitHub (Aug 30, 2025):

I think I've fixed this issue.
I'll close it and reopen it if the problem persists.

<!-- gh-comment-id:3239023393 --> @ggtakec commented on GitHub (Aug 30, 2025): I think I've fixed this issue. I'll close it and reopen it if the problem persists.
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
starred/s3fs-fuse#1280
No description provided.