[GH-ISSUE #2600] Refactoring about Thread / Curl Handle #1245

Closed
opened 2026-03-04 01:52:31 +03:00 by kerem · 2 comments
Owner

Originally created by @ggtakec on GitHub (Nov 10, 2024).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/2600

Originally assigned to: @ggtakec on GitHub.

Background

Currently, s3fs-fuse has complicated and cumbersome code related to threads and curl handles due to years of modifications and feature additions.
This is not the only cause, but a data race occurs in Curl + OpenSSL(#1997 #2211 #2209 #2220), and in order to fix this, the composition of threads and curl handles needs to be changed.
I believe that we can reduce the number of objects that are deployed in unnecessary dynamic memory, and reduce the complexity of managing them.

Current problems / issues

Thread management is complicated

When processing a request, there are three logic steps(included using and not using threads):

  • Threads are not started, and requests are processed using Curl Handle directly in the thread where the hook is called from FUSE(all early s3fs were like this)
  • Threads are dynamically started and stopped, and requests are processed in the thread (only some processes from when MultipartUpload was supported)
  • Threads are started when s3fs is started, pooled, and requests are processed in the threads in this pool (used in StreamUpload)

These things could be unified.
Currently it's getting more complicated and affecting maintenance.
Also, s3fs options related to this should be unified too.

Curl handles and OpenSSL Data Rase(#1997 #2211 #2209 #2220)

I have identified the cause of this Data Race.
It is a problem with the Curl Share handle settings/using and is related to CURL_LOCK_DATA_SSL_SESSION.
This CURL_LOCK_DATA_SSL_SESSION prohibits the Curl Handle from being used in another thread, as shown below:
(It is not supported to share SSL sessions between multiple concurrent threads)[https://curl.se/libcurl/c/CURLSHOPT_SHARE.html]

s3fs pools Curl handles, but the threads that use these Curl handles are not pinned.
Therefore, the conditions for CURL_LOCK_DATA_SSL_SESSION are not met, so CURL_LOCK_DATA_SSL_SESSION cannot be enabled as it should be.
In the current code, it is enabled, which causes a data race.

Composition modifications to resolve

To solve this problem, we should change all request processing to run in a sub thread.
This will consolidate request processing and improve maintainability.
Also, each request can be processed synchronously or asynchronously.

We should integrate thread management into the ThreadPoolMan class.
This will allow all request processing to be done in parallel using a thread pool.
And it will also integrate the parallel concurrency options of s3fs.

A thread worker should hold one S3fsCurl, and the S3fsCurl should hold one Curl Handle.
This ensures that only one thread uses the Curl Handle.
Because the thread using the Curl Handle is pinned, you can enable a Curl Share Handle and CURL_LOCK_DATA_SSL_SESSION without a Data Race.

About Pull Requesta

To address this issue, I will submit several PRs in succession.
These are essentially one modification, but I'm splitting them up to make the PRs easier to understand.
Note that these PRs will need to be merged consecutively, since it's actually a single modification code.
I will link all PRs to the reply to this issue.

@gaul
Please help me to move forward with the PR.

Originally created by @ggtakec on GitHub (Nov 10, 2024). Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/2600 Originally assigned to: @ggtakec on GitHub. ### Background Currently, s3fs-fuse has complicated and cumbersome code related to threads and curl handles due to years of modifications and feature additions. This is not the only cause, but a data race occurs in `Curl` + `OpenSSL`(#1997 #2211 #2209 #2220), and in order to fix this, the composition of threads and curl handles needs to be changed. I believe that we can reduce the number of objects that are deployed in unnecessary dynamic memory, and reduce the complexity of managing them. ### Current problems / issues #### Thread management is complicated When processing a request, there are three logic steps(included using and not using threads): - Threads are not started, and requests are processed using Curl Handle directly in the thread where the hook is called from FUSE(all early s3fs were like this) - Threads are dynamically started and stopped, and requests are processed in the thread (only some processes from when MultipartUpload was supported) - Threads are started when s3fs is started, pooled, and requests are processed in the threads in this pool (used in StreamUpload) These things could be unified. Currently it's getting more complicated and affecting maintenance. Also, s3fs options related to this should be unified too. #### Curl handles and OpenSSL Data Rase(#1997 #2211 #2209 #2220) I have identified the cause of this Data Race. It is a problem with the Curl Share handle settings/using and is related to `CURL_LOCK_DATA_SSL_SESSION`. This `CURL_LOCK_DATA_SSL_SESSION` prohibits the Curl Handle from being used in another thread, as shown below: (It is not supported to share SSL sessions between multiple concurrent threads)[https://curl.se/libcurl/c/CURLSHOPT_SHARE.html] s3fs pools Curl handles, but the threads that use these Curl handles are not pinned. Therefore, the conditions for `CURL_LOCK_DATA_SSL_SESSION` are not met, so `CURL_LOCK_DATA_SSL_SESSION` cannot be enabled as it should be. In the current code, it is enabled, which causes a data race. ### Composition modifications to resolve To solve this problem, we should change all request processing to run in a sub thread. This will consolidate request processing and improve maintainability. Also, each request can be processed synchronously or asynchronously. We should integrate thread management into the `ThreadPoolMan` class. This will allow all request processing to be done in parallel using a thread pool. And it will also integrate the parallel concurrency options of s3fs. A thread worker should hold one S3fsCurl, and the S3fsCurl should hold one Curl Handle. This ensures that only one thread uses the Curl Handle. Because the thread using the Curl Handle is pinned, you can enable a Curl Share Handle and `CURL_LOCK_DATA_SSL_SESSION` without a Data Race. ### About Pull Requesta To address this issue, I will submit several PRs in succession. These are essentially one modification, but I'm splitting them up to make the PRs easier to understand. Note that these PRs will need to be merged consecutively, since it's actually a single modification code. I will link all PRs to the reply to this issue. @gaul Please help me to move forward with the PR.
kerem 2026-03-04 01:52:31 +03:00
Author
Owner

@ggtakec commented on GitHub (Nov 10, 2024):

@gaul
If you want to check the final code, please check the fix in #2608.
(However, I think it's better to check them in order...)

<!-- gh-comment-id:2466611161 --> @ggtakec commented on GitHub (Nov 10, 2024): @gaul If you want to check the final code, please check the fix in #2608. (However, I think it's better to check them in order...)
Author
Owner

@ggtakec commented on GitHub (Nov 28, 2024):

All PRs have been merged so I will close this issue.
@gaul Thanks.

<!-- gh-comment-id:2506856785 --> @ggtakec commented on GitHub (Nov 28, 2024): All PRs have been merged so I will close this issue. @gaul Thanks.
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
starred/s3fs-fuse#1245
No description provided.