[GH-ISSUE #928] The upload element blocking subsequent transfers #528

Closed
opened 2026-03-04 01:46:23 +03:00 by kerem · 5 comments
Owner

Originally created by @MLC-Mat on GitHub (Jan 24, 2019).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/928

Additional Information

The following information is very important in order to help us help you. Omitting these details may delay your support request or cause it to receive no attention at all.
Keep in mind that the commands we provide to retrieve information are oriented toward GNU/Linux distributions, so you may need to use other commands if you run s3fs on macOS or BSD.

Version of s3fs being used (s3fs --version)

Amazon Simple Storage Service File System V1.82(commit:unknown) with GnuTLS(gcrypt)

Version of fuse being used (pkg-config --modversion fuse, rpm -qi fuse, dpkg -s fuse)

example: 2.9.4

fuse/bionic,now 2.9.7-1ubuntu1 amd64 [installed]
libfuse2/bionic,now 2.9.7-1ubuntu1 amd64 [installed]

Kernel information (uname -r)

4.15.0-1021-aws

GNU/Linux Distribution, if applicable (cat /etc/os-release)

~$ cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.1 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.1 LTS"
VERSION_ID="18.04"

s3fs command line used, if applicable

na

/etc/fstab entry, if applicable

s3fs#mybucket/sterling/mybucketfuse _netdev,allow_other,iam_role=myawsrole,parallel_count=50,uid=1001,umask=0077,url=https://s3-eu-west-1.amazonaws.com 0 0

s3fs syslog messages (grep s3fs /var/log/syslog, journalctl | grep s3fs, or s3fs outputs)

If you execute s3fs with the dbglevel and curldbg options, you can get detailed debug messages.
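For reference, a foreground debug run might look like the following (bucket name and mountpoint are placeholders, not taken from this report):

```shell
# Run s3fs in the foreground with verbose s3fs and libcurl logging.
# "mybucket" and "/mnt/mybucket" are placeholders.
s3fs mybucket /mnt/mybucket -f -o dbglevel=info -o curldbg
```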

Details about issue

We have s3fs hung off a couple of HA'd SFTP servers. The SFTP servers are used almost exclusively for uploads. Uploads typically go to different locations, thus no real issues with synchronization etc. SFTP servers do of course like to do an ls at the end of transfers etc., but there is very little actual fetching of files.

A number of file transfers will come in at the same time. The file sizes range from 1-40GB (but transfers are over a performant Direct Connect connection to AWS). When the transfers all kick off at the same time they transfer up to the SFTP server just fine; however, when the initial transfer completes and the upload to S3 begins, the other transfers block and wait. Most SFTP clients handle this fine, but we have one IBM product that is a fussy customer that doesn't. Is there any way to stop the other uploads from blocking? Is there anything I can do with the parameters to further tune things? The EC2 instances are decent-spec network-focussed instances (upload to S3 is impressively fast).

Many thanks

Mat
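(For context on the tuning question: the fstab entry above already sets parallel_count=50; the other mount option that shapes multipart uploads is multipart_size, the per-part size in MB. A command-line equivalent of that mount with multipart_size added might look like this — the value is illustrative, and this tunes upload throughput rather than fixing the blocking itself:)

```shell
# Illustrative only: larger parts mean fewer requests per multipart upload.
# multipart_size is in MB; 64 here is an example value, not a recommendation.
s3fs mybucket:/sterling /mybucketfuse \
    -o iam_role=myawsrole \
    -o parallel_count=50 \
    -o multipart_size=64 \
    -o url=https://s3-eu-west-1.amazonaws.com
```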

kerem closed this issue 2026-03-04 01:46:23 +03:00

@MLC-Mat commented on GitHub (Jan 24, 2019):

I should specifically note: the uploads that are blocked are those still moving from client to SFTP server. The transfers just halt, and then all start moving once the S3 upload of the job at the front completes.


@gaul commented on GitHub (Feb 2, 2019):

I reproduced these symptoms by writing a large file into the root directory, then running ls on the same directory. I found that ls hangs getting a lock on the in-progress write:

(gdb) thread 5
[Switching to thread 5 (Thread 0x7fa80ffff700 (LWP 32087))]
#0  0x00007fa81e319b4d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007fa81e312ed4 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x000000000045bccc in AutoLock::AutoLock (this=0x7fa80fffe740, pmutex=0x7fa81002e9f0, no_wait=false)
    at s3fs_util.cpp:433
#3  0x0000000000466667 in FdEntity::Open (this=0x7fa81002e9f0, pmeta=0x0, size=-1, time=-1, no_fd_lock_wait=false)
    at fdcache.cpp:799
#4  0x000000000046e120 in FdManager::Open (this=0x4a6280 <FdManager::singleton>, path=0x7fa810028880 "/2gb", pmeta=0x0,
    size=-1, time=-1, force_tmpfile=false, is_create=false, no_fd_lock_wait=false) at fdcache.cpp:2141
#5  0x000000000046e31d in FdManager::ExistOpen (this=0x4a6280 <FdManager::singleton>, path=0x7fa810028880 "/2gb",
    existfd=-1, ignore_existfd=false) at fdcache.cpp:2152
#6  0x0000000000409f39 in s3fs_getattr (path=0x7fa810028880 "/2gb", stbuf=0x7fa80fffeb00) at s3fs.cpp:859
#7  0x00007fa81eb43290 in lookup_path () from /lib64/libfuse.so.2
#8  0x00007fa81eb43402 in fuse_lib_lookup () from /lib64/libfuse.so.2
#9  0x00007fa81eb4e1e8 in fuse_ll_process_buf () from /lib64/libfuse.so.2
#10 0x00007fa81eb4add0 in fuse_do_work () from /lib64/libfuse.so.2
#11 0x00007fa81e31058e in start_thread () from /lib64/libpthread.so.0
#12 0x00007fa81e23f6a3 in clone () from /lib64/libc.so.6
(gdb) print &fdent_lock 
$3 = (pthread_mutex_t *) 0x7fa81002e9f0
(gdb) thread 2
[Switching to thread 2 (Thread 0x7fa81da8a700 (LWP 32079))]
#0  0x00007fa81e3190c6 in do_futex_wait.constprop () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007fa81e3190c6 in do_futex_wait.constprop () from /lib64/libpthread.so.0
#1  0x00007fa81e3191b8 in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2  0x000000000044976c in Semaphore::wait (this=0x7fa81da89240) at psemaphore.h:63
#3  0x00000000004463db in S3fsMultiCurl::MultiPerform (this=0x7fa81da893a0) at curl.cpp:3931
#4  0x0000000000447386 in S3fsMultiCurl::Request (this=0x7fa81da893a0) at curl.cpp:4077
#5  0x000000000043449e in S3fsCurl::ParallelMultipartUploadRequest (tpath=0x7fa81002ea50 "/2gb",
    meta=std::map with 11 elements = {...}, fd=10) at curl.cpp:1391
#6  0x000000000046ad4b in FdEntity::RowFlush (this=0x7fa81002e9f0, tpath=0x0, force_sync=false) at fdcache.cpp:1532
#7  0x0000000000425412 in FdEntity::Flush (this=0x7fa81002e9f0, force_sync=false) at fdcache.h:173
#8  0x0000000000413a8f in s3fs_flush (path=0x7fa814029c10 "/2gb", fi=0x7fa81da89bb0) at s3fs.cpp:2224
#9  0x00007fa81eb46e36 in fuse_flush_common () from /lib64/libfuse.so.2
#10 0x00007fa81eb470c4 in fuse_lib_flush () from /lib64/libfuse.so.2
#11 0x00007fa81eb4d6f6 in do_flush () from /lib64/libfuse.so.2
#12 0x00007fa81eb4e1e8 in fuse_ll_process_buf () from /lib64/libfuse.so.2
#13 0x00007fa81eb4add0 in fuse_do_work () from /lib64/libfuse.so.2
#14 0x00007fa81e31058e in start_thread () from /lib64/libpthread.so.0
#15 0x00007fa81e23f6a3 in clone () from /lib64/libc.so.6
(gdb) print &fdent_lock 
$2 = (pthread_mutex_t *) 0x7fa81002e9f0

@MLC-Mat commented on GitHub (Feb 2, 2019):

The above ties in exactly with what we are seeing. Thanks for the diagnosis effort so far


@gaul commented on GitHub (Jul 10, 2019):

@MLC-Mat Could you test with the latest master and report back?


@MLC-Mat commented on GitHub (Jul 10, 2019):

Sure, I will work with the client to retest... but I won't be with that client until Monday. Thanks a lot for your effort here, it is very much appreciated.
