mirror of
https://github.com/s3fs-fuse/s3fs-fuse.git
synced 2026-04-25 05:16:00 +03:00
[GH-ISSUE #1772] s3fs aborts in GetXmlNsUrl during parallel execution #915
Originally created by @CarstenGrohmann on GitHub (Oct 7, 2021).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/1772
s3fs has terminated itself twice without further notice on my system. After enabling core dumps I got one, but unfortunately I had deleted the binary, so I recompiled it. The analysis with gdb doesn't show any junk, so I assume the core dump and the binary match.
Core dump analysis
The dump shows an abort in GetXmlNsUrl during a copy assignment of strNs (type std::string).
Threads 18-14, 10-9, 7, 5 and 2-1 execute the same code at the same time.
It looks like the parallel execution of GetXmlNsUrl triggers a C++ exception in line 62: github.com/s3fs-fuse/s3fs-fuse@b4edad86d6/src/s3fs_xml.cpp (L37-L66)
Reproducibility
The issue occurs randomly after a runtime of several days. I can't reproduce this issue manually.
Additional observation
The content of __str in std::string::assign is unexpectedly long. I would expect a shorter string terminating after the first null byte: __str="http://s3.amazonaws.com/doc/2006-03-01/\000.
@CarstenGrohmann commented on GitHub (Oct 26, 2021):
Adding an AutoLock would be a simple workaround, but it prevents parallel execution of this function.
I've tested the patch functionally, but the long-running test isn't finished yet. I'll share the results once s3fs has run for more than two weeks without an abort.
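The locking idea can be sketched roughly as follows. This is not the patch itself and not the s3fs sources: it assumes the function hands out a cached string shared by all threads, and it uses std::mutex and std::lock_guard in place of s3fs's AutoLock. The names ns_cache_mutex, cached_ns_url, LoadNsUrl and GetNsUrlLocked are hypothetical.

```cpp
#include <mutex>
#include <string>

// Hypothetical stand-ins for illustration only (not the s3fs sources):
// a cached namespace URL that every thread shares.
static std::mutex  ns_cache_mutex;
static std::string cached_ns_url;
static bool        cached_ns_valid = false;

// Hypothetical loader standing in for the real XML namespace lookup.
static std::string LoadNsUrl()
{
    return "http://s3.amazonaws.com/doc/2006-03-01/";
}

// Fills the cache on first use and copies it into 'out'.
// The lock is held for the whole body, so no thread can read the shared
// string while another thread is assigning to it.
bool GetNsUrlLocked(std::string& out)
{
    std::lock_guard<std::mutex> lock(ns_cache_mutex);
    if(!cached_ns_valid){
        cached_ns_url   = LoadNsUrl();
        cached_ns_valid = true;
    }
    out = cached_ns_url;   // copy made under the lock
    return true;
}
```

Every caller serializes on the one mutex, which is the cost mentioned above: the function is safe to call from many threads, but only one of them runs it at a time.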
@ggtakec commented on GitHub (Oct 26, 2021):
@CarstenGrohmann Thank you for reporting the problem, and sorry for the late reply.
As you pointed out, the size part of the backtrace is abnormal.
I created PR #1789 to check the length (xmlStrlen) of the xmlChar pointer.
If possible, it would be helpful if you could test it.
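As a rough illustration of that kind of length check (a sketch using only the C++ standard library rather than libxml2's xmlStrlen; BoundedCopy is a hypothetical name and this is not the code in the PR), a copy can be cut at the first null byte even when the reported length is larger:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper for illustration: builds a std::string from a buffer
// but never copies past the first null byte, even if reported_len is larger.
std::string BoundedCopy(const char* buf, std::size_t reported_len)
{
    if(!buf){
        return std::string();
    }
    // memchr inspects at most reported_len bytes, so it stays inside the buffer.
    const void* nul = std::memchr(buf, '\0', reported_len);
    std::size_t len = reported_len;
    if(nul){
        len = static_cast<std::size_t>(static_cast<const char*>(nul) - buf);
    }
    return std::string(buf, len);
}
```

Such a check only limits how much is copied. It does not protect against another thread modifying the source or the destination string during the copy.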
@ggtakec commented on GitHub (Oct 26, 2021):
If this bug occurs because multiple threads are writing to the same memory at the same time, the fix in the above PR may not help.
(I will check it just in case.)
@CarstenGrohmann commented on GitHub (Oct 26, 2021):
@ggtakec I'll test the PR. Since I can't reproduce the issue, it may take two weeks or so to check whether s3fs aborts abnormally or runs fine.
@ggtakec commented on GitHub (Oct 26, 2021):
Thanks for your kindness.
I think the PR has since been changed to use AutoLock, so you probably won't see the same problem.
We may merge this PR without waiting for the reproduction test.
If you have time, please try it.
@CarstenGrohmann commented on GitHub (Nov 16, 2021):
I ran the current development version of s3fs for several days and transferred more than 2M files, and the error didn't occur.
#1789 solves this issue.
Thank you!