[GH-ISSUE #1416] S3 failed to read csv using s3fs #748

Closed
opened 2026-03-04 01:48:27 +03:00 by kerem · 7 comments
Owner

Originally created by @mrsiano on GitHub (Sep 21, 2020).
Original GitHub issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/1416

Additional Information

Failed to read a CSV file from an AWS S3 bucket mounted via s3fs.

Version of s3fs being used (s3fs --version)

V1.87

Version of fuse being used (pkg-config --modversion fuse, rpm -qi fuse, dpkg -s fuse)

2.9.4

Kernel information (uname -r)

4.14.72-68.55.amzn1.x86_64

GNU/Linux Distribution, if applicable (cat /etc/os-release)

NAME="Amazon Linux AMI"
VERSION="2018.03"
ID="amzn"
ID_LIKE="rhel fedora"
VERSION_ID="2018.03"
PRETTY_NAME="Amazon Linux AMI 2018.03"
ANSI_COLOR="0;33"
CPE_NAME="cpe:/o:amazon:linux:2018.03:ga"
HOME_URL="http://aws.amazon.com/amazon-linux-ami/"

Details about issue

An EC2 host runs a Docker app that mounts an S3 bucket via s3fs. Reading a 109 KB CSV from the mount via a Python app using pandas fails with:

pandas.errors.ParserError: Error tokenizing data. C error: Calling read(nbytes) on source failed. Try engine='python'.

A plain read via the Linux `cat` command raised an Input/output error:

sudo cat /<somepath>/data.csv
cat: /<somepath>/data.csv: Input/output error
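For context, "Input/output error" is the message for errno 5 (EIO), which is what the kernel returns to a process when a FUSE filesystem's read handler fails; the s3fs trace later in this thread logs the same value as result=-5. A minimal stdlib illustration:

```python
import errno
import os

# "Input/output error" is the strerror text for errno 5 (EIO).
# When a FUSE filesystem's read callback fails, the kernel
# surfaces EIO to the caller (s3fs logs this as result=-5).
print(errno.EIO)               # 5 on Linux
print(os.strerror(errno.EIO))  # the message cat prints
```

So the pandas ParserError and the failing `cat` are the same underlying symptom: reads on the mount return EIO.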
kerem 2026-03-04 01:48:27 +03:00
  • closed this issue
  • added the
    need info
    label
Author
Owner

@mrsiano commented on GitHub (Sep 21, 2020):

@gaul @ggtakec ptal


@gaul commented on GitHub (Sep 21, 2020):

Sorry, there is not much to go on here. Can you share the debug logs via `s3fs -f -d`? Providing a complete test case will also help.

Note that if you are using pandas in Python, you might want to look at boto3 or the s3fs Python library instead. A Python-only API avoids some of the tradeoffs that FUSE introduces.


@mrsiano commented on GitHub (Sep 21, 2020):

[ec2-user]$ sudo s3fs -f -d -o allow_other $S3_BUCKET /$S3_BUCKET
[CRT] sighandlers.cpp:SetLogLevel(168): change debug level from [CRT] to [INF]
[INF]     s3fs.cpp:set_mountpoint_attribute(4373): PROC(uid=0, gid=0) - MountPoint(uid=0, gid=0, mode=40755)
[INF] curl.cpp:InitMimeType(723): Loaded mime information from /etc/mime.types
[INF] fdcache.cpp:CheckCacheFileStatTopDir(134): The path to cache top dir is empty, thus not need to check permission.
[INF] s3fs.cpp:s3fs_init(3455): init v1.87(commit:a23d029) with OpenSSL
[INF] s3fs.cpp:s3fs_check_service(3800): check services.
[INF]       curl.cpp:CheckBucket(3527): check a bucket.
[INF]       curl.cpp:prepare_url(4831): URL is https://s3.amazonaws.com/<bucket>/
[INF]       curl.cpp:prepare_url(4864): URL changed is https://<bucket>/
[INF]       curl.cpp:insertV4Headers(2863): computing signature [GET] [/] [] []
[INF]       curl.cpp:url_to_host(99): url is https://s3.amazonaws.com
[INF]       curl.cpp:RequestPerform(2520): HTTP response code 200
[INF] curl.cpp:ReturnHandler(341): Pool full: destroy the oldest handler
[ERR] s3fs.cpp:s3fs_init(3505): Failed to initialize signal object, but continue...

@mrsiano commented on GitHub (Sep 21, 2020):

@gaul here is a full trace when trying to cat a file via the CLI with the `-f -d` flags:

[ec2-user]$ sudo cat /test/test/test_path/ssid=205/data.csv
[INF] s3fs.cpp:s3fs_getattr(765): [path=/test]
[INF] s3fs.cpp:s3fs_getattr(765): [path=/test/test_path]
[INF] s3fs.cpp:s3fs_getattr(765): [path=/test/test_path/ssid=205]
[INF] s3fs.cpp:s3fs_getattr(765): [path=/test/test_path/ssid=205/data.csv]
[INF] s3fs.cpp:s3fs_open(2132): [path=/test/test_path/ssid=205/data.csv][flags=0x8000]
[INF] s3fs.cpp:s3fs_flush(2264): [path=/test/test_path/ssid=205/data.csv][fd=0]
[INF]       cache.cpp:DelStat(578): delete stat cache entry[path=/test/test_path/ssid=205/data.csv]
[INF]       curl.cpp:HeadRequest(2864): [tpath=/test/test_path/ssid=205/data.csv]
[INF]       curl.cpp:PreHeadRequest(2824): [tpath=/test/test_path/ssid=205/data.csv][bpath=][save=][sseckeypos=-1]
[INF]       curl_util.cpp:prepare_url(243): URL is https://s3.amazonaws.com/bucket/test/test_path/ssid%3D205/data.csv
[INF]       curl_util.cpp:prepare_url(276): URL changed is https://bucket.s3.amazonaws.com/test/test_path/ssid%3D205/data.csv
[INF]       curl.cpp:insertV4Headers(2556): computing signature [HEAD] [/test/test_path/ssid=205/data.csv] [] []
[INF]       curl_util.cpp:url_to_host(320): url is https://s3.amazonaws.com
[INF]       curl.cpp:RequestPerform(2225): HTTP response code 200
[INF]       cache.cpp:AddStat(369): add stat cache entry[path=/test/test_path/ssid=205/data.csv]
[INF]       fdcache_entity.cpp:SetMtime(663): [path=/test/test_path/ssid=205/data.csv][fd=5][time=1600413483]
[INF]       curl.cpp:GetObjectRequest(3186): [tpath=/test/test_path/ssid=205/data.csv][start=0][size=6201412]
[INF]       curl.cpp:PreGetObjectRequest(3134): [tpath=/test/test_path/ssid=205/data.csv][start=0][size=6201412]
[INF]       curl_util.cpp:prepare_url(243): URL is https://s3.amazonaws.com/bucket/test/test_path/ssid%3D205/data.csv
[INF]       curl_util.cpp:prepare_url(276): URL changed is https://bucket.s3.amazonaws.com/test/test_path/ssid%3D205/data.csv
[INF]       curl.cpp:GetObjectRequest(3205): downloading... [path=/test/test_path/ssid=205/data.csv][fd=5]
[INF]       curl.cpp:insertV4Headers(2556): computing signature [GET] [/test/test_path/ssid=205/data.csv] [] []
[INF]       curl_util.cpp:url_to_host(320): url is https://s3.amazonaws.com
[ERR] curl.cpp:RequestPerform(2245): HTTP response code 403, returning EPERM. Body Text:
[ERR] fdcache_entity.cpp:Read(1328): could not download. start(0), size(131072), errno(-1)
[WAN] s3fs.cpp:s3fs_read(2218): failed to read file(/test/test_path/ssid=205/data.csv). result=-5
[INF]       curl.cpp:GetObjectRequest(3186): [tpath=/test/test_path/ssid=205/data.csv][start=0][size=6201412]
[INF]       curl.cpp:PreGetObjectRequest(3134): [tpath=/test/test_path/ssid=205/data.csv][start=0][size=6201412]
[INF]       curl_util.cpp:prepare_url(243): URL is https://s3.amazonaws.com/bucket/test/test_path/ssid%3D205/data.csv
[INF]       curl_util.cpp:prepare_url(276): URL changed is https://bucket.s3.amazonaws.com/test/test_path/ssid%3D205/data.csv
[INF]       curl.cpp:GetObjectRequest(3205): downloading... [path=/test/test_path/ssid=205/data.csv][fd=5]
[INF]       curl.cpp:insertV4Headers(2556): computing signature [GET] [/test/test_path/ssid=205/data.csv] [] []
[INF]       curl_util.cpp:url_to_host(320): url is https://s3.amazonaws.com
[ERR] curl.cpp:RequestPerform(2245): HTTP response code 403, returning EPERM. Body Text:
[ERR] fdcache_entity.cpp:Read(1328): could not download. start(0), size(4096), errno(-1)
[WAN] s3fs.cpp:s3fs_read(2218): failed to read file(/test/test_path/ssid=205/data.csv). result=-5
cat: /test/test/test_path/ssid=205/data.csv: Input/output error
[INF] s3fs.cpp:s3fs_flush(2264): [path=/test/test_path/ssid=205/data.csv][fd=5]
[INF]       fdcache_entity.cpp:RowFlush(1098): [tpath=][path=/test/test_path/ssid=205/data.csv][fd=5]
[INF] s3fs.cpp:s3fs_release(2319): [path=/test/test_path/ssid=205/data.csv][fd=5]
[INF]       fdcache.cpp:GetFdEntity(409): [path=/test/test_path/ssid=205/data.csv][fd=5]
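One detail worth noting in the trace above: the HEAD request for the key succeeds (HTTP 200), but both GET attempts fail with HTTP 403, and the key contains an `=` character (`ssid=205`) that appears percent-encoded (`%3D`) in the request URL while the logged "computing signature" path shows it unencoded. One hypothesis consistent with that (not confirmed in this thread) is a SigV4 signing mismatch: if the path that is signed and the path actually sent disagree on the encoding of a reserved character, S3 rejects the request with 403. A minimal sketch of the expected canonical-URI encoding, using a hypothetical helper rather than s3fs's actual code:

```python
from urllib.parse import quote

def sigv4_canonical_uri(path: str) -> str:
    # Percent-encode every byte except RFC 3986 unreserved
    # characters (A-Z a-z 0-9 - . _ ~) and the '/' separator,
    # so the signed path matches the path sent on the wire.
    return quote(path, safe="/")

# '=' is a reserved character, so it must appear as %3D in both
# the request URL and the string that gets signed.
print(sigv4_canonical_uri("/test/test_path/ssid=205/data.csv"))
# /test/test_path/ssid%3D205/data.csv
```

This would also fit the later report that downgrading to v1.86 avoids the problem, i.e. a v1.87 regression around keys containing special characters — but that remains a guess from the log alone.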

@qeliran commented on GitHub (Sep 22, 2020):

@mrsiano I can confirm the issue goes away after downgrading to v1.86.
Not much, but at least it's a workaround for production environments.


@mrsiano commented on GitHub (Sep 24, 2020):

@qeliran Thanks, works like magic. Has the bug been fixed already?


@gaul commented on GitHub (Feb 8, 2021):

Please test with the latest version (1.88) and reopen if the symptoms persist.
