[GH-ISSUE #1882] Writing to bucket with spark #224

Open
opened 2026-03-03 12:09:17 +03:00 by kerem · 2 comments

Originally created by @vitstransky on GitHub (Jan 17, 2025).
Original GitHub issue: https://github.com/fsouza/fake-gcs-server/issues/1882

Hello,

I am trying to run Spark against the fake-gcs-server Docker container. My Spark config looks like this:

```
.config("spark.hadoop.fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
.config("spark.hadoop.fs.gs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
.config("spark.hadoop.fs.gs.auth.service.account.enable", "false")
.config("spark.hadoop.fs.gs.auth.null.enable", "true")
.config("fs.gs.storage.root.url", "http://localhost:4443/")
```
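[Editor's note, not part of the original report: two things commonly bite in this setup. First, Spark only copies settings prefixed with `spark.hadoop.` into the Hadoop configuration that the GCS connector reads, so `fs.gs.storage.root.url` may need that prefix too. Second, fake-gcs-server defaults to HTTPS with a self-signed certificate and advertises `storage.googleapis.com` in the URLs it returns, so the emulator usually has to be started with flags matching the client-side root URL. A hedged sketch, with flag names taken from fake-gcs-server's documentation; host and port are assumptions for a local setup:]

```shell
# Sketch only: start the emulator over plain HTTP so it matches the
# connector's http://localhost:4443/ root URL.
# -public-host controls the hostname the server embeds in returned URLs
# (e.g. resumable-upload URLs); you may need to include the port as well,
# depending on your version.
docker run -d -p 4443:4443 fsouza/fake-gcs-server \
  -scheme http \
  -port 4443 \
  -public-host localhost
```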

I am able to read a file that is already in the bucket with:

```
spark.read
      .option("header", "true")
      .csv("gs://sample-bucket/some_file.csv")
```

But when I try to write with `data.write.save("gs://sample-bucket/test")`, an empty folder is created in the bucket and the job fails with the attached error. Could somebody help me solve it? Alternatively, is it possible to write to a local data folder and have it loaded into the bucket automatically? Right now I have to restart the Docker container to see the changes.
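[Editor's note, not part of the original report: on the "local data folder" question, fake-gcs-server can seed buckets from a host directory at startup (`-data`) and can persist API writes across restarts with its filesystem backend, which may remove the need to restart the container at all. A hedged sketch with flag names from fake-gcs-server's documentation; the mounted paths are assumptions:]

```shell
# Sketch only: preload objects from ./data into the emulator at startup,
# and persist objects written through the API under ./storage so they
# survive container restarts.
docker run -d -p 4443:4443 \
  -v "$(pwd)/data:/data" \
  -v "$(pwd)/storage:/storage" \
  fsouza/fake-gcs-server \
  -scheme http \
  -data /data \
  -backend filesystem \
  -filesystem-root /storage
```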

Thank you in advance.

[Error.txt](https://github.com/user-attachments/files/18453937/Error.txt)


@fsouza commented on GitHub (Jan 17, 2025):

Can you share logs from the server too?


@vitstransky commented on GitHub (Jan 17, 2025):

[Logs.txt](https://github.com/user-attachments/files/18454202/Logs.txt)
