mirror of
https://github.com/fsouza/fake-gcs-server.git
synced 2026-04-25 13:45:52 +03:00
[GH-ISSUE #397] Container won't start with very large file in Docker bind mount (Memory leak?) #77
Originally created by @ChrisWhealy on GitHub (Jan 21, 2021).
Original GitHub issue: https://github.com/fsouza/fake-gcs-server/issues/397
I need to use fake-gcs-server for serving up some CSV files. Most of these files are pretty small (< 2 MB), but one is > 6 GB.
If I start the container with data bind mounted to the directory data/raw_csv containing 4 small CSV files, then everything is fine.
However, if I add the very large CSV to the bind mount directory, then at start up, the container's memory usage just climbs until it hits the allocation limit and crashes.
At first I thought this was due to Docker's experimental feature "Use gRPC FUSE for file sharing" being switched on, but switching this off has no effect in this case.
Update
I’ve chopped the huge CSV file down to the first 58 million rows (from the original 170 million) to create a file that’s 2.06 GB.
With 16 GB allocated, the container now starts up successfully, but uses 12.4 GB of memory. Then when I fire a simple query to request the storage objects in bucket raw_csv, the memory usage climbs to 15.2 GB.
@fsouza commented on GitHub (Jan 21, 2021):
What command are you running to start the container? Are you using the memory backend?
@ChrisWhealy commented on GitHub (Jan 21, 2021):
I'm starting the container with the command
docker run -d --memory=16g --name fake-gcs-server -p 4443:4443 -v ${PWD}/data:/data fsouza/fake-gcs-server:latest
The Docker VM is set to use 20 GB and "Use gRPC FUSE for file sharing" is switched off.
@fsouza commented on GitHub (Oct 7, 2021):
This is probably not a memory leak, the code is just not very smart when loading stuff from the disk. There were some improvements to the list endpoint recently, but there's definitely room to improve other parts of the code.
@warent commented on GitHub (Nov 11, 2021):
I'm mounting an empty directory, so the bucket has no contents.
I had to force quit because my 16gb macbook pro isn't powerful enough for a single go program running in 1 docker container somehow? This is definitely not normal
@warent commented on GitHub (Nov 11, 2021):
I ran a profiler and here's the result.
docker run -v $(pwd)/buckets:/data -p 6060:6060 docker-compose_gcs --port 8080 --public-host gcs:8080 --backend filesystem --filesystem-root /data
@warent commented on GitHub (Nov 11, 2021):
Wait a minute, I logged out what files it was walking, and it only found .gitkeep, as shown in the screenshot above... and it hangs. Then I looked and saw this.
lol apparently it has been writing to .gitkeep this entire time and I have no idea what. If I look at the file it's just this monster

If you use a memory backend, this might propagate as some kind of memory leak? Using a filesystem backend, make sure you don't have any hidden files that are ballooning.
The bug here seems to be that fake-gcs-server is writing to something it shouldn't. Unless this is expected behavior. Maybe introducing some kind of a --ignore param or .gcsignore file would help with this.
Thanks for all your work on this @fsouza, it's an awesome project and has helped me with my work a lot!
@fsouza commented on GitHub (Nov 11, 2021):
Wait, is fake-gcs-server writing that file? That's unexpected 🤔
@warent commented on GitHub (Nov 12, 2021):
So I created an empty folder for the bucket and then I added a .gitkeep file to it so that the bucket structure would exist in github when empty. It was empty when I made it, and somewhere along the way fake-gcs-server populated it with all that stuff.
@gaul commented on GitHub (Feb 7, 2022):
Related to #669.