mirror of
https://github.com/fsouza/fake-gcs-server.git
synced 2026-04-25 21:55:56 +03:00
[PR #556] [MERGED] Improve memory usage and execution time of listing objects with file system backend #712
📋 Pull Request Information
Original PR: https://github.com/fsouza/fake-gcs-server/pull/556
Author: @ironsmile
Created: 8/13/2021
Status: ✅ Merged
Merged: 8/14/2021
Merged by: @fsouza
Base: main ← Head: do-not-store-all-files-in-memory-on-object-list
📝 Commits (2)
- 3eb4422 Do not store all objects in memory on list commands
- f2340d3 List objects: filter files by prefix before reading them
📊 Changes
15 files changed (+519 additions, -304 deletions)
View changed files
📝 fakestorage/bucket_test.go (+14 -14)
📝 fakestorage/example_test.go (+10 -6)
📝 fakestorage/object.go (+98 -57)
📝 fakestorage/object_test.go (+185 -100)
📝 fakestorage/response.go (+8 -8)
📝 fakestorage/server_test.go (+10 -10)
📝 fakestorage/upload.go (+50 -40)
📝 fakestorage/upload_test.go (+7 -5)
📝 internal/backend/backend_test.go (+26 -5)
📝 internal/backend/fs.go (+13 -5)
📝 internal/backend/memory.go (+30 -10)
📝 internal/backend/object.go (+11 -5)
📝 internal/backend/storage.go (+1 -1)
📝 main.go (+8 -6)
📝 main_test.go (+48 -32)
📄 Description
I am using the GCS Fake Server to develop locally and it is mostly great, but I've noticed it is completely unable to list the objects in my file system bucket, even when I give it a prefix that ensures only one object will match. It consumes all of the machine's memory and never finishes, presumably because it spends all of its time swapping. Information about my bucket: 53017 files with an overall size of 20.3G. Sadly, the nature of my work is such that this is a relatively small data set.
So I went in and started poking around the code. It quickly became evident that two things were happening: the list command was keeping every object in the bucket in memory, and files were being read in full before the prefix filter was applied.
This PR fixes those two issues in its two commits. Previously a list-objects command was consuming all of my machine's 32GB of RAM and had not finished even after I had waited on it for half an hour. Now such list commands take almost no memory (in the range of a few KB) and finish almost instantly.
While the above is great, I suspect there are many more places where the emulator could be made significantly faster; I just haven't profiled them. Off the top of my head, deleting a bucket should now require almost no memory, where before it had the same problem as listing objects.
Further Improvements
It would be great if it were possible to read only the metadata for blobs stored on the file system. Unfortunately, with the JSON encoding I don't see how that would be possible: as it stands, one has to load the entire file contents in order for the JSON parser to do its thing. This is unfortunate, considering that in many situations we only want the blob metadata.
I think the only way to achieve this cleanly would be to drop the JSON altogether and find another way of storing the metadata. Possible approaches are file headers, similar to the nginx file cache, or separate ".attrs" files like what gocloud.dev/blob does.
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.