[GH-ISSUE #16] High memory usage when using local inference with llama2-7b model #15

Closed
opened 2026-03-03 13:52:12 +03:00 by kerem · 1 comment

Originally created by @fear-rush on GitHub (Jun 20, 2024).
Original GitHub issue: https://github.com/jehna/humanify/issues/16

Are there any workarounds I can use to reduce the memory usage? For context, I'm using a MacBook M1 Pro with 16 GB of memory, and the memory usage when running local inference is too high: it uses around 34 GB. I tried converting the model parameters to half-precision (float16) to reduce memory usage, but it still doesn't work.
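
Rough arithmetic explains the numbers: Llama 2 7B has about 7 billion parameters, so the weights alone take roughly 28 GB at float32 (4 bytes/param) and 14 GB at float16 (2 bytes/param), before counting the KV cache and activations. The observed 34 GB is consistent with float32 weights, and even a successful float16 conversion would still overflow 16 GB of unified memory. A 4-bit quantized GGUF build is around 4 GB and is the usual way to fit a 7B model on this hardware. Below is a minimal sketch of loading such a model with node-llama-cpp; it assumes the 2.x API, and the model path, context size, and prompt are illustrative, not taken from humanify itself:

```typescript
import { LlamaModel, LlamaContext, LlamaChatSession } from "node-llama-cpp";

// Hypothetical path to a 4-bit quantized Llama 2 7B in GGUF format (~4 GB of weights).
const modelPath = "./models/llama-2-7b.Q4_K_M.gguf";

const model = new LlamaModel({
  modelPath,
  gpuLayers: 0, // keep all layers in system RAM; raise this to offload some to Metal
});

// A smaller context window shrinks the KV cache, which also counts against memory.
const context = new LlamaContext({ model, contextSize: 2048 });
const session = new LlamaChatSession({ context });

const answer = await session.prompt("Suggest a descriptive name for the variable `a`.");
console.log(answer);
```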

kerem closed this issue 2026-03-03 13:52:12 +03:00

@jehna commented on GitHub (Aug 12, 2024):

@fear-rush The new v2 has a much smaller default model that should run in 8 GB of RAM.
