[GH-ISSUE #16] High memory usage when using local inference with llama2-7b model #15

Closed
opened 2026-03-03 13:52:12 +03:00 by kerem · 1 comment

Originally created by @fear-rush on GitHub (Jun 20, 2024).
Original GitHub issue: https://github.com/jehna/humanify/issues/16

Are there any workarounds I can use to reduce the memory usage? For context, I'm using a MacBook M1 Pro with 16 GB of memory, and the memory usage when running local inference is too high: it uses around 34 GB. I tried converting the model parameters to half-precision (float16) to reduce memory usage, but it still doesn't work.
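
Rough arithmetic explains the numbers: Llama 2 7B has about 7 billion parameters, so the weights alone take roughly 28 GB at float32 (4 bytes/param) and 14 GB at float16 (2 bytes/param), before counting the KV cache and activations. The observed 34 GB is consistent with float32 weights, and even a successful float16 conversion would still overflow 16 GB of unified memory. A 4-bit quantized GGUF build is around 4 GB and is the usual way to fit a 7B model on this hardware. Below is a minimal sketch of loading such a model with node-llama-cpp; it assumes the 2.x API, and the model path, context size, and prompt are illustrative, not taken from humanify itself:

```typescript
import { LlamaModel, LlamaContext, LlamaChatSession } from "node-llama-cpp";

// Hypothetical path to a 4-bit quantized Llama 2 7B in GGUF format (~4 GB of weights).
const modelPath = "./models/llama-2-7b.Q4_K_M.gguf";

const model = new LlamaModel({
  modelPath,
  gpuLayers: 0, // keep all layers in system RAM; raise this to offload some to Metal
});

// A smaller context window shrinks the KV cache, which also counts against memory.
const context = new LlamaContext({ model, contextSize: 2048 });
const session = new LlamaChatSession({ context });

const answer = await session.prompt("Suggest a descriptive name for the variable `a`.");
console.log(answer);
```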

kerem closed this issue 2026-03-03 13:52:12 +03:00

@jehna commented on GitHub (Aug 12, 2024):

@fear-rush The new v2 has a much smaller default model that should run in 8 GB of RAM.
