[GH-ISSUE #50] Issue running local model on a newly installed Windows 11 machine. #25

Open
opened 2026-03-03 13:52:18 +03:00 by kerem · 7 comments

Originally created by @TheGreyRaven on GitHub (Aug 22, 2024).
Original GitHub issue: https://github.com/jehna/humanify/issues/50

Hey!
I have a freshly installed Windows 11 machine with Node 22 installed. When I try to run `humanify local <file>.js`, I get the following crash:

```
ggml_vulkan: Found 1 Vulkan devices:
Vulkan0: NVIDIA RTX A500 Laptop GPU (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32
[node-llama-cpp] Using this model ("C:\Users\Raven\.humanifyjs\models\Phi-3.1-mini-4k-instruct-Q4_K_M.gguf") to tokenize text with special tokens and then detokenize it resulted in a different text. There might be an issue with the model or the tokenizer implementation. Using this model may not work as intended
ggml_vulkan: Device memory allocation of size 314576896 failed.
ggml_vulkan: vk::Device::allocateMemory: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate NVIDIA RTX A500 Laptop GPU buffer of size 314576896
[node-llama-cpp] llama_new_context_with_model: failed to allocate compute buffers
file:///C:/Users/Ravem/AppData/Roaming/npm/node_modules/humanifyjs/node_modules/node-llama-cpp/dist/evaluator/LlamaContext/LlamaContext.js:461
                throw new Error("Failed to create context");
                      ^

Error: Failed to create context
    at LlamaContext._create (file:///C:/Users/Raven/AppData/Roaming/npm/node_modules/humanifyjs/node_modules/node-llama-cpp/dist/evaluator/LlamaContext/LlamaContext.js:461:23)
    at async Object.<anonymous> (file:///C:/Users/Raven/AppData/Roaming/npm/node_modules/humanifyjs/node_modules/node-llama-cpp/dist/evaluator/LlamaModel/LlamaModel.js:274:24)
    at async withLock (file:///C:/Users/Raven/AppData/Roaming/npm/node_modules/humanifyjs/node_modules/lifecycle-utils/dist/withLock.js:36:16)
    at async LlamaModel.createContext (file:///C:/Users/Raven/AppData/Roaming/npm/node_modules/humanifyjs/node_modules/node-llama-cpp/dist/evaluator/LlamaModel/LlamaModel.js:271:16)
    at async llama (file:///C:/Users/Raven/AppData/Roaming/npm/node_modules/humanifyjs/dist/index.mjs:157:19)
    at async Command.<anonymous> (file:///C:/Users/Raven/AppData/Roaming/npm/node_modules/humanifyjs/dist/index.mjs:56702:18)

Node.js v22.6.0
```

I have also downloaded the local model; any ideas what could be wrong?


@jehna commented on GitHub (Aug 22, 2024):

How much GPU memory does your NVIDIA RTX A500 have? Humanify should run pretty well with 7 GB of GPU memory, but there are no guarantees for a system with less memory than that.

You can, however, use the `--disableGpu` flag to run the model on your CPU. This may be slower, though.


@TheGreyRaven commented on GitHub (Aug 22, 2024):

It has 4 GB of video memory, so that could be the issue. However, even if I run `humanify local --disableGpu <file>`, I still get the exact same crash:

```
ggml_vulkan: Found 1 Vulkan devices:
Vulkan0: NVIDIA RTX A500 Laptop GPU (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32
ggml_vulkan: Device memory allocation of size 314576896 failed.
ggml_vulkan: vk::Device::allocateMemory: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate NVIDIA RTX A500 Laptop GPU buffer of size 314576896
[node-llama-cpp] llama_new_context_with_model: failed to allocate compute buffers
.......
.......
.......
```

@jehna commented on GitHub (Aug 24, 2024):

Seems that you've found a bug! I'll create a patch for that. Thank you for sending good debug info.


@0xdevalias commented on GitHub (Aug 26, 2024):

With the bugfix PR merged now, can this issue be closed?

- #55
- #56

@TheGreyRaven commented on GitHub (Aug 26, 2024):

Glad to help out! I did a small workaround for now by running humanify through WSL with the Nvidia drivers installed, and there everything works great!


@TheGreyRaven commented on GitHub (Jan 21, 2025):

Hello again. I have once again tried to run the project on my Windows 11 machine, but it seems like the bug is still present.

```
ggml_vulkan: Compiling shaders...................................................Done!
ggml_vulkan: Device memory allocation of size 1610612736 failed.
ggml_vulkan: vk::Device::allocateMemory: ErrorOutOfDeviceMemory
[node-llama-cpp] llama_kv_cache_init: failed to allocate buffer for kv cache
[node-llama-cpp] llama_new_context_with_model: llama_kv_cache_init() failed for self-attention cache
file:///C:/Users/oscar/AppData/Roaming/npm/node_modules/humanifyjs/node_modules/node-llama-cpp/dist/evaluator/LlamaContext/LlamaContext.js:612
                    throw new Error("Failed to create context");
```

Once again, it runs fine in WSL.
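As a sanity check (my own arithmetic, not from the thread): the failed 1610612736-byte allocation is exactly the size of an fp16 KV cache for a Phi-3-mini-style model at the full 4k context, assuming the commonly published model dimensions (32 layers, 32 KV heads, head dimension 96):

```python
# Estimate the fp16 KV-cache size llama.cpp would try to allocate.
# Model dimensions are the commonly published Phi-3-mini-4k figures
# (assumed here, not read from the GGUF file itself).
n_layers = 32
n_ctx = 4096        # full 4k context window
n_kv_heads = 32
head_dim = 96
bytes_per_elem = 2  # fp16

# 2x for the separate K and V tensors per layer
kv_bytes = 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem
print(kv_bytes)            # 1610612736 — matches the failed allocation above
print(kv_bytes // 2**20)   # 1536 MiB
```

If that reading is right, a smaller context size (or keeping the KV cache in system RAM instead of VRAM) would bring this well under the A500's 4 GB.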


@0xdevalias commented on GitHub (Mar 24, 2025):

> it seems like the bug is still present.

@TheGreyRaven Can you confirm: when you say the bug is still present, what version of `humanify` are you running, and are you passing the `--disableGpu` flag as suggested earlier, etc.?

I notice the last error was:

```
ggml_vulkan: Device memory allocation of size 314576896 failed.
ggml_vulkan: vk::Device::allocateMemory: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate NVIDIA RTX A500 Laptop GPU buffer of size 314576896
[node-llama-cpp] llama_new_context_with_model: failed to allocate compute buffers
file:///C:/Users/Ravem/AppData/Roaming/npm/node_modules/humanifyjs/node_modules/node-llama-cpp/dist/evaluator/LlamaContext/LlamaContext.js:461
                throw new Error("Failed to create context");
```

Whereas this one is:

```
[node-llama-cpp] llama_kv_cache_init: failed to allocate buffer for kv cache
[node-llama-cpp] llama_new_context_with_model: llama_kv_cache_init() failed for self-attention cache
file:///C:/Users/oscar/AppData/Roaming/npm/node_modules/humanifyjs/node_modules/node-llama-cpp/dist/evaluator/LlamaContext/LlamaContext.js:612
                    throw new Error("Failed to create context");
```

Not sure if the nuance between those two makes a difference, but I figured I would note it just in case.
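For scale (again my own arithmetic, not from the logs): both failures are Vulkan device-memory allocations against the A500's 4 GB of VRAM, but the second request is roughly five times larger than the first:

```python
# Compare the two failed Vulkan allocations quoted above.
compute_buffer = 314576896   # 2024 failure: compute buffer size in bytes
kv_cache = 1610612736        # 2025 failure: KV-cache size in bytes

print(round(compute_buffer / 2**20, 2))  # ~300.0 MiB
print(kv_cache / 2**20)                  # 1536.0 MiB
```

So the newer failure may simply be hitting the VRAM ceiling sooner, which would fit the report that the 4 GB card fails while WSL (with different memory handling) succeeds.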
