[GH-ISSUE #502] Cerebras inference support #82

Open
opened 2026-03-03 13:52:51 +03:00 by kerem · 1 comment

Originally created by @neoOpus on GitHub (Jun 29, 2025).
Original GitHub issue: https://github.com/jehna/humanify/issues/502

Hi,

I would like to know if anyone has worked on making humanify support Cerebras inference, as it is OpenAI-compatible and can be a better alternative in terms of speed and cost.

https://inference-docs.cerebras.ai/resources/openai


@0xdevalias commented on GitHub (Jun 30, 2025):

> as it is compatible with OpenAI
@neoOpus Have you tried using the `humanify openai --baseURL` param in the way they suggest?

- https://inference-docs.cerebras.ai/resources/openai#configuring-openai-to-use-cerebras-api
  - > Configuring OpenAI to Use Cerebras API

https://github.com/jehna/humanify/blob/7beba2d32433e58bb77d0e1b0eda01c470fec3e2/src/commands/openai.ts#L20-L24

I'd be interested to hear if you manage to get it to work, and also your feedback on the speed differences, how effective the different models are when used with humanify, etc.
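
For reference, a rough sketch of what that invocation might look like (untested; the base URL is taken from Cerebras' OpenAI-compatibility docs, and the flag names other than `--baseURL` are assumptions — check `humanify openai --help`):

```shell
# Untested sketch: point humanify's OpenAI provider at Cerebras'
# OpenAI-compatible endpoint (per https://inference-docs.cerebras.ai/resources/openai).
# The --apiKey / --model flag names are assumptions; verify against the CLI help.
humanify openai \
  --baseURL "https://api.cerebras.ai/v1" \
  --apiKey "$CEREBRAS_API_KEY" \
  --model "llama-3.3-70b" \
  minified.js
```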


It seems it's also usable via OpenRouter:

- https://github.com/jehna/humanify/issues/416
- https://inference-docs.cerebras.ai/resources/openrouter-cerebras
- https://openrouter.ai/provider/cerebras

These seem to be the models currently available:

- https://inference-docs.cerebras.ai/introduction
  - > The Cerebras Inference API currently provides access to the following models:
    >
    > | Model Name | Model ID | Parameters | Speed (tokens/s) |
    > |:---|:---|:---|:---|
    > | Llama 4 Scout | `llama-4-scout-17b-16e-instruct` | 109 billion | ~2600 tokens/s |
    > | Llama 3.1 8B | `llama3.1-8b` | 8 billion | ~2200 tokens/s |
    > | Llama 3.3 70B | `llama-3.3-70b` | 70 billion | ~2100 tokens/s |
    > | Qwen 3 32B\* | `qwen-3-32b` | 32 billion | ~2100 tokens/s |
    > | DeepSeek R1 Distill Llama 70B\* | `deepseek-r1-distill-llama-70b` | 70 billion | ~1700 tokens/s |
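
As a quick smoke test with one of the model IDs above, a chat completion against the OpenAI-compatible endpoint might look like this (a sketch based on Cerebras' docs; the endpoint path and payload follow the standard OpenAI chat-completions shape, not anything humanify-specific):

```shell
# Untested sketch: OpenAI-style chat completion against Cerebras' endpoint.
# Requires a CEREBRAS_API_KEY; model IDs are from the table above.
curl https://api.cerebras.ai/v1/chat/completions \
  -H "Authorization: Bearer $CEREBRAS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1-8b",
    "messages": [{"role": "user", "content": "Suggest a readable name for a variable currently called `a`"}]
  }'
```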

The pricing:

- https://inference-docs.cerebras.ai/support/pricing
  - > Pricing
  - > Our free tier supports a context length of 8,192 tokens. For all supported models, we also offer context lengths up to 128K upon request.
- https://inference-docs.cerebras.ai/support/pricing#exploration-tier-pricing
  - > | Model | Speed | Input | Output |
    > |:---|:---|:---|:---|
    > | Llama 4 Scout | ~2600 tokens/s | \$0.65/M tokens | \$0.85/M tokens |
    > | Llama 3.1 8B | ~2200 tokens/s | \$0.10/M tokens | \$0.10/M tokens |
    > | Llama 3.3 70B | ~2100 tokens/s | \$0.85/M tokens | \$1.20/M tokens |
    > | Qwen 3 32B | ~2100 tokens/s | \$0.40/M tokens | \$0.80/M tokens |
    > | Deepseek R1 Distill Llama 70B | ~1700 tokens/s | \$2.20/M tokens | \$2.50/M tokens |

And the rate limits:

- https://inference-docs.cerebras.ai/support/rate-limits
  - > Rate Limits

And further docs about tool use/function calling:

- https://inference-docs.cerebras.ai/capabilities/tool-use
  - > Tool Use
- https://inference-docs.cerebras.ai/agent-bootcamp/section-2
  - > Tool Use and Function Calling

See Also:

- https://github.com/jehna/humanify/issues/400
- https://github.com/jehna/humanify/issues/84