Cerebras
Cerebras provider configuration for ultra-fast inference with Wafer-Scale Engine hardware.
Cerebras provides extremely fast inference (2,000+ tokens/second) powered by their Wafer-Scale Engine hardware. This page covers all Cerebras models available in Tambo, including Llama, Qwen, and other open-weight model families.
Cerebras model overview
Cerebras hosts a variety of open-source models optimized for their specialized hardware. Available model families include:
- Llama Family: Meta's Llama 3.1 and 3.3 models with up to 128K context
- Qwen Family: Alibaba's Qwen 3 models for multilingual and structured outputs
- Other Models: GPT-OSS and GLM models for various use cases
All Cerebras models deliver ultra-fast inference at 2,000+ tokens/sec. Note that on the free tier, context windows are capped at 8,192 tokens regardless of a model's full context window.
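Because the free tier caps every model at 8,192 tokens, it can be useful to check whether a prompt plus the requested output will fit before sending a request. The sketch below uses the common rough heuristic of ~4 characters per token; it is an approximation only, and a real tokenizer should be used for exact counts.

```python
def fits_free_tier(prompt: str, max_output_tokens: int = 1024,
                   context_window: int = 8192) -> bool:
    """Rough check that a prompt plus the requested output fits the
    8,192-token free-tier window. Uses the ~4 characters per token
    heuristic; use a real tokenizer for exact counts."""
    estimated_prompt_tokens = len(prompt) // 4 + 1
    return estimated_prompt_tokens + max_output_tokens <= context_window

print(fits_free_tier("Summarize the quarterly report in two sentences."))
```

On the paid tier, the same check applies with the model's full context window (for example, 128,000 for the Llama models) passed as `context_window`.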
Supported Models
Llama Family
llama3.1-8b
Status: Untested
API Name: llama3.1-8b
Context Window: 128,000 tokens (8,192 on free tier)
Provider Docs: Cerebras Inference Docs
Meta's Llama 3.1 8B model running on Cerebras hardware. It delivers very fast inference at 2,000+ tokens/sec and is best suited for simple tasks and cost-effective deployments.
Best for:
- Cost-sensitive applications
- Real-time chat and responses
- Simple text generation tasks
- High-volume, low-complexity workloads
Notes:
- Not yet validated on common Tambo tasks
- Smallest and fastest Llama model on Cerebras
- Good starting point for testing
llama-3.3-70b
Status: Untested
API Name: llama-3.3-70b
Context Window: 128,000 tokens (8,192 on free tier)
Provider Docs: Cerebras Inference Docs
Meta's Llama 3.3 70B model on Cerebras, offering balanced performance with ultra-fast inference. Suitable for complex reasoning and multi-step tasks.
Best for:
- Complex reasoning tasks
- Multi-step problem solving
- Production workloads requiring higher accuracy
- Tasks requiring stronger language understanding
Notes:
- Not yet validated on common Tambo tasks
- Stronger capabilities than the 8B variant
- Maintains fast inference speeds
Qwen Family
qwen-3-32b
Status: Untested
API Name: qwen-3-32b
Context Window: 32,768 tokens (8,192 on free tier)
Provider Docs: Cerebras Inference Docs
Alibaba's Qwen 3 32B model with hybrid reasoning capabilities on Cerebras. Good for multilingual tasks and structured outputs.
Best for:
- Multilingual applications
- Structured output generation
- Hybrid reasoning tasks
- JSON and code generation
Notes:
- Not yet validated on common Tambo tasks
- Excels at multilingual tasks and structured outputs
- Good balance of capability and speed
qwen-3-235b-a22b-instruct-2507
Status: Untested
API Name: qwen-3-235b-a22b-instruct-2507
Context Window: 32,768 tokens (8,192 on free tier)
Provider Docs: Cerebras Inference Docs
Alibaba's large-scale Qwen 3 model (235B total parameters with 22B activated per token, the "A22B" mixture-of-experts design), optimized for instruction following on Cerebras.
Best for:
- Complex instruction following
- Tasks requiring maximum model capability
- Advanced reasoning and analysis
- Professional content generation
Notes:
- Not yet validated on common Tambo tasks
- One of the largest models available on Cerebras
- Best for demanding tasks requiring maximum capability
Other Models
gpt-oss-120b
Status: Untested
API Name: gpt-oss-120b
Context Window: 8,192 tokens
Provider Docs: Cerebras Inference Docs
OpenAI's open-weight 120B-parameter model on Cerebras, offering powerful capabilities for demanding applications combined with Cerebras's fast inference.
Best for:
- General-purpose high-performance tasks
- Applications requiring strong reasoning
- Complex content generation
- Tasks benefiting from larger model scale
Notes:
- Not yet validated on common Tambo tasks
- Based on OpenAI's open-weight release
- Strong general-purpose capabilities
zai-glm-4.6
Status: Untested
API Name: zai-glm-4.6
Context Window: 128,000 tokens (8,192 on free tier)
Provider Docs: Cerebras Inference Docs
Zhipu AI's GLM 4.6 model on Cerebras with fast inference capabilities.
Best for:
- General text generation
- Conversational AI
- Content creation
- Bilingual (Chinese/English) tasks
Notes:
- Not yet validated on common Tambo tasks
- Strong bilingual capabilities
- Good for Chinese/English applications
zai-glm-4.7
Status: Untested
API Name: zai-glm-4.7
Context Window: 128,000 tokens (8,192 on free tier)
Provider Docs: Cerebras Inference Docs
Zhipu AI's GLM 4.7 model on Cerebras, the latest iteration with improved capabilities.
Best for:
- General text generation
- Conversational AI
- Content creation
- Bilingual (Chinese/English) tasks
Notes:
- Not yet validated on common Tambo tasks
- Latest GLM iteration with improvements over 4.6
- Strong bilingual capabilities
Untested Models
All Cerebras models are currently labeled Untested in Tambo because they were recently added to the platform. Test them in your specific context before deploying to production. See the Labels page for more information about model status labels.
Configuration
All Cerebras models are configured through your project settings in the Tambo dashboard:
- Navigate to your project in the dashboard
- Go to Settings → LLM Providers
- Select Cerebras as your provider
- Enter your Cerebras API key (get one from Cerebras Cloud)
- Choose your desired model from the dropdown
- Configure any additional parameters (temperature, maxOutputTokens, etc.)
- Click Save to apply the configuration
Cerebras models support the standard LLM parameters available in Tambo. For detailed parameter configuration, see Custom LLM Parameters.
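Outside the dashboard, Cerebras also exposes an OpenAI-compatible Chat Completions endpoint at `https://api.cerebras.ai/v1`, which is useful for quick sanity checks of a model before wiring it into Tambo. The helper below is a hedged sketch: it only assembles a request body (so it runs without a network call or API key), and the exact parameter names accepted by the endpoint should be confirmed against Cerebras's API reference.

```python
# Sketch of a Chat Completions request body for Cerebras's
# OpenAI-compatible endpoint. Only builds the payload locally;
# sending it requires an HTTP client and a Cerebras API key.

CEREBRAS_BASE_URL = "https://api.cerebras.ai/v1"  # OpenAI-compatible base URL

def build_chat_request(model: str, user_message: str,
                       temperature: float = 0.7,
                       max_output_tokens: int = 1024) -> dict:
    """Assemble a chat payload. Mirrors the parameters Tambo exposes
    (temperature, maxOutputTokens); the mapping of maxOutputTokens to
    the endpoint's token-limit field is an assumption to verify."""
    return {
        "model": model,  # use the API Name listed above, e.g. "llama3.1-8b"
        "messages": [{"role": "user", "content": user_message}],
        "temperature": temperature,
        "max_tokens": max_output_tokens,
    }

payload = build_chat_request("llama3.1-8b", "Summarize this paragraph.")
print(payload["model"])  # llama3.1-8b
```

The same payload shape works for any model on this page by swapping in the model's API Name.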
Model Selection Guide
For speed and cost efficiency:
- Fastest: Llama 3.1 8B
- Balanced: Llama 3.3 70B or Qwen 3 32B
For complex tasks:
- Maximum capability: Qwen 3 235B or GPT-OSS 120B
- Multilingual: Qwen 3 32B or Qwen 3 235B
For bilingual (Chinese/English):
- GLM 4.6 or GLM 4.7
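The selection guide above can be expressed as a small lookup helper. The use-case keys below are illustrative labels, not an official Tambo API; the values are the API Names from this page.

```python
# Illustrative mapping from use case to Cerebras API model name,
# following the Model Selection Guide above. Keys are hypothetical.
MODEL_GUIDE = {
    "fastest": "llama3.1-8b",
    "balanced": "llama-3.3-70b",
    "multilingual": "qwen-3-32b",
    "max_capability": "qwen-3-235b-a22b-instruct-2507",
    "bilingual_zh_en": "zai-glm-4.6",
}

def pick_model(use_case: str) -> str:
    """Return the suggested model for a use case, falling back to the
    balanced default when the use case is unrecognized."""
    return MODEL_GUIDE.get(use_case, "llama-3.3-70b")

print(pick_model("fastest"))       # llama3.1-8b
print(pick_model("unknown"))       # llama-3.3-70b (default)
```

A falling-back default keeps callers simple; if you would rather surface typos in use-case names, raise a `KeyError` instead.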
See Also
- Labels - Understanding model status labels and observed behaviors
- Custom LLM Parameters - Configuring model parameters for fine-tuned responses