Cerebras
Cerebras provider configuration for ultra-fast inference with Wafer-Scale Engine hardware.
Cerebras provides extremely fast inference powered by their Wafer-Scale Engine hardware. This page covers all Cerebras models available in Tambo.
Cerebras model overview
Cerebras hosts open-weight models optimized for its specialized hardware:
- Production Models: Llama 3.1 8B and GPT-OSS 120B
- Preview Models: Qwen 3 235B and ZAI GLM 4.7
All context windows are limited to 8,192 tokens on the free tier.
Supported Models
Production Models
llama3.1-8b
Status: Untested
API Name: llama3.1-8b
Context Window: 128,000 tokens (8,192 on free tier)
Speed: ~2,200 tokens/sec
Provider Docs: Cerebras - Llama 3.1 8B
Meta's Llama 3.1 8B model running on Cerebras hardware. Best suited for simple tasks and cost-effective deployments.
Best for:
- Cost-sensitive applications
- Real-time chat and responses
- Simple text generation tasks
- High-volume, low-complexity workloads
gpt-oss-120b
Status: Untested
API Name: gpt-oss-120b
Context Window: 8,192 tokens
Speed: ~3,000 tokens/sec
Provider Docs: Cerebras - OpenAI GPT OSS
OpenAI open-weight 120B parameter model on Cerebras.
Best for:
- General-purpose high-performance tasks
- Applications requiring strong reasoning
- Complex content generation
Preview Models
Preview models are intended for evaluation purposes only and may be discontinued on short notice.
qwen-3-235b-a22b-instruct-2507
Status: Untested
API Name: qwen-3-235b-a22b-instruct-2507
Context Window: 32,768 tokens (8,192 on free tier)
Speed: ~1,400 tokens/sec
Provider Docs: Cerebras - Qwen 3 235B
Alibaba's large-scale Qwen 3 model (235B params) optimized for instruction following.
Best for:
- Complex instruction following
- Advanced reasoning and analysis
- Multilingual applications
zai-glm-4.7
Status: Untested
API Name: zai-glm-4.7
Context Window: 128,000 tokens (8,192 on free tier)
Speed: ~1,000 tokens/sec
Provider Docs: Cerebras - ZAI GLM 4.7
Zhipu AI's GLM 4.7 (355B params) on Cerebras.
Best for:
- General text generation
- Conversational AI
- Bilingual (Chinese/English) tasks
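The throughput figures above translate directly into generation latency: time to stream a response is roughly output tokens divided by tokens/sec (ignoring time-to-first-token). A minimal sketch using the approximate vendor numbers listed on this page:

```python
# Approximate throughput figures from this page (vendor-reported, not guarantees).
SPEEDS_TOK_PER_SEC = {
    "llama3.1-8b": 2200,
    "gpt-oss-120b": 3000,
    "qwen-3-235b-a22b-instruct-2507": 1400,
    "zai-glm-4.7": 1000,
}

def est_generation_seconds(model: str, output_tokens: int) -> float:
    """Rough time to generate a response, excluding time-to-first-token."""
    return output_tokens / SPEEDS_TOK_PER_SEC[model]

# A 500-token response on llama3.1-8b streams out in roughly 0.23 s.
```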
Untested Models
All Cerebras models are currently labeled Untested in Tambo. Test them in your specific context before deploying to production. See the Labels page for more information about model status labels.
Configuration
All Cerebras models are configured through your project settings in the Tambo dashboard:
- Navigate to your project in the dashboard
- Go to Settings -> LLM Providers
- Select Cerebras as your provider
- Enter your Cerebras API key (get one from Cerebras Cloud)
- Choose your desired model from the dropdown
- Configure any additional parameters
- Click Save to apply the configuration
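Outside the dashboard, you can sanity-check a Cerebras API key and model name by calling Cerebras's OpenAI-compatible chat completions endpoint directly. The sketch below uses only the Python standard library; the endpoint URL reflects Cerebras's public API, but confirm it against the Cerebras docs before relying on it.

```python
import json
import os
import urllib.request

# Cerebras's OpenAI-compatible endpoint (verify against current Cerebras docs).
CEREBRAS_URL = "https://api.cerebras.ai/v1/chat/completions"

def build_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a chat-completion request; `model` is the API name, e.g. "llama3.1-8b"."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }
    return urllib.request.Request(
        CEREBRAS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

if __name__ == "__main__" and "CEREBRAS_API_KEY" in os.environ:
    req = build_request(os.environ["CEREBRAS_API_KEY"], "llama3.1-8b", "Say hello.")
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

In Tambo itself the key is only entered once in the dashboard; requests are routed server-side, so this snippet is purely for verifying credentials and model names.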
Model Selection Guide
For speed and cost efficiency:
- Most cost-effective: Llama 3.1 8B (~2,200 tokens/sec)
- Highest throughput: GPT-OSS 120B (~3,000 tokens/sec)
For complex tasks:
- Maximum capability: Qwen 3 235B or GPT-OSS 120B
For bilingual (Chinese/English):
- Recommended: ZAI GLM 4.7
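The selection guide can be expressed as a small helper. This is a hypothetical illustration (the function name and flags are not part of Tambo's API); the model API names come from this page:

```python
def pick_cerebras_model(*, complex_reasoning: bool = False, bilingual: bool = False) -> str:
    """Map rough task requirements to a Cerebras API model name (illustrative only)."""
    if bilingual:
        return "zai-glm-4.7"  # bilingual Chinese/English tasks
    if complex_reasoning:
        return "qwen-3-235b-a22b-instruct-2507"  # maximum capability
    return "llama3.1-8b"  # cost-effective default for simple, high-volume work
```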
See Also
- Labels - Understanding model status labels and observed behaviors
- Custom LLM Parameters - Configuring model parameters for fine-tuned responses