Cerebras

Cerebras provider configuration for ultra-fast inference with Wafer-Scale Engine hardware.

Cerebras provides extremely fast inference powered by its Wafer-Scale Engine hardware. This page covers the Cerebras models available in Tambo.

Cerebras model overview

Cerebras hosts open-source models optimized for their specialized hardware:

  • Production Models: Llama 3.1 8B and GPT-OSS 120B
  • Preview Models: Qwen 3 235B and ZAI GLM 4.7

All context windows are limited to 8,192 tokens on the free tier.

Supported Models

Production Models

llama3.1-8b

  • Status: Untested
  • API Name: llama3.1-8b
  • Context Window: 128,000 tokens (8,192 on free tier)
  • Speed: ~2,200 tokens/sec
  • Provider Docs: Cerebras - Llama 3.1 8B

Meta's Llama 3.1 8B model running on Cerebras hardware. Best suited for simple tasks and cost-effective deployments.

Best for:

  • Cost-sensitive applications
  • Real-time chat and responses
  • Simple text generation tasks
  • High-volume, low-complexity workloads

gpt-oss-120b

  • Status: Untested
  • API Name: gpt-oss-120b
  • Context Window: 8,192 tokens
  • Speed: ~3,000 tokens/sec
  • Provider Docs: Cerebras - OpenAI GPT OSS

OpenAI's open-weight 120B-parameter model running on Cerebras hardware.

Best for:

  • General-purpose high-performance tasks
  • Applications requiring strong reasoning
  • Complex content generation

Preview Models

Preview models are intended for evaluation purposes only and may be discontinued on short notice.

qwen-3-235b-a22b-instruct-2507

  • Status: Untested
  • API Name: qwen-3-235b-a22b-instruct-2507
  • Context Window: 32,768 tokens (8,192 on free tier)
  • Speed: ~1,400 tokens/sec
  • Provider Docs: Cerebras - Qwen 3 235B

Alibaba's large-scale Qwen 3 model (235B params) optimized for instruction following.

Best for:

  • Complex instruction following
  • Advanced reasoning and analysis
  • Multilingual applications

zai-glm-4.7

  • Status: Untested
  • API Name: zai-glm-4.7
  • Context Window: 128,000 tokens (8,192 on free tier)
  • Speed: ~1,000 tokens/sec
  • Provider Docs: Cerebras - ZAI GLM 4.7

Zhipu AI's GLM 4.7 (355B params) on Cerebras.

Best for:

  • General text generation
  • Conversational AI
  • Bilingual (Chinese/English) tasks

Untested Models

All Cerebras models are currently Untested in Tambo. Test them in your specific context before production deployment. See the Labels page for more information about model status labels.

Configuration

All Cerebras models are configured through your project settings in the Tambo dashboard:

  1. Navigate to your project in the dashboard
  2. Go to Settings -> LLM Providers
  3. Select Cerebras as your provider
  4. Enter your Cerebras API key (get one from Cerebras Cloud)
  5. Choose your desired model from the dropdown
  6. Configure any additional parameters
  7. Click Save to apply the configuration
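Beyond the dashboard, it can help to sanity-check the documented API names before saving a configuration. The sketch below builds an OpenAI-style chat-completion payload for a Cerebras model; the base URL and request shape are assumptions based on Cerebras's OpenAI-compatible API, not Tambo-specific code, so verify them against the Cerebras docs before relying on them:

```python
import json

# Assumed OpenAI-compatible endpoint for Cerebras Cloud (verify in Cerebras docs).
CEREBRAS_BASE_URL = "https://api.cerebras.ai/v1"

# API names exactly as documented on this page.
PRODUCTION_MODELS = {"llama3.1-8b", "gpt-oss-120b"}
PREVIEW_MODELS = {"qwen-3-235b-a22b-instruct-2507", "zai-glm-4.7"}

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completion payload for a Cerebras model."""
    if model not in PRODUCTION_MODELS | PREVIEW_MODELS:
        raise ValueError(f"Unknown Cerebras model: {model}")
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("llama3.1-8b", "Hello!")
print(json.dumps(payload, indent=2))
```

With a valid API key, such a payload would be POSTed to the endpoint's `/chat/completions` route with an `Authorization: Bearer` header; in Tambo itself, the dashboard configuration above handles this for you.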

Model Selection Guide

For speed and cost efficiency: llama3.1-8b. It is the most cost-effective option for simple, high-volume workloads and real-time chat.

For complex tasks: gpt-oss-120b for general reasoning and content generation, or qwen-3-235b-a22b-instruct-2507 for complex instruction following and analysis.

For bilingual (Chinese/English): zai-glm-4.7, which is suited to conversational and Chinese/English tasks.

See Also

  • Labels - Understanding model status labels and observed behaviors
  • Custom LLM Parameters - Configuring model parameters for fine-tuned responses