Groq

Groq provider configuration for Meta's Llama models with ultra-fast inference.

Groq provides ultra-fast inference for Meta's Llama models, delivering high-throughput AI capabilities for demanding applications. Groq's specialized hardware accelerates model performance, enabling speeds of 400+ tokens/sec for real-time use cases.

Groq Llama model overview

Groq hosts Meta's latest Llama models, including the new Llama 4 family. These models excel at diverse NLP tasks, from summarization and reasoning to multilingual and multimodal applications, all powered by Groq's high-performance infrastructure.

Provider-specific features:

  • Ultra-fast inference (400+ tokens/sec)
  • Large context windows (128K tokens)
  • Cost-effective pricing
  • Specialized hardware acceleration

Supported Models

Groq offers 4 Llama models across different generations, each optimized for specific use cases.

Llama 4 Family

llama-4-scout-17b-16e-instruct

Status: Untested

API Name: meta-llama/llama-4-scout-17b-16e-instruct

Context Window: 128K tokens

Provider Docs: Groq Llama 4 Announcement

Description:

Meta's Llama 4 Scout model (17Bx16E) is ideal for summarization, reasoning, and code generation. Runs at 460+ tokens/sec on Groq's infrastructure.

Best For:

  • Code generation and analysis
  • Text summarization
  • Multi-step reasoning tasks
  • Real-time applications requiring high throughput

Notes:

Not yet validated on common Tambo tasks. This model is newly released as of November 2025. Use with caution and test in your specific context.

llama-4-maverick-17b-128e-instruct

Status: Untested

API Name: meta-llama/llama-4-maverick-17b-128e-instruct

Context Window: 128K tokens

Provider Docs: Groq Llama 4 Announcement

Description:

Meta's Llama 4 Maverick model (17Bx128E) is optimized for multilingual and multimodal tasks, making it ideal for assistants, chat applications, and creative use cases.

Best For:

  • Multilingual applications
  • Conversational AI and chatbots
  • Creative writing and content generation
  • Assistant and agent implementations

Notes:

Not yet validated on common Tambo tasks. This model is newly released as of November 2025 with enhanced multimodal capabilities. Use with caution and test in your specific context.

Llama 3.3 Family

llama-3.3-70b-versatile

Status: Untested

API Name: llama-3.3-70b-versatile

Context Window: 128K tokens

Provider Docs: Groq Llama 3.3 Documentation

Description:

Llama 3.3 70B Versatile is Meta's powerful multilingual model with 70B parameters, optimized for diverse NLP tasks and delivering strong performance across a wide range of applications.

Best For:

  • Complex multilingual tasks
  • General-purpose NLP applications
  • Tasks requiring strong reasoning
  • Production workloads requiring reliability

Notes:

Not yet validated on common Tambo tasks. With 70B parameters, this model offers strong capabilities but may have higher latency than smaller variants.

Llama 3.1 Family

llama-3.1-8b-instant

Status: Untested

API Name: llama-3.1-8b-instant

Context Window: 128K tokens

Provider Docs: Groq Llama 3.1 Documentation

Description:

Llama 3.1 8B on Groq delivers fast, high-quality responses for real-time tasks. Supports function calling, JSON output, and 128K context at low cost, making it ideal for cost-conscious applications.

Best For:

  • Real-time chat applications
  • Cost-sensitive production deployments
  • Function calling and tool use
  • JSON-structured outputs
  • High-volume applications

Notes:

Not yet validated on common Tambo tasks. This is the most cost-effective option in Groq's lineup while maintaining good performance for well-defined tasks.
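
The function calling and JSON output support mentioned above follows the OpenAI-compatible API shape that Groq exposes. Below is a minimal TypeScript sketch of a tool-call request; the getWeather tool is a hypothetical example for illustration, not part of any Groq or Tambo API, and the snippet assumes Node 18+ (global fetch) with GROQ_API_KEY set in your environment.

```ts
// Hedged sketch: function calling against Groq's OpenAI-compatible endpoint.
const res = await fetch("https://api.groq.com/openai/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.GROQ_API_KEY}`,
  },
  body: JSON.stringify({
    model: "llama-3.1-8b-instant",
    messages: [{ role: "user", content: "What's the weather in Paris?" }],
    tools: [
      {
        type: "function",
        function: {
          name: "getWeather", // hypothetical tool for illustration
          description: "Look up the current weather for a city",
          parameters: {
            type: "object",
            properties: { city: { type: "string" } },
            required: ["city"],
          },
        },
      },
    ],
  }),
});

const data = await res.json();
// When the model elects to call the tool, the arguments arrive as a JSON string.
console.log(data.choices[0].message.tool_calls?.[0]?.function.arguments);
```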

Configuration

Dashboard Setup

Configure Groq models through your project's LLM provider settings:

  1. Navigate to your project in the dashboard
  2. Go to Settings → LLM Providers
  3. Add or select Groq as your provider
  4. Enter your Groq API key
  5. Select your preferred Llama model
  6. Configure any custom LLM parameters as needed
  7. Click Save to apply the configuration

API Key

You'll need a Groq API key to use these models. Get one from Groq Console.
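
If you want to sanity-check your key and model choice outside the dashboard, Groq exposes an OpenAI-compatible REST API. The sketch below assumes Node 18+ (global fetch) and GROQ_API_KEY set in your environment; the request follows the standard chat-completions payload shape, and the model name is copied from Supported Models above.

```ts
// Minimal sketch: one-off chat completion against Groq's API.
const response = await fetch("https://api.groq.com/openai/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.GROQ_API_KEY}`,
  },
  body: JSON.stringify({
    model: "llama-3.1-8b-instant",
    messages: [{ role: "user", content: "Reply with a one-sentence greeting." }],
  }),
});

const data = await response.json();
console.log(data.choices[0].message.content);
```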

Model Selection

Choose your model based on your use case:

  • Code generation, summarization, and multi-step reasoning: llama-4-scout-17b-16e-instruct
  • Multilingual and multimodal applications: llama-4-maverick-17b-128e-instruct
  • Complex reasoning and reliable production workloads: llama-3.3-70b-versatile
  • Real-time, cost-sensitive, and tool-calling workloads: llama-3.1-8b-instant

Performance Considerations

Speed vs. Size:

  • Smaller models (8B) offer lower latency and cost
  • Larger models (70B) provide better reasoning and accuracy
  • Llama 4 models balance performance with specialized capabilities

Context Window:

All Groq Llama models support 128K-token context windows (roughly 300 pages of English text), enabling long-form conversations and whole-document analysis.

Throughput:

Groq's specialized hardware delivers exceptional inference speeds (400-460+ tokens/sec), making it ideal for real-time applications; at those rates, a typical 500-token response completes in a little over a second.

Best Practices

  • Start with Llama 3.1 8B for cost-effective testing and simple tasks
  • Use Llama 3.3 70B when you need stronger reasoning or complex understanding
  • Try Llama 4 models for specialized tasks (Scout for code, Maverick for multilingual)
  • Test thoroughly since all models are currently untested with Tambo
  • Monitor costs and performance to find the right balance for your use case

Known Behaviors

Untested Models

All Groq Llama models are currently Untested in Tambo. They are newly released or recently added. Test them in your specific context before production deployment. See the Labels page for more information about model status labels.

Potential Streaming Issues

Streaming may behave inconsistently with providers other than OpenAI. We're aware of the issue and are actively working on a fix. Proceed with caution when using streaming with Groq models.
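
If you depend on streaming, it's worth exercising it directly before shipping. The sketch below consumes Groq's OpenAI-compatible Server-Sent Events stream under the same assumptions as the earlier sketches (Node 18+, GROQ_API_KEY set). Note that the naive line split can break when a network chunk boundary falls mid-line; a production client should buffer partial lines.

```ts
// Hedged sketch: stream a completion and print deltas as they arrive.
const res = await fetch("https://api.groq.com/openai/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.GROQ_API_KEY}`,
  },
  body: JSON.stringify({
    model: "llama-3.1-8b-instant",
    messages: [{ role: "user", content: "Count to five." }],
    stream: true, // server responds with Server-Sent Events
  }),
});

const reader = res.body!.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // Each SSE line looks like `data: {json}`; the stream ends with `data: [DONE]`.
  for (const line of decoder.decode(value, { stream: true }).split("\n")) {
    if (!line.startsWith("data: ") || line.includes("[DONE]")) continue;
    const chunk = JSON.parse(line.slice("data: ".length));
    process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
  }
}
```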

Troubleshooting

Slow response times?

  • Groq is optimized for speed; if responses are slow, check your network connection first
  • Verify you're using the correct API endpoint
  • Monitor your Groq account for rate limits

Unexpected outputs?

  • Adjust temperature and other parameters in Custom LLM Parameters (see the sketch after this list)
  • Try a different model size (8B vs. 70B) for your use case
  • Provide more context in your prompts for better results
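
As a reference point for the first suggestion, these are the kinds of OpenAI-style parameters you would enter under Custom LLM Parameters, expressed here as a request body. The values are illustrative starting points, not recommendations, and the exact set of accepted parameters is defined by Groq's API.

```ts
// Hedged sketch: a request body with explicit sampling parameters.
const body = {
  model: "llama-3.3-70b-versatile",
  messages: [{ role: "user", content: "Summarize the following text: ..." }],
  temperature: 0.2, // lower values make output more deterministic
  top_p: 0.9,       // nucleus sampling cutoff
  max_tokens: 1024, // cap on response length
};
console.log(JSON.stringify(body, null, 2));
```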

API errors?

  • Verify your Groq API key is valid and not expired
  • Check your account quota and rate limits
  • Ensure the model name exactly matches the API Name listed under Supported Models
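
A quick way to check the first and third items together is to list the models your key can access: a 401 response means the key is invalid or expired, and the returned ids are the exact strings to use as API names. Same assumptions as the sketches above (Node 18+, GROQ_API_KEY set).

```ts
// Hedged sketch: verify the API key and enumerate available model names.
const res = await fetch("https://api.groq.com/openai/v1/models", {
  headers: { Authorization: `Bearer ${process.env.GROQ_API_KEY}` },
});
if (!res.ok) {
  console.error(`Request failed: ${res.status} ${res.statusText}`);
} else {
  const { data } = await res.json();
  console.log(data.map((m: { id: string }) => m.id));
}
```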
