Groq
Groq provider configuration for Meta's Llama models with ultra-fast inference.
Groq provides ultra-fast inference for Meta's Llama models. Its specialized hardware accelerates model execution to 400+ tokens/sec, making it well suited to high-throughput and real-time applications.
Groq Llama model overview
Groq hosts Meta's latest Llama models, including the new Llama 4 family. These models excel at diverse NLP tasks, from summarization and reasoning to multilingual and multimodal applications, all powered by Groq's high-performance infrastructure.
Provider-specific features:
- Ultra-fast inference (400+ tokens/sec)
- Large context windows (128K tokens)
- Cost-effective pricing
- Specialized hardware acceleration
Supported Models
Groq offers four Llama models across three generations, each suited to different use cases.
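Each model entry below lists an API Name; that exact string is what you pass as the `model` parameter when calling Groq's OpenAI-compatible API. As a minimal sketch, assuming the official `groq-sdk` npm package and a `GROQ_API_KEY` environment variable:

```ts
import Groq from "groq-sdk";

// Assumes GROQ_API_KEY is set in the environment.
const groq = new Groq({ apiKey: process.env.GROQ_API_KEY });

async function main() {
  const completion = await groq.chat.completions.create({
    // Use the exact API Name from the model entries below.
    model: "llama-3.3-70b-versatile",
    messages: [{ role: "user", content: "Summarize the benefits of fast inference." }],
  });
  console.log(completion.choices[0]?.message?.content);
}

main();
```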
Llama 4 Family
llama-4-scout-17b-16e-instruct
Status: Untested
API Name: meta-llama/llama-4-scout-17b-16e-instruct
Context Window: 128K tokens
Provider Docs: Groq Llama 4 Announcement
Description:
Meta's Llama 4 Scout model (17B active parameters × 16 experts) is well suited to summarization, reasoning, and code generation. It runs at 460+ tokens/sec on Groq's infrastructure.
Best For:
- Code generation and analysis
- Text summarization
- Multi-step reasoning tasks
- Real-time applications requiring high throughput
Notes:
Not yet validated on common Tambo tasks. This model is newly released as of November 2025. Use with caution and test in your specific context.
llama-4-maverick-17b-128e-instruct
Status: Untested
API Name: meta-llama/llama-4-maverick-17b-128e-instruct
Context Window: 128K tokens
Provider Docs: Groq Llama 4 Announcement
Description:
Meta's Llama 4 Maverick model (17B active parameters × 128 experts) is optimized for multilingual and multimodal tasks, making it ideal for assistants, chat applications, and creative use cases.
Best For:
- Multilingual applications
- Conversational AI and chatbots
- Creative writing and content generation
- Assistant and agent implementations
Notes:
Not yet validated on common Tambo tasks. This model is newly released as of November 2025 with enhanced multimodal capabilities. Use with caution and test in your specific context.
Llama 3.3 Family
llama-3.3-70b-versatile
Status: Untested
API Name: llama-3.3-70b-versatile
Context Window: 128K tokens
Provider Docs: Groq Llama 3.3 Documentation
Description:
Llama 3.3 70B Versatile is Meta's 70B-parameter multilingual model, tuned for strong performance across a wide range of NLP tasks.
Best For:
- Complex multilingual tasks
- General-purpose NLP applications
- Tasks requiring strong reasoning
- Production workloads requiring reliability
Notes:
Not yet validated on common Tambo tasks. With 70B parameters, this model offers strong capabilities but may have higher latency than smaller variants.
Llama 3.1 Family
llama-3.1-8b-instant
Status: Untested
API Name: llama-3.1-8b-instant
Context Window: 128K tokens
Provider Docs: Groq Llama 3.1 Documentation
Description:
Llama 3.1 8B on Groq delivers fast, high-quality responses for real-time tasks. It supports function calling, JSON output, and a 128K context window at low cost, making it ideal for cost-conscious applications (see the JSON-mode sketch after this entry).
Best For:
- Real-time chat applications
- Cost-sensitive production deployments
- Function calling and tool use
- JSON-structured outputs
- High-volume applications
Notes:
Not yet validated on common Tambo tasks. This is the most cost-effective option in Groq's lineup while maintaining good performance for well-defined tasks.
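Since this model's description calls out JSON output, here is a hedged sketch of JSON mode, again assuming the `groq-sdk` package (the `response_format` option follows the OpenAI-compatible convention):

```ts
import Groq from "groq-sdk";

const groq = new Groq({ apiKey: process.env.GROQ_API_KEY });

async function extractJson() {
  const completion = await groq.chat.completions.create({
    model: "llama-3.1-8b-instant",
    // JSON mode: constrains the model to emit valid JSON.
    response_format: { type: "json_object" },
    messages: [
      { role: "system", content: "Return a JSON object with keys 'city' and 'country'." },
      { role: "user", content: "Where is the Eiffel Tower?" },
    ],
  });
  console.log(JSON.parse(completion.choices[0]?.message?.content ?? "{}"));
}

extractJson();
```

The same `chat.completions.create` call accepts a `tools` array for function calling, following the OpenAI-compatible schema.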
Configuration
Dashboard Setup
Configure Groq models through your project's LLM provider settings:
1. Navigate to your project in the dashboard
2. Go to Settings → LLM Providers
3. Add or select Groq as your provider
4. Enter your Groq API key
5. Select your preferred Llama model
6. Configure any custom LLM parameters as needed
7. Click Save to apply the configuration
API Key
You'll need a Groq API key to use these models. Get one from Groq Console.
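To confirm a key works before adding it in the dashboard, you can query Groq's OpenAI-compatible models endpoint directly. A quick sketch (endpoint path per Groq's public API docs; assumes `GROQ_API_KEY` is set):

```ts
// Lists the models your key can access; a 401 response means the key is invalid.
const res = await fetch("https://api.groq.com/openai/v1/models", {
  headers: { Authorization: `Bearer ${process.env.GROQ_API_KEY}` },
});
console.log(res.status, await res.json());
```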
Model Selection
Choose your model based on your use case:
- Llama 4 Scout: Code generation, reasoning, summarization
- Llama 4 Maverick: Multilingual, multimodal, creative tasks
- Llama 3.3 70B: Complex tasks requiring strong reasoning
- Llama 3.1 8B: Cost-effective real-time applications
Performance Considerations
Speed vs. Size:
- Smaller models (8B) offer lower latency and cost
- Larger models (70B) provide better reasoning and accuracy
- Llama 4 models balance performance with specialized capabilities
Context Window:
All Groq Llama models support 128K token context windows, enabling long-form conversations and document analysis.
Throughput:
Groq's specialized hardware delivers exceptional inference speeds (400-460+ tokens/sec), making it ideal for real-time applications.
Best Practices
- Start with Llama 3.1 8B for cost-effective testing and simple tasks
- Use Llama 3.3 70B when you need stronger reasoning or complex understanding
- Try Llama 4 models for specialized tasks (Scout for code, Maverick for multilingual)
- Test thoroughly since all models are currently untested with Tambo
- Monitor costs and performance to find the right balance for your use case
Known Behaviors
Untested Models
All Groq Llama models are currently Untested in Tambo. They are newly released or recently added. Test them in your specific context before production deployment. See the Labels page for more information about model status labels.
Potential Streaming Issues
Streaming may behave inconsistently with providers other than OpenAI, including Groq. We're aware of the issue and are actively working on a fix; proceed with caution when streaming from Groq models.
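If you need streaming today, one defensive pattern is to fall back to a non-streaming call when the stream fails. A sketch assuming the `groq-sdk` package (the fallback logic is illustrative, not a Tambo feature):

```ts
import Groq from "groq-sdk";

const groq = new Groq({ apiKey: process.env.GROQ_API_KEY });
const params = {
  model: "llama-3.1-8b-instant",
  messages: [{ role: "user" as const, content: "Explain context windows briefly." }],
};

async function main() {
  try {
    // Stream tokens as they arrive.
    const stream = await groq.chat.completions.create({ ...params, stream: true });
    for await (const chunk of stream) {
      process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
    }
  } catch {
    // Fall back to a single non-streaming completion if streaming misbehaves.
    const completion = await groq.chat.completions.create(params);
    console.log(completion.choices[0]?.message?.content);
  }
}

main();
```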
Troubleshooting
Slow response times?
- Groq is optimized for speed; if responses are slow, check your network connection
- Verify you're using the correct API endpoint
- Monitor your Groq account for rate limits
Unexpected outputs?
- Adjust temperature and other parameters in Custom LLM Parameters (see the sketch after this list)
- Try a different model size (8B vs. 70B) for your use case
- Provide more context in your prompts for better results
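For reference, the dashboard's custom LLM parameters map onto Groq's OpenAI-compatible request fields. A hedged sketch of the direct-API equivalents, assuming the `groq-sdk` package:

```ts
import Groq from "groq-sdk";

const groq = new Groq({ apiKey: process.env.GROQ_API_KEY });

const completion = await groq.chat.completions.create({
  model: "llama-3.3-70b-versatile",
  messages: [{ role: "user", content: "Draft a short release note." }],
  temperature: 0.3, // lower values give more deterministic output
  max_tokens: 512,  // cap the response length
  top_p: 0.9,       // nucleus sampling cutoff
});
console.log(completion.choices[0]?.message?.content);
```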
API errors?
- Verify your Groq API key is valid and not expired
- Check your account quota and rate limits
- Ensure the model name matches exactly as shown in Supported Models
See Also
- Labels - Understanding model status labels
- Custom LLM Parameters - Fine-tune model behavior
- Groq Console - Manage your API keys and usage
- Groq Documentation - Official Groq provider docs