# Groq
URL: /models/groq
Groq provides ultra-fast inference for Meta's Llama models. Its specialized hardware accelerates model serving, delivering 400+ tokens/sec for real-time and other demanding applications.
## Groq Llama model overview
Groq hosts Meta's latest Llama models, including the new Llama 4 family. These models excel at diverse NLP tasks, from summarization and reasoning to multilingual and multimodal applications, all powered by Groq's high-performance infrastructure.
**Provider-specific features:**
* Ultra-fast inference (400+ tokens/sec)
* Large context windows (128K tokens)
* Cost-effective pricing
* Specialized hardware acceleration
## Supported Models
Groq offers four Llama models spanning three generations, each optimized for specific use cases.
### Llama 4 Family
#### llama-4-scout-17b-16e-instruct
**Status:** Untested
**API Name:** `meta-llama/llama-4-scout-17b-16e-instruct`
**Context Window:** 128K tokens
**Provider Docs:** [Groq Llama 4 Announcement](https://groq.com/blog/llama-4-now-live-on-groq-build-fast-at-the-lowest-cost-without-compromise)
**Description:**
Meta's Llama 4 Scout model (17Bx16E) is ideal for summarization, reasoning, and code generation. Runs at 460+ tokens/sec on Groq's infrastructure.
**Best For:**
* Code generation and analysis
* Text summarization
* Multi-step reasoning tasks
* Real-time applications requiring high throughput
**Notes:**
Not yet validated on common Tambo tasks. This model is newly released as of November 2025. Use with caution and test in your specific context.
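If you want to try the model outside of Tambo before configuring it, the sketch below shows a minimal call against Groq's OpenAI-compatible chat-completions endpoint. It assumes `GROQ_API_KEY` is set in your environment; the prompt and `temperature` value are illustrative only.

```ts
// Minimal sketch: call Llama 4 Scout through Groq's OpenAI-compatible
// chat-completions endpoint. Assumes GROQ_API_KEY is set in the environment;
// the prompt and temperature are illustrative.
const response = await fetch("https://api.groq.com/openai/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.GROQ_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "meta-llama/llama-4-scout-17b-16e-instruct", // API name from above
    messages: [
      { role: "user", content: "Summarize the trade-offs of binary search trees." },
    ],
    temperature: 0.2, // lower temperature favors more deterministic summaries
  }),
});

const data = await response.json();
console.log(data.choices[0].message.content);
```

The same request shape applies to the other models on this page; only the `model` field changes.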
#### llama-4-maverick-17b-128e-instruct
**Status:** Untested
**API Name:** `meta-llama/llama-4-maverick-17b-128e-instruct`
**Context Window:** 128K tokens
**Provider Docs:** [Groq Llama 4 Announcement](https://groq.com/blog/llama-4-now-live-on-groq-build-fast-at-the-lowest-cost-without-compromise)
**Description:**
Meta's Llama 4 Maverick model (17Bx128E) is optimized for multilingual and multimodal tasks, making it ideal for assistants, chat applications, and creative use cases.
**Best For:**
* Multilingual applications
* Conversational AI and chatbots
* Creative writing and content generation
* Assistant and agent implementations
**Notes:**
Not yet validated on common Tambo tasks. This model is newly released as of November 2025 with enhanced multimodal capabilities. Use with caution and test in your specific context.
### Llama 3.3 Family
#### llama-3.3-70b-versatile
**Status:** Untested
**API Name:** `llama-3.3-70b-versatile`
**Context Window:** 128K tokens
**Provider Docs:** [Groq Llama 3.3 Documentation](https://console.groq.com/docs/model/llama-3.3-70b-versatile)
**Description:**
Llama 3.3 70B Versatile is Meta's powerful multilingual model with 70B parameters, optimized for diverse NLP tasks and delivering strong performance across a wide range of applications.
**Best For:**
* Complex multilingual tasks
* General-purpose NLP applications
* Tasks requiring strong reasoning
* Production workloads requiring reliability
**Notes:**
Not yet validated on common Tambo tasks. With 70B parameters, this model offers strong capabilities but may have higher latency than smaller variants.
### Llama 3.1 Family
#### llama-3.1-8b-instant
**Status:** Untested
**API Name:** `llama-3.1-8b-instant`
**Context Window:** 128K tokens
**Provider Docs:** [Groq Llama 3.1 Documentation](https://console.groq.com/docs/model/llama-3.1-8b-instant)
**Description:**
Llama 3.1 8B on Groq delivers fast, high-quality responses for real-time tasks. Supports function calling, JSON output, and 128K context at low cost, making it ideal for cost-conscious applications.
**Best For:**
* Real-time chat applications
* Cost-sensitive production deployments
* Function calling and tool use
* JSON-structured outputs
* High-volume applications
**Notes:**
Not yet validated on common Tambo tasks. This is the most cost-effective option in Groq's lineup while maintaining good performance for well-defined tasks.
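Because this model advertises function calling and JSON output, here is a hedged sketch of requesting JSON-structured output via the `response_format` option in Groq's OpenAI-compatible API. The sentiment schema is purely illustrative; verify current JSON-mode support in Groq's docs before relying on it.

```ts
// Sketch: request JSON-structured output from llama-3.1-8b-instant using
// Groq's OpenAI-compatible JSON mode. The sentiment schema is illustrative;
// verify current response_format support in Groq's docs.
const res = await fetch("https://api.groq.com/openai/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.GROQ_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "llama-3.1-8b-instant",
    response_format: { type: "json_object" }, // JSON mode: output must parse as JSON
    messages: [
      {
        role: "system",
        content: 'Reply with a JSON object of the form {"sentiment": "positive" | "negative" | "neutral"}.',
      },
      { role: "user", content: "The checkout flow was fast and painless." },
    ],
  }),
});

const body = await res.json();
const parsed = JSON.parse(body.choices[0].message.content);
console.log(parsed.sentiment);
```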
## Configuration
### Dashboard Setup
Configure Groq models through your project's LLM provider settings:
1. Navigate to your project in the dashboard
2. Go to **Settings** → **LLM Providers**
3. Add or select **Groq** as your provider
4. Enter your [Groq API key](#api-key)
5. Select your preferred Llama model
6. Configure any [custom LLM parameters](/models/custom-llm-parameters) as needed
7. Click **Save** to apply the configuration
### API Key
You'll need a Groq API key to use these models. Get one from [Groq Console](https://console.groq.com/).
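To confirm a key works before saving it in the dashboard, you can query Groq's OpenAI-compatible model-listing endpoint. A minimal sketch, assuming the key is exported as `GROQ_API_KEY`:

```ts
// Quick key check: list the models your key can access. A 200 response
// confirms the key works; a 401 means it is invalid or revoked.
const res = await fetch("https://api.groq.com/openai/v1/models", {
  headers: { Authorization: `Bearer ${process.env.GROQ_API_KEY}` },
});

if (!res.ok) {
  throw new Error(`Groq key check failed: ${res.status} ${res.statusText}`);
}

const { data } = await res.json();
console.log(data.map((model: { id: string }) => model.id));
```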
### Model Selection
Choose your model based on your use case:
* [**Llama 4 Scout**](#llama-4-scout-17b-16e-instruct): Code generation, reasoning, summarization
* [**Llama 4 Maverick**](#llama-4-maverick-17b-128e-instruct): Multilingual, multimodal, creative tasks
* [**Llama 3.3 70B**](#llama-3-3-70b-versatile): Complex tasks requiring strong reasoning
* [**Llama 3.1 8B**](#llama-3-1-8b-instant): Cost-effective real-time applications
## Performance Considerations
**Speed vs. Size:**
* Smaller models ([8B](#llama-3-1-8b-instant)) offer lower latency and cost
* Larger models ([70B](#llama-3-3-70b-versatile)) provide better reasoning and accuracy
* [Llama 4 models](#llama-4-family) balance performance with specialized capabilities
**Context Window:**
All Groq Llama models support 128K token context windows, enabling long-form conversations and document analysis.
**Throughput:**
Groq's specialized hardware delivers exceptional inference speeds (400-460+ tokens/sec), making it ideal for real-time applications.
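If you want to sanity-check throughput yourself, one rough approach is to time a streamed completion. The sketch below uses the `groq-sdk` npm package (its interface mirrors the OpenAI SDK); the characters-to-tokens conversion is only a rule of thumb, so treat the result as an estimate.

```ts
import Groq from "groq-sdk"; // assumes the groq-sdk npm package

const client = new Groq({ apiKey: process.env.GROQ_API_KEY });

const start = Date.now();
let chars = 0;

// Stream a completion and count the characters as they arrive.
const stream = await client.chat.completions.create({
  model: "llama-3.1-8b-instant",
  messages: [{ role: "user", content: "Explain TCP slow start in one paragraph." }],
  stream: true,
});

for await (const chunk of stream) {
  chars += chunk.choices[0]?.delta?.content?.length ?? 0;
}

// Rough conversion: English text averages about 4 characters per token,
// so chars / 4 / seconds gives an approximate tokens/sec figure.
const seconds = (Date.now() - start) / 1000;
console.log(`~${Math.round(chars / 4 / seconds)} tokens/sec (rough estimate)`);
```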
## Best Practices
* **Start with [Llama 3.1 8B](#llama-3-1-8b-instant)** for cost-effective testing and simple tasks
* **Use [Llama 3.3 70B](#llama-3-3-70b-versatile)** when you need stronger reasoning or complex understanding
* **Try [Llama 4 models](#llama-4-family)** for specialized tasks ([Scout](#llama-4-scout-17b-16e-instruct) for code, [Maverick](#llama-4-maverick-17b-128e-instruct) for multilingual)
* **Test thoroughly** since all models are currently [untested with Tambo](#known-behaviors)
* **Monitor costs** and performance to find the right balance for your use case
## Known Behaviors
All Groq Llama models are currently **Untested** in Tambo. They are newly
released or recently added. Test them in your specific context before
production deployment. See the [Labels](/models/labels) page for more
information about model status labels.
Streaming may behave inconsistently with providers other than OpenAI. We're
aware of the issue and are actively working on a fix; proceed with caution
when using streaming on Groq models.
## Troubleshooting
**Slow response times?**
* Groq is optimized for speed; if responses seem slow, check your network connection
* Verify you're using the correct API endpoint
* Monitor your Groq account for rate limits
**Unexpected outputs?**
* Adjust temperature and other parameters in [Custom LLM Parameters](/models/custom-llm-parameters)
* Try a different model size ([8B](#llama-3-1-8b-instant) vs. [70B](#llama-3-3-70b-versatile)) for your use case
* Provide more context in your prompts for better results
**API errors?**
* Verify your [Groq API key](#api-key) is valid and not expired
* Check your account quota and rate limits
* Ensure the model name matches exactly as shown in [Supported Models](#supported-models)
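The sketch below maps these three checks onto HTTP status codes. The codes follow typical OpenAI-compatible conventions (401 for auth, 404 for an unknown model, 429 for rate limits); confirm the specifics in Groq's error documentation.

```ts
// Sketch: map common failure statuses to the checks above. Status-code
// meanings follow typical OpenAI-compatible conventions; confirm specifics
// in Groq's error documentation.
const res = await fetch("https://api.groq.com/openai/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.GROQ_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "llama-3.1-8b-instant", // must match the API name exactly
    messages: [{ role: "user", content: "ping" }],
  }),
});

if (res.status === 401) {
  console.error("Invalid or expired API key");
} else if (res.status === 404) {
  console.error("Model not found: check the API name against Supported Models");
} else if (res.status === 429) {
  console.error("Rate limit or quota exceeded: back off and retry");
} else if (!res.ok) {
  console.error(`Unexpected error: ${res.status} ${res.statusText}`);
} else {
  console.log("OK:", (await res.json()).choices[0].message.content);
}
```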
## See Also
* [Labels](/models/labels) - Understanding model status labels
* [Custom LLM Parameters](/models/custom-llm-parameters) - Fine-tune model behavior
* [Groq Console](https://console.groq.com/) - Manage your API keys and usage
* [Groq Documentation](https://console.groq.com/docs) - Official Groq provider docs