pydantic_ai.settings
ModelSettings
Bases: TypedDict
Settings to configure an LLM.
Here we include only settings which apply to multiple models / model providers, though not all of these settings are supported by all models.
`max_tokens`
The maximum number of tokens to generate before stopping.
Supported by:
- Gemini
- Anthropic
- OpenAI
- Groq
- Cohere
- Mistral
- Bedrock
- MCP Sampling
- Outlines (all providers)
- xAI
Type: int
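Because the settings container is a TypedDict, it can be built as a plain dict. The class below is an illustrative stdlib stand-in for the real `pydantic_ai.settings.ModelSettings`, sketching only the field under discussion:

```python
from typing import TypedDict

# Illustrative stub for pydantic_ai.settings.ModelSettings (all keys optional).
class ModelSettings(TypedDict, total=False):
    max_tokens: int

# Cap the response length at 1024 tokens.
settings: ModelSettings = {"max_tokens": 1024}
```

In real usage the dict would be passed to an agent or model call as its settings argument.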
`temperature`
Amount of randomness injected into the response.
Use a temperature closer to 0.0 for analytical / multiple-choice tasks, and closer to the model's maximum temperature for creative and generative tasks.
Note that even with a temperature of 0.0, the results will not be fully deterministic.
Supported by:
- Gemini
- Anthropic
- OpenAI
- Groq
- Cohere
- Mistral
- Bedrock
- Outlines (Transformers, LlamaCpp, SgLang, VLLMOffline)
- xAI
Type: float
`top_p`
An alternative to sampling with temperature, called nucleus sampling, where the model considers only the tokens comprising the top_p probability mass.
So 0.1 means only the tokens comprising the top 10% probability mass are considered.
You should alter either temperature or top_p, but not both.
Supported by:
- Gemini
- Anthropic
- OpenAI
- Groq
- Cohere
- Mistral
- Bedrock
- Outlines (Transformers, LlamaCpp, SgLang, VLLMOffline)
- xAI
Type: float
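The "alter one, not both" guidance can be enforced with a small helper. This is a sketch using a stdlib stub in place of the real ModelSettings class; the `sampling_settings` function is a hypothetical convenience, not part of pydantic_ai:

```python
from typing import Optional, TypedDict

# Illustrative stub for pydantic_ai.settings.ModelSettings.
class ModelSettings(TypedDict, total=False):
    temperature: float
    top_p: float

def sampling_settings(*, temperature: Optional[float] = None,
                      top_p: Optional[float] = None) -> ModelSettings:
    """Build sampling settings, rejecting the unsupported combination."""
    if temperature is not None and top_p is not None:
        raise ValueError("Set either temperature or top_p, not both")
    settings: ModelSettings = {}
    if temperature is not None:
        settings["temperature"] = temperature
    if top_p is not None:
        settings["top_p"] = top_p
    return settings

analytical = sampling_settings(temperature=0.0)  # near-deterministic output
creative = sampling_settings(top_p=0.9)          # nucleus sampling
```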
`timeout`
Override the client-level default timeout for a request, in seconds.
Supported by:
- Gemini
- Anthropic
- OpenAI
- Groq
- Mistral
- xAI
Type: float | Timeout
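A plain float is the simplest form; the real field also accepts a client-library Timeout object (e.g. httpx-style), which the stdlib stub below omits for brevity:

```python
from typing import TypedDict

# Illustrative stub for pydantic_ai.settings.ModelSettings.
class ModelSettings(TypedDict, total=False):
    timeout: float  # the real field also accepts a Timeout object

# Allow slow requests up to 30 seconds before giving up.
settings: ModelSettings = {"timeout": 30.0}
```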
`parallel_tool_calls`
Whether to allow parallel tool calls.
Supported by:
- OpenAI (some models, not o1)
- Groq
- Anthropic
- xAI
Type: bool
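A minimal sketch, again using a stdlib stub rather than the real class:

```python
from typing import TypedDict

# Illustrative stub for pydantic_ai.settings.ModelSettings.
class ModelSettings(TypedDict, total=False):
    parallel_tool_calls: bool

# Forbid the model from invoking several tools in a single response turn.
settings: ModelSettings = {"parallel_tool_calls": False}
```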
`seed`
The random seed to use for the model, theoretically allowing for deterministic results.
Supported by:
- OpenAI
- Groq
- Cohere
- Mistral
- Gemini
- Outlines (LlamaCpp, VLLMOffline)
Type: int
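Pinning the seed is typically combined with a temperature of 0.0 to get results as repeatable as the provider allows (full determinism is still not guaranteed). A sketch with a stdlib stub:

```python
from typing import TypedDict

# Illustrative stub for pydantic_ai.settings.ModelSettings.
class ModelSettings(TypedDict, total=False):
    seed: int
    temperature: float

# Fixed seed plus zero temperature for maximally repeatable output.
settings: ModelSettings = {"seed": 42, "temperature": 0.0}
```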
`presence_penalty`
Penalize new tokens based on whether they have appeared in the text so far.
Supported by:
- OpenAI
- Groq
- Cohere
- Gemini
- Mistral
- Outlines (LlamaCpp, SgLang, VLLMOffline)
- xAI
Type: float
`frequency_penalty`
Penalize new tokens based on their existing frequency in the text so far.
Supported by:
- OpenAI
- Groq
- Cohere
- Gemini
- Mistral
- Outlines (LlamaCpp, SgLang, VLLMOffline)
- xAI
Type: float
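The two penalties are often set together to discourage repetition: the presence penalty applies to any token that has already appeared at least once, while the frequency penalty scales with how often it has appeared. A stub-based sketch:

```python
from typing import TypedDict

# Illustrative stub for pydantic_ai.settings.ModelSettings.
class ModelSettings(TypedDict, total=False):
    presence_penalty: float
    frequency_penalty: float

# Positive values discourage the model from repeating itself.
settings: ModelSettings = {"presence_penalty": 0.5, "frequency_penalty": 0.5}
```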
`logit_bias`
Modify the likelihood of specified tokens appearing in the completion.
Supported by:
- OpenAI
- Groq
- Outlines (Transformers, LlamaCpp, VLLMOffline)
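In the OpenAI-style convention, the bias maps token IDs (as strings) to a value roughly in the -100 to 100 range, where -100 effectively bans a token and 100 effectively forces it. The token ID below is a made-up placeholder, and the class is a stdlib stub:

```python
from typing import TypedDict

# Illustrative stub for pydantic_ai.settings.ModelSettings.
class ModelSettings(TypedDict, total=False):
    logit_bias: dict[str, int]

# "1234" is a hypothetical token ID; -100 effectively bans that token.
settings: ModelSettings = {"logit_bias": {"1234": -100}}
```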
`stop_sequences`
Sequences that will cause the model to stop generating.
Supported by:
- OpenAI
- Anthropic
- Bedrock
- Mistral
- Groq
- Cohere
- xAI
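Generation halts as soon as the model emits any of the listed sequences. A sketch with a stdlib stub:

```python
from typing import TypedDict

# Illustrative stub for pydantic_ai.settings.ModelSettings.
class ModelSettings(TypedDict, total=False):
    stop_sequences: list[str]

# Stop at a blank line or at a (hypothetical) sentinel marker.
settings: ModelSettings = {"stop_sequences": ["\n\n", "END"]}
```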
`extra_headers`
Extra headers to send to the model.
Supported by:
- OpenAI
- Anthropic
- Gemini
- Groq
- xAI
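Headers are given as a plain string-to-string mapping; the header name below is a hypothetical example, and the class is a stdlib stub:

```python
from typing import TypedDict

# Illustrative stub for pydantic_ai.settings.ModelSettings.
class ModelSettings(TypedDict, total=False):
    extra_headers: dict[str, str]

# Attach a custom HTTP header to every model request.
settings: ModelSettings = {"extra_headers": {"X-Request-Source": "batch-job"}}
```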
`thinking`
Enable or configure thinking/reasoning for the model.
- `True`: Enable thinking with the provider's default effort level.
- `False`: Disable thinking (silently ignored if the model always thinks).
- `'minimal'` / `'low'` / `'medium'` / `'high'` / `'xhigh'`: Enable thinking at a specific effort level.
When omitted, the model uses its default behavior (which may include thinking for reasoning models).
Provider-specific thinking settings (e.g., anthropic_thinking,
openai_reasoning_effort) take precedence over this unified field.
Supported by:
- Anthropic
- OpenAI
- Gemini
- Groq
- Bedrock
- OpenRouter
- Cerebras
- xAI
Type: ThinkingLevel
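The three documented forms look like this in a sketch. The Literal alias below assumes ThinkingLevel covers the documented effort strings, and the class is a stdlib stub rather than the real one:

```python
from typing import Literal, TypedDict, Union

# Assumed shape of the documented effort levels.
ThinkingLevel = Literal["minimal", "low", "medium", "high", "xhigh"]

# Illustrative stub for pydantic_ai.settings.ModelSettings.
class ModelSettings(TypedDict, total=False):
    thinking: Union[bool, ThinkingLevel]

enabled: ModelSettings = {"thinking": True}     # provider's default effort
budgeted: ModelSettings = {"thinking": "high"}  # specific effort level
disabled: ModelSettings = {"thinking": False}   # ignored if model always thinks
```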
`extra_body`
Extra body to send to the model.
Supported by:
- OpenAI
- Anthropic
- Groq
- Outlines (all providers)
Type: object
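This is the escape hatch for provider-specific request-body fields that have no dedicated setting. The key below is a hypothetical provider extension, and the class is a stdlib stub:

```python
from typing import Any, TypedDict

# Illustrative stub for pydantic_ai.settings.ModelSettings.
class ModelSettings(TypedDict, total=False):
    extra_body: Any

# Merge an arbitrary (hypothetical) field into the outgoing request body.
settings: ModelSettings = {"extra_body": {"custom_provider_option": True}}
```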