pydantic_ai.settings

ModelSettings

Bases: TypedDict

Settings to configure an LLM.

Here we include only settings that apply to multiple models / model providers; note that not all of these settings are supported by every model.
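Since ModelSettings is a TypedDict, at runtime it is an ordinary dict in which every key is optional. A minimal sketch of building a settings object (the specific values are illustrative; in practice you would pass this as `model_settings=` when constructing an Agent or calling a run method):

```python
# ModelSettings is a TypedDict: include only the settings you want to override.
settings = {
    "max_tokens": 1024,   # stop after generating 1024 tokens
    "temperature": 0.0,   # near-deterministic sampling (not fully deterministic)
    "timeout": 30.0,      # per-request timeout in seconds
}
```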

Attributes

max_tokens

The maximum number of tokens to generate before stopping.

Supported by:

  • Gemini
  • Anthropic
  • OpenAI
  • Groq
  • Cohere
  • Mistral
  • Bedrock
  • MCP Sampling
  • Outlines (all providers)
  • xAI

Type: int

temperature

Amount of randomness injected into the response.

Use temperature closer to 0.0 for analytical / multiple choice, and closer to a model’s maximum temperature for creative and generative tasks.

Note that even with temperature of 0.0, the results will not be fully deterministic.

Supported by:

  • Gemini
  • Anthropic
  • OpenAI
  • Groq
  • Cohere
  • Mistral
  • Bedrock
  • Outlines (Transformers, LlamaCpp, SgLang, VLLMOffline)
  • xAI

Type: float

top_p

An alternative to sampling with temperature, called nucleus sampling, where the model considers only the tokens comprising the top_p cumulative probability mass.

So 0.1 means only the tokens comprising the top 10% probability mass are considered.

You should either alter temperature or top_p, but not both.

Supported by:

  • Gemini
  • Anthropic
  • OpenAI
  • Groq
  • Cohere
  • Mistral
  • Bedrock
  • Outlines (Transformers, LlamaCpp, SgLang, VLLMOffline)
  • xAI

Type: float
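The "top_p probability mass" idea can be sketched as follows: keep the smallest set of highest-probability tokens whose cumulative probability reaches top_p. This toy function is illustrative only, not the provider's actual sampler:

```python
def nucleus(probs: dict[str, float], top_p: float) -> list[str]:
    """Return the smallest set of tokens whose cumulative probability >= top_p."""
    kept, total = [], 0.0
    # Consider tokens from most to least probable.
    for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept.append(tok)
        total += p
        if total >= top_p:
            break
    return kept

probs = {"the": 0.5, "a": 0.3, "an": 0.15, "xyz": 0.05}
nucleus(probs, 0.5)  # ["the"] -- the top token alone covers 50% of the mass
nucleus(probs, 0.9)  # ["the", "a", "an"]
```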

timeout

Override the client-level default timeout for a request, in seconds.

Supported by:

  • Gemini
  • Anthropic
  • OpenAI
  • Groq
  • Mistral
  • xAI

Type: float | Timeout

parallel_tool_calls

Whether to allow parallel tool calls.

Supported by:

  • OpenAI (some models, not o1)
  • Groq
  • Anthropic
  • xAI

Type: bool

seed

The random seed to use for the model, theoretically allowing for deterministic results.

Supported by:

  • OpenAI
  • Groq
  • Cohere
  • Mistral
  • Gemini
  • Outlines (LlamaCpp, VLLMOffline)

Type: int

presence_penalty

Penalize new tokens based on whether they have appeared in the text so far.

Supported by:

  • OpenAI
  • Groq
  • Cohere
  • Gemini
  • Mistral
  • Outlines (LlamaCpp, SgLang, VLLMOffline)
  • xAI

Type: float

frequency_penalty

Penalize new tokens based on their existing frequency in the text so far.

Supported by:

  • OpenAI
  • Groq
  • Cohere
  • Gemini
  • Mistral
  • Outlines (LlamaCpp, SgLang, VLLMOffline)
  • xAI

Type: float

logit_bias

Modify the likelihood of specified tokens appearing in the completion.

Supported by:

  • OpenAI
  • Groq
  • Outlines (Transformers, LlamaCpp, VLLMOffline)

Type: dict[str, int]
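As the type suggests, the mapping keys are token IDs encoded as strings, and the values are bias amounts added to the corresponding logits (for OpenAI-style APIs, typically in the range -100 to 100, where -100 effectively bans a token). Token IDs are tokenizer-specific; the ID below is illustrative only:

```python
# Discourage one token and encourage another.
# "1234" and "5678" are placeholder token IDs, not real tokens.
settings = {
    "logit_bias": {
        "1234": -100,  # effectively ban this token
        "5678": 25,    # make this token more likely
    }
}
```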

stop_sequences

Sequences that will cause the model to stop generating.

Supported by:

  • OpenAI
  • Anthropic
  • Bedrock
  • Mistral
  • Groq
  • Cohere
  • Google
  • xAI

Type: list[str]

extra_headers

Extra headers to send to the model.

Supported by:

  • OpenAI
  • Anthropic
  • Gemini
  • Groq
  • xAI

Type: dict[str, str]

thinking

Enable or configure thinking/reasoning for the model.

  • True: Enable thinking with the provider’s default effort level.
  • False: Disable thinking (silently ignored if the model always thinks).
  • 'minimal'/'low'/'medium'/'high'/'xhigh': Enable thinking at a specific effort level.

When omitted, the model uses its default behavior (which may include thinking for reasoning models).

Provider-specific thinking settings (e.g., anthropic_thinking, openai_reasoning_effort) take precedence over this unified field.

Supported by:

  • Anthropic
  • OpenAI
  • Gemini
  • Groq
  • Bedrock
  • OpenRouter
  • Cerebras
  • xAI

Type: ThinkingLevel
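A quick sketch of the accepted values, based on the description above (booleans or one of the listed effort-level strings):

```python
# The unified `thinking` field accepts a bool or an effort-level string.
enable_default = {"thinking": True}    # provider's default effort level
disabled = {"thinking": False}         # ignored if the model always thinks
high_effort = {"thinking": "high"}     # 'minimal' | 'low' | 'medium' | 'high' | 'xhigh'
```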

extra_body

Extra body to send to the model.

Supported by:

  • OpenAI
  • Anthropic
  • Groq
  • Outlines (all providers)

Type: object