pydantic_ai.settings

ModelSettings

Bases: TypedDict

Settings to configure an LLM.

Here we include only settings that apply to multiple models / model providers; note that not all of these settings are supported by every model.
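Since ModelSettings is a TypedDict, at runtime it is an ordinary dict in which every key is optional. A minimal sketch of building a settings object (the specific values are illustrative; in practice you would pass this as `model_settings=` when constructing an Agent or calling a run method):

```python
# ModelSettings is a TypedDict: include only the settings you want to override.
settings = {
    "max_tokens": 1024,   # stop after generating 1024 tokens
    "temperature": 0.0,   # near-deterministic sampling (not fully deterministic)
    "timeout": 30.0,      # per-request timeout in seconds
}
```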

Attributes

max_tokens

The maximum number of tokens to generate before stopping.

Supported by:

  • Gemini
  • Anthropic
  • OpenAI
  • Groq
  • Cohere
  • Mistral
  • Bedrock
  • MCP Sampling
  • Outlines (all providers)
  • xAI

Type: int

temperature

Amount of randomness injected into the response.

Use temperature closer to 0.0 for analytical / multiple choice, and closer to a model’s maximum temperature for creative and generative tasks.

Note that even with temperature of 0.0, the results will not be fully deterministic.

Supported by:

  • Gemini
  • Anthropic
  • OpenAI
  • Groq
  • Cohere
  • Mistral
  • Bedrock
  • Outlines (Transformers, LlamaCpp, SgLang, VLLMOffline)
  • xAI

Type: float

top_p

An alternative to sampling with temperature, called nucleus sampling, where the model considers only the tokens comprising the top_p cumulative probability mass.

So 0.1 means only the tokens comprising the top 10% probability mass are considered.

You should either alter temperature or top_p, but not both.

Supported by:

  • Gemini
  • Anthropic
  • OpenAI
  • Groq
  • Cohere
  • Mistral
  • Bedrock
  • Outlines (Transformers, LlamaCpp, SgLang, VLLMOffline)
  • xAI

Type: float
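The "top_p probability mass" idea can be sketched as follows: keep the smallest set of highest-probability tokens whose cumulative probability reaches top_p. This toy function is illustrative only, not the provider's actual sampler:

```python
def nucleus(probs: dict[str, float], top_p: float) -> list[str]:
    """Return the smallest set of tokens whose cumulative probability >= top_p."""
    kept, total = [], 0.0
    # Consider tokens from most to least probable.
    for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept.append(tok)
        total += p
        if total >= top_p:
            break
    return kept

probs = {"the": 0.5, "a": 0.3, "an": 0.15, "xyz": 0.05}
nucleus(probs, 0.5)  # ["the"] -- the top token alone covers 50% of the mass
nucleus(probs, 0.9)  # ["the", "a", "an"]
```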

timeout

Override the client-level default timeout for a request, in seconds.

Supported by:

  • Gemini
  • Anthropic
  • OpenAI
  • Groq
  • Mistral
  • xAI

Type: float | Timeout

parallel_tool_calls

Whether to allow parallel tool calls.

Supported by:

  • OpenAI (some models, not o1)
  • Groq
  • Anthropic
  • xAI

Type: bool

seed

The random seed to use for the model, theoretically allowing for deterministic results.

Supported by:

  • OpenAI
  • Groq
  • Cohere
  • Mistral
  • Gemini
  • Outlines (LlamaCpp, VLLMOffline)

Type: int

presence_penalty

Penalize new tokens based on whether they have appeared in the text so far.

Supported by:

  • OpenAI
  • Groq
  • Cohere
  • Gemini
  • Mistral
  • Outlines (LlamaCpp, SgLang, VLLMOffline)
  • xAI

Type: float

frequency_penalty

Penalize new tokens based on their existing frequency in the text so far.

Supported by:

  • OpenAI
  • Groq
  • Cohere
  • Gemini
  • Mistral
  • Outlines (LlamaCpp, SgLang, VLLMOffline)
  • xAI

Type: float

logit_bias

Modify the likelihood of specified tokens appearing in the completion.

Supported by:

  • OpenAI
  • Groq
  • Outlines (Transformers, LlamaCpp, VLLMOffline)

Type: dict[str, int]
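As the type suggests, the mapping keys are token IDs encoded as strings, and the values are bias amounts added to the corresponding logits (for OpenAI-style APIs, typically in the range -100 to 100, where -100 effectively bans a token). Token IDs are tokenizer-specific; the ID below is illustrative only:

```python
# Discourage one token and encourage another.
# "1234" and "5678" are placeholder token IDs, not real tokens.
settings = {
    "logit_bias": {
        "1234": -100,  # effectively ban this token
        "5678": 25,    # make this token more likely
    }
}
```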

stop_sequences

Sequences that will cause the model to stop generating.

Supported by:

  • OpenAI
  • Anthropic
  • Bedrock
  • Mistral
  • Groq
  • Cohere
  • Google
  • xAI

Type: list[str]

extra_headers

Extra headers to send to the model.

Supported by:

  • OpenAI
  • Anthropic
  • Gemini
  • Groq
  • xAI

Type: dict[str, str]

thinking

Enable or configure thinking/reasoning for the model.

  • True: Enable thinking with the provider’s default effort level.
  • False: Disable thinking (silently ignored if the model always thinks).
  • 'minimal'/'low'/'medium'/'high'/'xhigh': Enable thinking at a specific effort level.

When omitted, the model uses its default behavior (which may include thinking for reasoning models).

Provider-specific thinking settings (e.g., anthropic_thinking, openai_reasoning_effort) take precedence over this unified field.

Supported by:

  • Anthropic
  • OpenAI
  • Gemini
  • Groq
  • Bedrock
  • OpenRouter
  • Cerebras
  • xAI

Type: ThinkingLevel
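A quick sketch of the accepted values, based on the description above (booleans or one of the listed effort-level strings):

```python
# The unified `thinking` field accepts a bool or an effort-level string.
enable_default = {"thinking": True}    # provider's default effort level
disabled = {"thinking": False}         # ignored if the model always thinks
high_effort = {"thinking": "high"}     # 'minimal' | 'low' | 'medium' | 'high' | 'xhigh'
```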

extra_body

Extra body to send to the model.

Supported by:

  • OpenAI
  • Anthropic
  • Groq
  • Outlines (all providers)

Type: object