pydantic_ai.models.openai

Setup

For details on how to set up authentication with this model, see model configuration for OpenAI.

OpenAIChatModelSettings

Bases: ModelSettings

Settings used for an OpenAI model request.

Attributes

openai_reasoning_effort

Constrains effort on reasoning for reasoning models.

Currently supported values are low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.

Type: ReasoningEffort
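Since `ModelSettings` and its subclasses are `TypedDict`s, an `OpenAIChatModelSettings` value is a plain dict at runtime. A minimal sketch of requesting lower reasoning effort (the key name and allowed values are from this page; the agent usage in the comment is illustrative):

```python
# OpenAIChatModelSettings is a TypedDict, so at runtime it is a plain dict.
# A sketch of a settings mapping requesting faster, cheaper reasoning:
settings = {
    'openai_reasoning_effort': 'low',  # one of 'low', 'medium', 'high'
}

# With pydantic_ai installed, you would pass this along the lines of:
#   Agent('openai:o3-mini', model_settings=OpenAIChatModelSettings(**settings))
```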

openai_logprobs

Include log probabilities in the response.

For Chat models, these will be included in ModelResponse.provider_details['logprobs']. For Responses models, these will be included in the response output parts TextPart.provider_details['logprobs'].

Type: bool

openai_top_logprobs

Include log probabilities of the top n tokens in the response.

Type: int

openai_store

Whether or not to store the output of this request in OpenAI’s systems.

If False, OpenAI will not store the request for its own internal review or training. See OpenAI API reference.

When used with OpenAIResponsesModel, stored responses appear in OpenAI’s dashboard and can be referenced via openai_previous_response_id. Pair this with openai_previous_response_id='auto' to avoid storing duplicate copies of the conversation history across retries and subsequent requests within the same run.

Type: bool | None

openai_user

A unique identifier representing the end-user, which can help OpenAI monitor and detect abuse.

See OpenAI’s safety best practices for more details.

Type: str

openai_service_tier

The service tier to use for the model request.

Currently supported values are auto, default, flex, and priority. For more information, see OpenAI’s service tiers documentation.

Type: Literal['auto', 'default', 'flex', 'priority']

openai_prediction

Enables predictive outputs.

This feature is currently only supported for some OpenAI models.

Type: ChatCompletionPredictionContentParam

openai_prompt_cache_key

Used by OpenAI to cache responses for similar requests to optimize your cache hit rates.

See the OpenAI Prompt Caching documentation for more information.

Type: str

openai_prompt_cache_retention

The retention policy for the prompt cache. Set to 24h to enable extended prompt caching, which keeps cached prefixes active for longer, up to a maximum of 24 hours.

See the OpenAI Prompt Caching documentation for more information.

Type: Literal['in_memory', '24h']

openai_continuous_usage_stats

When True, enables continuous usage statistics in streaming responses.

When enabled, the API returns cumulative usage data with each chunk rather than only at the end. This setting correctly handles the cumulative nature of these stats by using only the final usage values rather than summing all intermediate values.

See OpenAI’s streaming documentation for more information.

Type: bool
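The cumulative-usage behaviour described above can be sketched without the SDK: each streamed chunk reports running totals, so the correct final figure is the last chunk's value, not a sum over chunks. The chunk shape below is illustrative, not the real SDK type:

```python
# Illustrative chunks: each 'usage' is cumulative, as with
# openai_continuous_usage_stats=True.
chunks = [
    {'delta': 'Hel', 'usage': {'output_tokens': 1}},
    {'delta': 'lo',  'usage': {'output_tokens': 2}},
    {'delta': '!',   'usage': {'output_tokens': 3}},
]

# Wrong: summing the chunks would double-count (1 + 2 + 3 = 6).
# Right: take only the final cumulative value.
final_usage = chunks[-1]['usage']
```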

OpenAIModelSettings

Bases: OpenAIChatModelSettings

Deprecated alias for OpenAIChatModelSettings.

OpenAIResponsesModelSettings

Bases: OpenAIChatModelSettings

Settings used for an OpenAI Responses model request.

All fields must be prefixed with openai_ so they can be merged with settings for other models.
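The openai_ prefix exists so that settings aimed at different providers can be combined into one mapping without key collisions. A hypothetical sketch (the anthropic_ key is purely illustrative):

```python
# Because every provider-specific key carries its provider's prefix, the key
# sets are disjoint and a plain dict merge is safe.
openai_settings = {'openai_reasoning_effort': 'low'}
anthropic_settings = {'anthropic_thinking': {'type': 'enabled'}}  # illustrative key

merged = {**openai_settings, **anthropic_settings}
# Each model implementation only reads the keys carrying its own prefix.
```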

Attributes

openai_builtin_tools

The provided OpenAI built-in tools to use.

See OpenAI’s built-in tools for more details.

Type: Sequence[FileSearchToolParam | WebSearchToolParam | ComputerToolParam]

openai_reasoning_generate_summary

Deprecated alias for openai_reasoning_summary.

Type: Literal['detailed', 'concise']

openai_reasoning_summary

A summary of the reasoning performed by the model.

This can be useful for debugging and understanding the model’s reasoning process. One of concise, detailed, or auto.

Check the OpenAI Reasoning documentation for more details.

Type: Literal['detailed', 'concise', 'auto']

openai_send_reasoning_ids

Whether to send the unique IDs of reasoning, text, and function call parts from the message history to the model. Enabled by default for reasoning models.

This can result in errors like "Item 'rs_123' of type 'reasoning' was provided without its required following item." if the message history you’re sending does not match exactly what was received from the Responses API in a previous response, for example if you’re using a history processor. In that case, you’ll want to disable this.

Type: bool

openai_truncation

The truncation strategy to use for the model response.

It can be either:

  • disabled (default): If a model response will exceed the context window size for a model, the request will fail with a 400 error.
  • auto: If the context of this response and previous ones exceeds the model’s context window size, the model will truncate the response to fit the context window by dropping input items in the middle of the conversation.

Type: Literal['disabled', 'auto']

openai_text_verbosity

Constrains the verbosity of the model’s text response.

Lower values will result in more concise responses, while higher values will result in more verbose responses. Currently supported values are low, medium, and high.

Type: Literal['low', 'medium', 'high']

openai_previous_response_id

Reference a prior OpenAI response to continue a conversation server-side, omitting already-stored messages from the input.

  • 'auto': chain to the most recent provider_response_id in the message history. If the history contains no such response, no previous_response_id is sent.
  • A concrete response ID string: use it as the seed for the first request in the run (e.g. to continue from a prior turn). On subsequent in-run requests (retries, tool-call continuations), the most recent provider_response_id from the message history takes precedence so the chain extends correctly without re-sending messages that are already server-side.

In both cases, messages that precede the chosen response in the history are omitted from the input, since OpenAI reconstructs them from server-side state.

Requires the referenced response to have been stored (see openai_store, which defaults to True on OpenAI’s side). Not compatible with Zero Data Retention.

See the OpenAI Responses API documentation for more information.

Type: Literal['auto'] | str
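The 'auto' behaviour can be sketched as a scan for the most recent response ID in the history. The message shape here is an illustrative stand-in for pydantic_ai's message types, which store the ID on ModelResponse.provider_response_id:

```python
def resolve_previous_response_id(messages):
    """Return the most recent provider_response_id in the history, or None.

    `messages` is a list of dicts standing in for pydantic_ai's
    ModelRequest/ModelResponse history items.
    """
    for message in reversed(messages):
        response_id = message.get('provider_response_id')
        if response_id is not None:
            return response_id
    return None

history = [
    {'kind': 'request', 'content': 'Hi'},
    {'kind': 'response', 'provider_response_id': 'resp_abc123'},
    {'kind': 'request', 'content': 'And then?'},
]
# Messages at or before 'resp_abc123' would be omitted from the next input,
# since OpenAI reconstructs them from server-side state.
```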

openai_include_code_execution_outputs

Whether to include the code execution results in the response.

Corresponds to the code_interpreter_call.outputs value of the include parameter in the Responses API.

Type: bool

openai_include_web_search_sources

Whether to include the web search results in the response.

Corresponds to the web_search_call.action.sources value of the include parameter in the Responses API.

Type: bool

openai_include_file_search_results

Whether to include the file search results in the response.

Corresponds to the file_search_call.results value of the include parameter in the Responses API.

Type: bool

openai_include_raw_annotations

Whether to include the raw annotations in TextPart.provider_details.

When enabled, any annotations (e.g., citations from web search) will be available in the provider_details['annotations'] field of text parts. This is opt-in since there may be overlap with native annotation support once added via https://github.com/pydantic/pydantic-ai/issues/3126.

Type: bool

openai_context_management

Context management configuration for the request.

This enables OpenAI’s server-side automatic compaction inside the regular /responses call, as opposed to the standalone /responses/compact endpoint. See OpenAI’s compaction guide for details.

The OpenAICompaction capability sets this automatically in its default (stateful) mode.

Type: list[ContextManagement]

OpenAIChatModel

Bases: Model[AsyncOpenAI]

A model that uses the OpenAI API.

Internally, this uses the OpenAI Python client to interact with the API.

Apart from __init__, all methods are private or match those of the base class.

Attributes

model_name

The model name.

Type: OpenAIModelName

system

The model provider.

Type: str

profile

The model profile.

WebSearchTool is only supported if openai_chat_supports_web_search is True.

Type: ModelProfile

Methods

__init__
def __init__(
    model_name: OpenAIModelName,
    provider: OpenAIChatCompatibleProvider | Literal['openai', 'openai-chat', 'gateway'] | Provider[AsyncOpenAI] = 'openai',
    profile: ModelProfileSpec | None = None,
    settings: ModelSettings | None = None,
) -> None
def __init__(
    model_name: OpenAIModelName,
    provider: OpenAIChatCompatibleProvider | Literal['openai', 'openai-chat', 'gateway'] | Provider[AsyncOpenAI] = 'openai',
    profile: ModelProfileSpec | None = None,
    system_prompt_role: OpenAISystemPromptRole | None = None,
    settings: ModelSettings | None = None,
) -> None

Initialize an OpenAI model.

Parameters

model_name : OpenAIModelName

The name of the OpenAI model to use. A list of model names is available here. (Unfortunately, despite being asked to do so, OpenAI does not provide .inv files for its API.)

provider : OpenAIChatCompatibleProvider | Literal['openai', 'openai-chat', 'gateway'] | Provider[AsyncOpenAI] Default: 'openai'

The provider to use. Defaults to 'openai'.

profile : ModelProfileSpec | None Default: None

The model profile to use. Defaults to a profile picked by the provider based on the model name.

system_prompt_role : OpenAISystemPromptRole | None Default: None

The role to use for the system prompt message. If not provided, defaults to 'system'. In the future, this may be inferred from the model name.

settings : ModelSettings | None Default: None

Default model settings for this model instance.

supported_builtin_tools

@classmethod
def supported_builtin_tools(cls) -> frozenset[type[AbstractBuiltinTool]]

Return the set of builtin tool types this model can handle.

Returns

frozenset[type[AbstractBuiltinTool]]

OpenAIModel

Bases: OpenAIChatModel

Deprecated alias for OpenAIChatModel.

OpenAIResponsesModel

Bases: Model[AsyncOpenAI]

A model that uses the OpenAI Responses API.

The OpenAI Responses API is the new API for OpenAI models.

If you are interested in the differences between the Responses API and the Chat Completions API, see the OpenAI API docs.

Attributes

model_name

The model name.

Type: OpenAIModelName

system

The model provider.

Type: str

Methods

__init__
def __init__(
    model_name: OpenAIModelName,
    provider: OpenAIResponsesCompatibleProvider | Literal['openai', 'gateway'] | Provider[AsyncOpenAI] = 'openai',
    profile: ModelProfileSpec | None = None,
    settings: ModelSettings | None = None,
)

Initialize an OpenAI Responses model.

Parameters

model_name : OpenAIModelName

The name of the OpenAI model to use.

provider : OpenAIResponsesCompatibleProvider | Literal['openai', 'gateway'] | Provider[AsyncOpenAI] Default: 'openai'

The provider to use. Defaults to 'openai'.

profile : ModelProfileSpec | None Default: None

The model profile to use. Defaults to a profile picked by the provider based on the model name.

settings : ModelSettings | None Default: None

Default model settings for this model instance.

supported_builtin_tools

@classmethod
def supported_builtin_tools(cls) -> frozenset[type[AbstractBuiltinTool]]

Return the set of builtin tool types this model can handle.

Returns

frozenset[type[AbstractBuiltinTool]]

compact_messages

async def compact_messages(
    request_context: ModelRequestContext,
    instructions: str | None = None,
) -> ModelResponse

Compact messages using the OpenAI Responses compaction endpoint.

This calls OpenAI’s responses.compact API to produce an encrypted compaction that summarizes the conversation history. The returned ModelResponse contains a single CompactionPart that must be round-tripped in subsequent requests.

Returns

ModelResponse — A ModelResponse with a single CompactionPart containing the encrypted compaction data.

Parameters

request_context : ModelRequestContext

The model request context containing messages, settings, and parameters.

instructions : str | None Default: None

Optional custom instructions for the compaction summarization. If provided, these override the agent-level instructions.

OpenAIStreamedResponse

Bases: StreamedResponse

Implementation of StreamedResponse for OpenAI models.

Attributes

model_name

Get the model name of the response.

Type: OpenAIModelName

provider_name

Get the provider name.

Type: str

provider_url

Get the provider base URL.

Type: str

timestamp

Get the timestamp of the response.

Type: datetime

OpenAIResponsesStreamedResponse

Bases: StreamedResponse

Implementation of StreamedResponse for OpenAI Responses API.

Attributes

model_name

Get the model name of the response.

Type: OpenAIModelName

provider_name

Get the provider name.

Type: str

provider_url

Get the provider base URL.

Type: str

timestamp

Get the timestamp of the response.

Type: datetime

OpenAICompaction

Bases: AbstractCapability[AgentDepsT]

Compaction capability for OpenAI Responses API.

Automatically compacts conversation history to keep long-running agent runs within manageable context limits. Two modes are supported, selected by the stateless flag:

  • Stateful mode (default, stateless=False): configures OpenAI’s server-side auto-compaction via the context_management field on the regular /responses request. The server triggers compaction when input tokens cross a threshold, and the compacted item is returned alongside the normal response. Compatible with openai_previous_response_id='auto' and server-side conversation state.

    Configurable with token_threshold (compact_threshold on the API). If omitted, OpenAI picks a server-side default.

  • Stateless mode (stateless=True): calls the stateless /responses/compact endpoint from a before_model_request hook when your trigger condition is met. Use this in ZDR environments where OpenAI must not retain conversation data, when you set openai_store=False, or when you need explicit out-of-band control over when compaction runs.

    Requires either message_count_threshold or a custom trigger callable.

If stateless is not set, it is inferred from which parameters you provide: passing any stateless-only parameter (message_count_threshold or trigger) implies stateless=True; otherwise stateful mode is used.

Example usage:

```python
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAICompaction

# Stateful mode with OpenAI's server-side default threshold:
agent = Agent(
    'openai-responses:gpt-5.2',
    capabilities=[OpenAICompaction()],
)

# Stateful mode with a custom token threshold:
agent = Agent(
    'openai-responses:gpt-5.2',
    capabilities=[OpenAICompaction(token_threshold=100_000)],
)

# Stateless mode for ZDR environments or explicit control:
agent = Agent(
    'openai-responses:gpt-5.2',
    capabilities=[OpenAICompaction(message_count_threshold=20)],
)
```

Methods

__init__
def __init__(
    stateless: bool | None = None,
    token_threshold: int | None = None,
    message_count_threshold: int | None = None,
    trigger: Callable[[list[ModelMessage]], bool] | None = None,
    instructions: str | None = None,
) -> None

Initialize the OpenAI compaction capability.

Returns

None

Parameters

stateless : bool | None Default: None

Select the compaction mode explicitly. If None (the default), the mode is inferred from the other parameters: passing any stateless-only parameter (message_count_threshold or trigger) implies stateless=True; otherwise stateful mode is used.

token_threshold : int | None Default: None

Stateful-mode only. Input token threshold at which OpenAI’s server-side compaction is triggered. Corresponds to compact_threshold in the context_management API field. If None, OpenAI picks a server-side default.

message_count_threshold : int | None Default: None

Stateless-mode only. Compact when the message count exceeds this threshold.

trigger : Callable[[list[ModelMessage]], bool] | None Default: None

Stateless-mode only. Custom callable that decides whether to compact based on the current messages. Takes precedence over message_count_threshold.
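A custom trigger is just a callable over the message history. A hypothetical sketch that compacts on a rough token budget rather than a message count (the ~4-characters-per-token ratio is a crude heuristic, and plain strings stand in for pydantic_ai's ModelMessage):

```python
def token_budget_trigger(budget: int):
    """Build a trigger that fires once a crude token estimate exceeds `budget`.

    The returned callable matches the trigger signature described above:
    it takes the current message history and returns a bool.
    """
    def trigger(messages) -> bool:
        # Rough estimate: ~4 characters per token across the serialized history.
        approx_tokens = sum(len(str(m)) for m in messages) // 4
        return approx_tokens > budget
    return trigger

# Usage sketch (with pydantic_ai installed):
#   OpenAICompaction(trigger=token_budget_trigger(budget=100_000))
```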

instructions : str | None Default: None

Deprecated. OpenAI’s /compact endpoint treats instructions as a system/developer message inserted into the compaction model’s context, not as a directive for how to summarize the conversation. This does not match AnthropicCompaction.instructions semantics, so the field is deprecated and will be removed in a future version.

DEPRECATED_OPENAI_MODELS

Models that are deprecated or don’t exist but are still present in the OpenAI SDK’s type definitions.

Type: frozenset[str] Default: frozenset({'chatgpt-4o-latest', 'codex-mini-latest', 'gpt-4-0125-preview', 'gpt-4-1106-preview', 'gpt-4-turbo-preview', 'gpt-4-32k', 'gpt-4-32k-0314', 'gpt-4-32k-0613', 'gpt-4-vision-preview', 'gpt-4o-audio-preview-2024-10-01', 'gpt-5.1-mini', 'o1-mini', 'o1-mini-2024-09-12', 'o1-preview', 'o1-preview-2024-09-12'})

OpenAIModelName

Possible OpenAI model names.

Since OpenAI supports a variety of date-stamped models, we explicitly list the latest models but allow any name in the type hints. See the OpenAI docs for a full list.

Using this broader type for the model name, rather than the ChatModel definition, allows this model to be used more easily with other model providers (e.g. Ollama, DeepSeek).

Default: str | AllModels

MCP_SERVER_TOOL_CONNECTOR_URI_SCHEME

Prefix for OpenAI connector IDs. OpenAI accepts either a URL or a connector ID when passing MCP configuration to a model; by using this prefix in a URL, as in x-openai-connector:<connector-id>, you can pass a connector ID to a model.

Type: Literal['x-openai-connector'] Default: 'x-openai-connector'