pydantic_ai.models.openai
For details on how to set up authentication with this model, see model configuration for OpenAI.
Bases: ModelSettings
Settings used for an OpenAI model request.
Constrains effort on reasoning for reasoning models.
Currently supported values are low, medium, and high. Reducing reasoning effort can
result in faster responses and fewer tokens used on reasoning in a response.
Type: ReasoningEffort
Include log probabilities in the response.
For Chat models, these will be included in ModelResponse.provider_details['logprobs'].
For Responses models, these will be included in the response output parts TextPart.provider_details['logprobs'].
Type: bool
Include log probabilities of the top n tokens in the response.
Type: int
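Log probabilities are natural logarithms, so a small conversion step is usually needed before they can be read as probabilities. The following is a minimal sketch of that step; the sample entries and their 'token' / 'logprob' keys are assumptions modelled on OpenAI's per-token logprob entries, not a guaranteed description of what provider_details['logprobs'] contains.

```python
import math

# Hypothetical sample; the exact layout of provider_details['logprobs']
# is an assumption made for illustration only.
sample_logprobs = [
    {'token': 'Hello', 'logprob': 0.0},
    {'token': ' world', 'logprob': math.log(0.5)},
]


def token_probabilities(entries):
    """Convert natural-log probabilities into plain probabilities."""
    return [(entry['token'], math.exp(entry['logprob'])) for entry in entries]


probs = token_probabilities(sample_logprobs)
```

A logprob of 0.0 corresponds to a probability of 1.0, and more negative values correspond to less likely tokens.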
Whether or not to store the output of this request in OpenAI’s systems.
If False, OpenAI will not store the request for its own internal review or training.
See OpenAI API reference.
When used with OpenAIResponsesModel, stored responses appear in OpenAI’s dashboard and
can be referenced via openai_previous_response_id.
Pair this with openai_previous_response_id='auto' to avoid storing duplicate copies of
the conversation history across retries and subsequent requests within the same run.
A unique identifier representing the end-user, which can help OpenAI monitor and detect abuse.
See OpenAI’s safety best practices for more details.
Type: str
The service tier to use for the model request.
Currently supported values are auto, default, flex, and priority.
For more information, see OpenAI’s service tiers documentation.
Type: Literal['auto', 'default', 'flex', 'priority']
Enables predictive outputs.
This feature is currently only supported for some OpenAI models.
Type: ChatCompletionPredictionContentParam
Used by OpenAI to cache responses for similar requests to optimize your cache hit rates.
See the OpenAI Prompt Caching documentation for more information.
Type: str
The retention policy for the prompt cache. Set to 24h to enable extended prompt caching, which keeps cached prefixes active for longer, up to a maximum of 24 hours.
See the OpenAI Prompt Caching documentation for more information.
Type: Literal['in_memory', '24h']
When True, enables continuous usage statistics in streaming responses.
When enabled, the API returns cumulative usage data with each chunk rather than only at the end. This setting correctly handles the cumulative nature of these stats by using only the final usage values rather than summing all intermediate values.
See OpenAI’s streaming documentation for more information.
Type: bool
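To see why cumulative usage must not be summed, consider the sketch below. It is an illustration of the idea only, not the library's implementation, and the chunk values are made up.

```python
# Hypothetical cumulative output-token counts, one per streamed chunk.
cumulative_output_tokens = [3, 7, 12, 12]


def final_usage(cumulative):
    """Return the last cumulative value, or 0 if no chunk reported usage."""
    return cumulative[-1] if cumulative else 0


total = final_usage(cumulative_output_tokens)  # correct: last value only
naive_sum = sum(cumulative_output_tokens)      # wrong: counts tokens repeatedly
```

Here the final value is the true total, while summing every intermediate value would overstate usage.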
Bases: OpenAIChatModelSettings
Deprecated alias for OpenAIChatModelSettings.
Bases: OpenAIChatModelSettings
Settings used for an OpenAI Responses model request.
ALL FIELDS MUST BE openai_ PREFIXED SO YOU CAN MERGE THEM WITH OTHER MODELS.
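The prefix rule matters because settings are plain mappings that can be merged. The sketch below uses ordinary dicts to show the effect; the specific keys are illustrative assumptions, and the merge shown is plain dict unpacking rather than any library helper.

```python
# Illustrative settings; key names here are assumptions for the example.
openai_settings = {'temperature': 0.2, 'openai_reasoning_effort': 'low'}
other_settings = {'temperature': 0.7, 'anthropic_thinking': {'type': 'enabled'}}

# Later mappings win for shared keys; prefixed keys never collide.
merged = {**openai_settings, **other_settings}
```

Shared keys such as temperature are overridden by the later mapping, while provider-specific keys from both sides survive side by side because their prefixes differ.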
The provided OpenAI built-in tools to use.
See OpenAI’s built-in tools for more details.
Type: Sequence[FileSearchToolParam | WebSearchToolParam | ComputerToolParam]
Deprecated alias for openai_reasoning_summary.
Type: Literal['detailed', 'concise']
A summary of the reasoning performed by the model.
This can be useful for debugging and understanding the model’s reasoning process.
One of concise, detailed, or auto.
Check the OpenAI Reasoning documentation for more details.
Type: Literal['detailed', 'concise', 'auto']
Whether to send the unique IDs of reasoning, text, and function call parts from the message history to the model. Enabled by default for reasoning models.
This can result in errors like "Item 'rs_123' of type 'reasoning' was provided without its required following item."
if the message history you’re sending does not match exactly what was received from the Responses API in a previous response,
for example if you’re using a history processor.
In that case, you’ll want to disable this.
Type: bool
The truncation strategy to use for the model response.
It can be either:
- disabled (default): If a model response will exceed the context window size for a model, the request will fail with a 400 error.
- auto: If the context of this response and previous ones exceeds the model’s context window size, the model will truncate the response to fit the context window by dropping input items in the middle of the conversation.
Type: Literal['disabled', 'auto']
Constrains the verbosity of the model’s text response.
Lower values will result in more concise responses, while higher values will
result in more verbose responses. Currently supported values are low,
medium, and high.
Type: Literal['low', 'medium', 'high']
Reference a prior OpenAI response to continue a conversation server-side, omitting already-stored messages from the input.
- 'auto': chain to the most recent provider_response_id in the message history. If the history contains no such response, no previous_response_id is sent.
- A concrete response ID string: use it as the seed for the first request in the run (e.g. to continue from a prior turn). On subsequent in-run requests (retries, tool-call continuations), the most recent provider_response_id from the message history takes precedence so the chain extends correctly without re-sending messages that are already server-side.
In both cases, messages that precede the chosen response in the history are omitted from the input, since OpenAI reconstructs them from server-side state.
Requires the referenced response to have been stored (see openai_store, which defaults to True on OpenAI’s side). Not compatible with Zero Data Retention.
See the OpenAI Responses API documentation for more information.
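The resolution rules above can be sketched roughly as follows. This is a simplified illustration, not the library's code: the Msg class and resolve_previous_response_id function are hypothetical, and the sketch does not distinguish messages produced in the current run from earlier history.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Msg:
    """Hypothetical stand-in for a message in the history."""
    kind: str  # 'request' or 'response'
    provider_response_id: Optional[str] = None


def resolve_previous_response_id(setting, history):
    """Return (previous_response_id, messages_to_send) for a request."""
    if setting is None:
        return None, history
    # The most recent stored response in the history takes precedence.
    for index in range(len(history) - 1, -1, -1):
        msg = history[index]
        if msg.kind == 'response' and msg.provider_response_id:
            # Everything up to and including that response is omitted.
            return msg.provider_response_id, history[index + 1:]
    # No such response: 'auto' sends nothing, a concrete ID acts as the seed.
    if setting == 'auto':
        return None, history
    return setting, history


history = [Msg('request'), Msg('response', 'resp_1'), Msg('request')]
chained = resolve_previous_response_id('auto', history)
seeded = resolve_previous_response_id('resp_seed', [Msg('request')])
```

With 'auto' and a stored response in the history, only the trailing request is sent; with a concrete ID and no stored response yet, that ID seeds the chain.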
Whether to include the code execution results in the response.
Corresponds to the code_interpreter_call.outputs value of the include parameter in the Responses API.
Type: bool
Whether to include the web search results in the response.
Corresponds to the web_search_call.action.sources value of the include parameter in the Responses API.
Type: bool
Whether to include the file search results in the response.
Corresponds to the file_search_call.results value of the include parameter in the Responses API.
Type: bool
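The three boolean settings above each map to one value of the Responses API include parameter. The sketch below illustrates that mapping; the flag names and the build_include helper are hypothetical, while the include values are the ones named above.

```python
# Hypothetical flag names mapped to the include values named in the text.
INCLUDE_VALUES = {
    'include_code_execution_outputs': 'code_interpreter_call.outputs',
    'include_web_search_sources': 'web_search_call.action.sources',
    'include_file_search_results': 'file_search_call.results',
}


def build_include(flags):
    """Collect the include values for every flag that is set to True."""
    return [value for key, value in INCLUDE_VALUES.items() if flags.get(key)]


include = build_include({
    'include_web_search_sources': True,
    'include_file_search_results': False,
})
```

Flags that are missing or False contribute nothing, so the resulting list contains only the explicitly requested extras.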
Whether to include the raw annotations in TextPart.provider_details.
When enabled, any annotations (e.g., citations from web search) will be available
in the provider_details['annotations'] field of text parts.
This is opt-in since there may be overlap with native annotation support once
added via https://github.com/pydantic/pydantic-ai/issues/3126.
Type: bool
Context management configuration for the request.
This enables OpenAI’s server-side automatic compaction inside the regular
/responses call, as opposed to the standalone /responses/compact endpoint.
See OpenAI’s compaction guide
for details.
The OpenAICompaction capability
sets this automatically in its default (stateful) mode.
Type: list[ContextManagement]
Bases: Model[AsyncOpenAI]
A model that uses the OpenAI API.
Internally, this uses the OpenAI Python client to interact with the API.
Apart from __init__, all methods are private or match those of the base class.
The model name.
Type: OpenAIModelName
The model provider.
Type: str
The model profile.
WebSearchTool is only supported if openai_chat_supports_web_search is True.
Type: ModelProfile
def __init__(
model_name: OpenAIModelName,
provider: OpenAIChatCompatibleProvider | Literal['openai', 'openai-chat', 'gateway'] | Provider[AsyncOpenAI] = 'openai',
profile: ModelProfileSpec | None = None,
settings: ModelSettings | None = None,
) -> None
def __init__(
model_name: OpenAIModelName,
provider: OpenAIChatCompatibleProvider | Literal['openai', 'openai-chat', 'gateway'] | Provider[AsyncOpenAI] = 'openai',
profile: ModelProfileSpec | None = None,
system_prompt_role: OpenAISystemPromptRole | None = None,
settings: ModelSettings | None = None,
) -> None
Initialize an OpenAI model.
The name of the OpenAI model to use. List of model names available
here
(Unfortunately, despite being asked to do so, OpenAI do not provide .inv files for their API.)
provider : OpenAIChatCompatibleProvider | Literal['openai', 'openai-chat', 'gateway'] | Provider[AsyncOpenAI] Default: 'openai'
The provider to use. Defaults to 'openai'.
profile : ModelProfileSpec | None Default: None
The model profile to use. Defaults to a profile picked by the provider based on the model name.
system_prompt_role : OpenAISystemPromptRole | None Default: None
The role to use for the system prompt message. If not provided, defaults to 'system'.
In the future, this may be inferred from the model name.
settings : ModelSettings | None Default: None
Default model settings for this model instance.
@classmethod
def supported_builtin_tools(cls) -> frozenset[type[AbstractBuiltinTool]]
Return the set of builtin tool types this model can handle.
frozenset[type[AbstractBuiltinTool]]
Bases: OpenAIChatModel
Deprecated alias for OpenAIChatModel.
Bases: Model[AsyncOpenAI]
A model that uses the OpenAI Responses API.
The OpenAI Responses API is the new API for OpenAI models.
If you are interested in the differences between the Responses API and the Chat Completions API, see the OpenAI API docs.
The model name.
Type: OpenAIModelName
The model provider.
Type: str
def __init__(
model_name: OpenAIModelName,
provider: OpenAIResponsesCompatibleProvider | Literal['openai', 'gateway'] | Provider[AsyncOpenAI] = 'openai',
profile: ModelProfileSpec | None = None,
settings: ModelSettings | None = None,
)
Initialize an OpenAI Responses model.
The name of the OpenAI model to use.
provider : OpenAIResponsesCompatibleProvider | Literal['openai', 'gateway'] | Provider[AsyncOpenAI] Default: 'openai'
The provider to use. Defaults to 'openai'.
profile : ModelProfileSpec | None Default: None
The model profile to use. Defaults to a profile picked by the provider based on the model name.
settings : ModelSettings | None Default: None
Default model settings for this model instance.
@classmethod
def supported_builtin_tools(cls) -> frozenset[type[AbstractBuiltinTool]]
Return the set of builtin tool types this model can handle.
frozenset[type[AbstractBuiltinTool]]
async def compact_messages(
request_context: ModelRequestContext,
instructions: str | None = None,
) -> ModelResponse
Compact messages using the OpenAI Responses compaction endpoint.
This calls OpenAI’s responses.compact API to produce an encrypted compaction
that summarizes the conversation history. The returned ModelResponse contains
a single CompactionPart that must be round-tripped in subsequent requests.
ModelResponse — A ModelResponse with a single CompactionPart containing the encrypted compaction data.
The model request context containing messages, settings, and parameters.
Optional custom instructions for the compaction summarization. If provided, these override the agent-level instructions.
Bases: StreamedResponse
Implementation of StreamedResponse for OpenAI models.
Get the model name of the response.
Type: OpenAIModelName
Get the provider name.
Type: str
Get the provider base URL.
Type: str
Get the timestamp of the response.
Type: datetime
Bases: StreamedResponse
Implementation of StreamedResponse for OpenAI Responses API.
Get the model name of the response.
Type: OpenAIModelName
Get the provider name.
Type: str
Get the provider base URL.
Type: str
Get the timestamp of the response.
Type: datetime
Bases: AbstractCapability[AgentDepsT]
Compaction capability for OpenAI Responses API.
Automatically compacts conversation history to keep long-running agent
runs within manageable context limits. Two modes are supported, selected
by the stateless flag:
- Stateful mode (default, stateless=False): configures OpenAI’s server-side auto-compaction via the context_management field on the regular /responses request. The server triggers compaction when input tokens cross a threshold, and the compacted item is returned alongside the normal response. Compatible with openai_previous_response_id='auto' and server-side conversation state.
  Configurable with token_threshold (compact_threshold on the API). If omitted, OpenAI picks a server-side default.
- Stateless mode (stateless=True): calls the stateless /responses/compact endpoint from a before_model_request hook when your trigger condition is met. Use this in ZDR environments where OpenAI must not retain conversation data, when you set openai_store=False, or when you need explicit out-of-band control over when compaction runs.
  Requires either message_count_threshold or a custom trigger callable.
If stateless is not set, it is inferred from which parameters you
provide: passing any stateless-only parameter (message_count_threshold
or trigger) implies stateless=True; otherwise stateful mode is used.
Example usage::

    from pydantic_ai import Agent
    from pydantic_ai.models.openai import OpenAICompaction

    # Stateful mode (default): OpenAI picks the server-side threshold.
    agent = Agent(
        'openai-responses:gpt-5.2',
        capabilities=[OpenAICompaction()],
    )

    # Stateful mode with an explicit input-token threshold.
    agent = Agent(
        'openai-responses:gpt-5.2',
        capabilities=[OpenAICompaction(token_threshold=100_000)],
    )

    # Stateless mode, inferred from message_count_threshold.
    agent = Agent(
        'openai-responses:gpt-5.2',
        capabilities=[OpenAICompaction(message_count_threshold=20)],
    )
def __init__(
stateless: bool | None = None,
token_threshold: int | None = None,
message_count_threshold: int | None = None,
trigger: Callable[[list[ModelMessage]], bool] | None = None,
instructions: str | None = None,
) -> None
Initialize the OpenAI compaction capability.
Select the compaction mode explicitly. If None (the
default), the mode is inferred from the other parameters:
passing any stateless-only parameter (message_count_threshold
or trigger) implies stateless=True; otherwise stateful
mode is used.
Stateful-mode only. Input token threshold at which
OpenAI’s server-side compaction is triggered. Corresponds to
compact_threshold in the context_management API field. If
None, OpenAI picks a server-side default.
Stateless-mode only. Compact when the message count exceeds this threshold.
trigger : Callable[[list[ModelMessage]], bool] | None Default: None
Stateless-mode only. Custom callable that decides whether
to compact based on the current messages. Takes precedence
over message_count_threshold.
Deprecated. OpenAI’s /compact endpoint treats
instructions as a system/developer message inserted into
the compaction model’s context, not as a directive for how
to summarize the conversation. This does not match
AnthropicCompaction.instructions
semantics, so the field is deprecated and will be removed
in a future version.
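The mode-inference and trigger-precedence rules described for the constructor parameters can be sketched as two small functions. Both infer_stateless and should_compact are hypothetical names used for illustration; this mirrors the documented behaviour rather than reproducing the library's code.

```python
def infer_stateless(stateless=None, message_count_threshold=None, trigger=None):
    """An explicit flag wins; otherwise stateless-only parameters imply True."""
    if stateless is not None:
        return stateless
    return message_count_threshold is not None or trigger is not None


def should_compact(messages, message_count_threshold=None, trigger=None):
    """Stateless-mode decision: a custom trigger takes precedence."""
    if trigger is not None:
        return trigger(messages)
    if message_count_threshold is not None:
        return len(messages) > message_count_threshold
    return False


messages = ['m1', 'm2', 'm3']  # stand-ins for ModelMessage objects
by_count = should_compact(messages, message_count_threshold=2)
by_trigger = should_compact(
    messages,
    message_count_threshold=2,
    trigger=lambda msgs: len(msgs) > 10,
)
```

In the second call the trigger returns False even though the count threshold is exceeded, which illustrates that the callable overrides message_count_threshold.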
Models that are deprecated or don’t exist but are still present in the OpenAI SDK’s type definitions.
Type: frozenset[str] Default: frozenset({'chatgpt-4o-latest', 'codex-mini-latest', 'gpt-4-0125-preview', 'gpt-4-1106-preview', 'gpt-4-turbo-preview', 'gpt-4-32k', 'gpt-4-32k-0314', 'gpt-4-32k-0613', 'gpt-4-vision-preview', 'gpt-4o-audio-preview-2024-10-01', 'gpt-5.1-mini', 'o1-mini', 'o1-mini-2024-09-12', 'o1-preview', 'o1-preview-2024-09-12'})
Possible OpenAI model names.
Since OpenAI supports a variety of date-stamped models, we explicitly list the latest models but allow any name in the type hints. See the OpenAI docs for a full list.
Using this broader type for the model name instead of the ChatModel definition allows this model to be used more easily with other model types (e.g. Ollama, DeepSeek).
Default: str | AllModels
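The "known names plus any string" pattern can be sketched with the standard typing module. KnownModels below is a short hypothetical stand-in for AllModels, and is_known is an illustrative helper that is not part of the library.

```python
from typing import Literal, Union, get_args

# Hypothetical, abbreviated stand-in for the AllModels literal type.
KnownModels = Literal['gpt-4o', 'gpt-4o-mini']

# Any string is accepted, while listed names remain visible to type checkers.
ModelName = Union[str, KnownModels]


def is_known(name: str) -> bool:
    """Report whether a name is one of the explicitly listed models."""
    return name in get_args(KnownModels)


known = is_known('gpt-4o')
custom = is_known('my-local-model')
```

Unlisted names are still valid values of ModelName, which is what lets OpenAI-compatible providers supply their own model names.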
Prefix for OpenAI connector IDs. OpenAI supports either a URL or a connector ID when passing MCP configuration to a model.
By using this prefix in a URL, like x-openai-connector:<connector-id>, you can pass a connector ID to a model.
Type: Literal['x-openai-connector'] Default: 'x-openai-connector'
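A minimal sketch of building and recognising such a URL follows. The helper names and the connector ID are hypothetical; only the prefix and the prefix:<connector-id> shape come from the description above.

```python
CONNECTOR_PREFIX = 'x-openai-connector'


def connector_url(connector_id):
    """Encode a connector ID in the URL form described above."""
    return f'{CONNECTOR_PREFIX}:{connector_id}'


def parse_connector_id(url):
    """Return the connector ID if the URL uses the prefix, else None."""
    prefix = f'{CONNECTOR_PREFIX}:'
    return url[len(prefix):] if url.startswith(prefix) else None


url = connector_url('connector_example')  # hypothetical connector ID
```

An ordinary URL does not start with the prefix, so parse_connector_id returns None for it and it can be treated as a regular server URL.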