Messages and chat history

Pydantic AI provides access to messages exchanged during an agent run. These messages can be used both to continue a coherent conversation, and to understand how an agent performed.

Accessing Messages from Results

After running an agent, you can access the messages exchanged during that run from the result object.

Both RunResult (returned by Agent.run, Agent.run_sync) and StreamedRunResult (returned by Agent.run_stream) have the following methods:

all_messages(): returns all messages, including messages from prior runs. There’s also a variant that returns JSON bytes, all_messages_json().
new_messages(): returns only the messages from the current run. There’s also a variant that returns JSON bytes, new_messages_json().

Example of accessing methods on a RunResult:

run_result_messages.py

from pydantic_ai import Agent

agent = Agent('openai:gpt-5.2', instructions='Be a helpful assistant.')

result = agent.run_sync('Tell me a joke.')
print(result.output)
#> Did you hear about the toothpaste scandal? They called it Colgate.

# all messages from the run
print(result.all_messages())
"""
[
    ModelRequest(
        parts=[
            UserPromptPart(
                content='Tell me a joke.',
                timestamp=datetime.datetime(...),
            )
        ],
        timestamp=datetime.datetime(...),
        instructions='Be a helpful assistant.',
        run_id='...',
        conversation_id='...',
    ),
    ModelResponse(
        parts=[
            TextPart(
                content='Did you hear about the toothpaste scandal? They called it Colgate.'
            )
        ],
        usage=RequestUsage(input_tokens=55, output_tokens=12),
        model_name='gpt-5.2',
        timestamp=datetime.datetime(...),
        run_id='...',
        conversation_id='...',
    ),
]
"""

(This example is complete, it can be run “as is”)

Example of accessing methods on a StreamedRunResult:

streamed_run_result_messages.py

from pydantic_ai import Agent

agent = Agent('openai:gpt-5.2', instructions='Be a helpful assistant.')


async def main():
    async with agent.run_stream('Tell me a joke.') as result:
        # incomplete messages before the stream finishes
        print(result.all_messages())
        """
        [
            ModelRequest(
                parts=[
                    UserPromptPart(
                        content='Tell me a joke.',
                        timestamp=datetime.datetime(...),
                    )
                ],
                timestamp=datetime.datetime(...),
                instructions='Be a helpful assistant.',
                run_id='...',
                conversation_id='...',
            )
        ]
        """

        async for text in result.stream_text():
            print(text)
            #> Did you hear
            #> Did you hear about the toothpaste
            #> Did you hear about the toothpaste scandal? They called
            #> Did you hear about the toothpaste scandal? They called it Colgate.

        # complete messages once the stream finishes
        print(result.all_messages())
        """
        [
            ModelRequest(
                parts=[
                    UserPromptPart(
                        content='Tell me a joke.',
                        timestamp=datetime.datetime(...),
                    )
                ],
                timestamp=datetime.datetime(...),
                instructions='Be a helpful assistant.',
                run_id='...',
                conversation_id='...',
            ),
            ModelResponse(
                parts=[
                    TextPart(
                        content='Did you hear about the toothpaste scandal? They called it Colgate.'
                    )
                ],
                usage=RequestUsage(input_tokens=50, output_tokens=12),
                model_name='gpt-5.2',
                timestamp=datetime.datetime(...),
                run_id='...',
                conversation_id='...',
            ),
        ]
        """

(This example is complete, it can be run “as is” — you’ll need to add asyncio.run(main()) to run main)

Using Messages as Input for Further Agent Runs

The primary use of message histories in Pydantic AI is to maintain context across multiple agent runs.

To use existing messages in a run, pass them to the message_history parameter of Agent.run, Agent.run_sync or Agent.run_stream.

If message_history is set and not empty, a new system prompt is not generated — we assume the existing message history includes a system prompt. If your history comes from a source that doesn’t round-trip system prompts (a UI frontend, a database that didn’t persist them, a compaction pipeline), add the ReinjectSystemPrompt capability so the agent’s configured system_prompt is reinjected at the head of the first request when it’s missing.

Mid-conversation SystemPromptParts (those in any ModelRequest after the first) are sent inline at their original position by providers whose API accepts system messages at arbitrary positions. For providers whose API doesn’t, they’re instead rendered as <system>-tagged UserPromptParts at the same position, preserving the prefix cache and positional intent. Leading SystemPromptParts always hoist to the provider’s top-level system parameter.

Reusing messages in a conversation

from pydantic_ai import Agent

agent = Agent('openai:gpt-5.2', instructions='Be a helpful assistant.')

result1 = agent.run_sync('Tell me a joke.')
print(result1.output)
#> Did you hear about the toothpaste scandal? They called it Colgate.

result2 = agent.run_sync('Explain?', message_history=result1.new_messages())
print(result2.output)
#> This is an excellent joke invented by Samuel Colvin, it needs no explanation.

print(result2.all_messages())
"""
[
    ModelRequest(
        parts=[
            UserPromptPart(
                content='Tell me a joke.',
                timestamp=datetime.datetime(...),
            )
        ],
        timestamp=datetime.datetime(...),
        instructions='Be a helpful assistant.',
        run_id='...',
        conversation_id='...',
    ),
    ModelResponse(
        parts=[
            TextPart(
                content='Did you hear about the toothpaste scandal? They called it Colgate.'
            )
        ],
        usage=RequestUsage(input_tokens=55, output_tokens=12),
        model_name='gpt-5.2',
        timestamp=datetime.datetime(...),
        run_id='...',
        conversation_id='...',
    ),
    ModelRequest(
        parts=[
            UserPromptPart(
                content='Explain?',
                timestamp=datetime.datetime(...),
            )
        ],
        timestamp=datetime.datetime(...),
        instructions='Be a helpful assistant.',
        run_id='...',
        conversation_id='...',
    ),
    ModelResponse(
        parts=[
            TextPart(
                content='This is an excellent joke invented by Samuel Colvin, it needs no explanation.'
            )
        ],
        usage=RequestUsage(input_tokens=56, output_tokens=26),
        model_name='gpt-5.2',
        timestamp=datetime.datetime(...),
        run_id='...',
        conversation_id='...',
    ),
]
"""

(This example is complete, it can be run “as is”)

Correlating runs with `conversation_id`

Each ModelRequest and ModelResponse carries two identifiers:

run_id — unique per agent run; emitted on the OpenTelemetry agent run span as gen_ai.agent.call.id.
conversation_id — shared across all runs that build on the same message_history; emitted as gen_ai.conversation.id.

A fresh conversation_id is generated on the first run, stamped onto every message produced by that run, and inherited by subsequent runs that pass the messages back via message_history. This means you can correlate traces from a multi-turn conversation in Logfire (or any OpenTelemetry backend) without tracking anything yourself — as long as the message history round-trips, the conversation ID does too.

conversation_id is shared across runs in the same conversation

from pydantic_ai import Agent

agent = Agent('openai:gpt-5.2')

result1 = agent.run_sync('Tell me a joke.')
result2 = agent.run_sync('Explain?', message_history=result1.all_messages())

assert result1.conversation_id == result2.conversation_id

To override or fork:

Pass conversation_id='<your-id>' to use an ID from your own application (e.g. a chat thread ID stored in your database).
Pass conversation_id='new' to start a fresh conversation that ignores any conversation_id already on message_history — useful for branching off an existing thread without making the caller generate an ID.

forking a conversation

from pydantic_ai import Agent

agent = Agent('openai:gpt-5.2')

result1 = agent.run_sync('Tell me a joke.')
forked = agent.run_sync(
    'Tell me a different joke.',
    message_history=result1.all_messages(),
    conversation_id='new',
)

assert forked.conversation_id != result1.conversation_id

The UI adapters auto-populate conversation_id from the protocol’s own thread/chat ID, so frontends using these protocols get correlation for free.
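
The resolution rules above can be pictured with a small standalone sketch. It is a toy model, not the library's implementation: StubMessage and resolve_conversation_id are hypothetical names, the UUID format is an assumption, and so is the choice to inherit the ID from the most recent message that carries one.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass


@dataclass
class StubMessage:
    """Hypothetical stand-in for a message that carries a conversation_id."""

    conversation_id: str | None = None


def resolve_conversation_id(
    requested: str | None,
    message_history: list[StubMessage],
) -> str:
    """Toy model of the override/fork/inherit rules (not the library's code)."""
    if requested == 'new':
        # Fork: ignore whatever ID the history already carries.
        return uuid.uuid4().hex
    if requested is not None:
        # An explicit application-level ID is used as given.
        return requested
    # Assumption: inherit from the most recent message that has an ID.
    for message in reversed(message_history):
        if message.conversation_id is not None:
            return message.conversation_id
    # First run of a conversation: generate a fresh ID.
    return uuid.uuid4().hex


history = [StubMessage('thread-1'), StubMessage('thread-1')]
inherited = resolve_conversation_id(None, history)
forked = resolve_conversation_id('new', history)
explicit = resolve_conversation_id('chat-42', history)
fresh = resolve_conversation_id(None, [])
```

Here inherited keeps 'thread-1', explicit is 'chat-42', and both forked and fresh are newly generated values.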

Storing and loading messages (to JSON)

Keeping conversation state in memory is enough for many applications, but you will often want to store the message history of an agent run on disk or in a database, for example for evals or for sharing data between Python and JavaScript/TypeScript.

The intended way to do this is using a TypeAdapter.

We export ModelMessagesTypeAdapter that can be used for this, or you can create your own.

Here’s an example showing how:

serialize messages to json

from pydantic_core import to_jsonable_python

from pydantic_ai import Agent, ModelMessagesTypeAdapter

agent = Agent('openai:gpt-5.2', instructions='Be a helpful assistant.')

result1 = agent.run_sync('Tell me a joke.')
history_step_1 = result1.all_messages()
# Convert the messages to plain, JSON-compatible Python objects.
as_python_objects = to_jsonable_python(history_step_1)
# Validate those objects back into a list of messages.
same_history_as_step_1 = ModelMessagesTypeAdapter.validate_python(as_python_objects)

# Continue the conversation from the restored history.
result2 = agent.run_sync(
    'Tell me a different joke.', message_history=same_history_as_step_1
)

Alternatively, you can serialize to and from JSON directly:

from pydantic_core import to_json
...
as_json_objects = to_json(history_step_1)
same_history_as_step_1 = ModelMessagesTypeAdapter.validate_json(as_json_objects)

(This example is complete, it can be run “as is”)
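
For the database case, one possible shape is a table keyed by conversation ID that holds the serialized history as a blob. The sketch below uses only the standard library; the histories table and the open_store, save_history and load_history helpers are hypothetical, and the b'[]' payload is a placeholder for the bytes you would get from all_messages_json().

```python
from __future__ import annotations

import sqlite3


def open_store(path: str = ':memory:') -> sqlite3.Connection:
    """Open (or create) a store with one row per conversation."""
    conn = sqlite3.connect(path)
    conn.execute(
        'CREATE TABLE IF NOT EXISTS histories ('
        'conversation_id TEXT PRIMARY KEY, messages BLOB NOT NULL)'
    )
    return conn


def save_history(conn: sqlite3.Connection, conversation_id: str, payload: bytes) -> None:
    # Overwrite the stored history with the latest full snapshot.
    conn.execute(
        'INSERT OR REPLACE INTO histories (conversation_id, messages) VALUES (?, ?)',
        (conversation_id, payload),
    )
    conn.commit()


def load_history(conn: sqlite3.Connection, conversation_id: str) -> bytes | None:
    row = conn.execute(
        'SELECT messages FROM histories WHERE conversation_id = ?',
        (conversation_id,),
    ).fetchone()
    return None if row is None else bytes(row[0])


conn = open_store()
# Placeholder bytes; in practice this would be result.all_messages_json().
save_history(conn, 'thread-1', b'[]')
restored = load_history(conn, 'thread-1')
missing = load_history(conn, 'thread-2')
```

The bytes returned by load_history can then be passed to ModelMessagesTypeAdapter.validate_json and the result used as message_history for the next run.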

Other ways of using messages

Since messages are defined by simple dataclasses, you can create and manipulate them manually, e.g. for testing.

The message format is independent of the model used, so you can use messages in different agents, or the same agent with different models.

In the example below, we reuse the messages from the first agent run, which uses the openai:gpt-5.2 model, in a second agent run using the google:gemini-3-pro-preview model.

Reusing messages with a different model

from pydantic_ai import Agent

agent = Agent('openai:gpt-5.2', instructions='Be a helpful assistant.')

result1 = agent.run_sync('Tell me a joke.')
print(result1.output)
#> Did you hear about the toothpaste scandal? They called it Colgate.

result2 = agent.run_sync(
    'Explain?',
    model='google:gemini-3-pro-preview',
    message_history=result1.new_messages(),
)
print(result2.output)
#> This is an excellent joke invented by Samuel Colvin, it needs no explanation.

print(result2.all_messages())
"""
[
    ModelRequest(
        parts=[
            UserPromptPart(
                content='Tell me a joke.',
                timestamp=datetime.datetime(...),
            )
        ],
        timestamp=datetime.datetime(...),
        instructions='Be a helpful assistant.',
        run_id='...',
        conversation_id='...',
    ),
    ModelResponse(
        parts=[
            TextPart(
                content='Did you hear about the toothpaste scandal? They called it Colgate.'
            )
        ],
        usage=RequestUsage(input_tokens=55, output_tokens=12),
        model_name='gpt-5.2',
        timestamp=datetime.datetime(...),
        run_id='...',
        conversation_id='...',
    ),
    ModelRequest(
        parts=[
            UserPromptPart(
                content='Explain?',
                timestamp=datetime.datetime(...),
            )
        ],
        timestamp=datetime.datetime(...),
        instructions='Be a helpful assistant.',
        run_id='...',
        conversation_id='...',
    ),
    ModelResponse(
        parts=[
            TextPart(
                content='This is an excellent joke invented by Samuel Colvin, it needs no explanation.'
            )
        ],
        usage=RequestUsage(input_tokens=56, output_tokens=26),
        model_name='gemini-3-pro-preview',
        timestamp=datetime.datetime(...),
        run_id='...',
        conversation_id='...',
    ),
]
"""

Injecting messages mid-run

Tools, capability hooks, and external code driving an agent run can inject extra content into the conversation mid-run with RunContext.enqueue (when a RunContext is in scope, e.g. inside a tool or capability hook) or AgentRun.enqueue (from external code driving agent.iter()). Use this when something happens during a run that the agent should know about — a tool wants to add follow-up context, an external event needs to steer the agent’s plan, or background work needs to reach the agent when it completes.

A priority controls when the enqueued content is delivered:

'asap' (default): delivered at the earliest opportunity — added to the next ModelRequest, or, if the agent would otherwise terminate before another request, used to redirect the run into one more request. Use when the new context should reach the model as soon as possible; this is what other frameworks often call steering an in-flight agent.
'when_idle': delivered only when the agent would otherwise terminate, after any 'asap' messages. Use when the agent shouldn’t be interrupted but should pick up the new work — a follow-up task — once it’s done with what it’s doing.
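
The ordering of the two priorities can be illustrated with a toy queue. This is a simplified model under stated assumptions, not the library's implementation: PendingQueue and its drain methods are hypothetical names, and it only models which items are eligible at each drain point and in what order.

```python
from __future__ import annotations

from collections import deque
from typing import Literal

Priority = Literal['asap', 'when_idle']


class PendingQueue:
    """Toy model of the two delivery priorities (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._asap: deque[str] = deque()
        self._when_idle: deque[str] = deque()

    def enqueue(self, content: str, priority: Priority = 'asap') -> None:
        target = self._asap if priority == 'asap' else self._when_idle
        target.append(content)

    def drain_for_next_request(self) -> list[str]:
        """Before a model request, only 'asap' items are delivered."""
        items = list(self._asap)
        self._asap.clear()
        return items

    def drain_at_end_of_run(self) -> list[str]:
        """When the run would otherwise end: 'asap' items first, then 'when_idle'."""
        items = list(self._asap) + list(self._when_idle)
        self._asap.clear()
        self._when_idle.clear()
        return items


queue = PendingQueue()
queue.enqueue('follow-up task', priority='when_idle')
queue.enqueue('production is degraded')  # default priority is 'asap'

next_request = queue.drain_for_next_request()
end_of_run = queue.drain_at_end_of_run()
```

In this model the 'asap' item reaches the next request even though it was enqueued second, while the 'when_idle' item waits until the run would otherwise finish.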

enqueue is variadic — each positional argument is one item, and can be:

a piece of UserContent — a str or multi-modal content like an ImageUrl. Adjacent user content is gathered into a single UserPromptPart, so enqueue('caption', image) forms one user turn. To pass an existing list, spread it: enqueue(*items);
a ModelRequestPart, such as a SystemPromptPart;
a complete ModelRequest or ModelResponse, to control request-level fields like instructions/metadata or to inject a synthetic prior turn.

Adjacent part-style items (user content and ModelRequestParts) are coalesced into one ModelRequest; complete messages stay separate. This lets a single call inject an interleaved exchange — for example a synthetic tool call (a ModelResponse) followed by its result (a ModelRequest). The content must end in a request, so the agent has something to respond to.
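
The coalescing rule can be sketched with stand-in types. Everything here is hypothetical and for illustration only: UserPart, SystemPart, Request and Response are simplified substitutes for the real message classes, coalesce is not a library function, and raising ValueError for content that does not end in a request is an assumption about how the rule might be enforced.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class UserPart:
    content: list[str]


@dataclass
class SystemPart:
    content: str


@dataclass
class Request:
    parts: list


@dataclass
class Response:
    text: str


def coalesce(*items) -> list:
    """Group adjacent part-style items into one Request; keep full messages separate."""
    messages: list = []
    open_request = None  # the Request currently collecting part-style items
    for item in items:
        if isinstance(item, (Request, Response)):
            # Complete messages are kept as-is and close any open request.
            messages.append(item)
            open_request = None
            continue
        if open_request is None:
            open_request = Request(parts=[])
            messages.append(open_request)
        if isinstance(item, str):
            # Adjacent user content is gathered into a single user part.
            last = open_request.parts[-1] if open_request.parts else None
            if isinstance(last, UserPart):
                last.content.append(item)
            else:
                open_request.parts.append(UserPart(content=[item]))
        else:
            open_request.parts.append(item)
    if not messages or not isinstance(messages[-1], Request):
        raise ValueError('enqueued content must end in a request')
    return messages


result = coalesce(
    'caption',
    'second line',
    SystemPart('be terse'),
    Response('synthetic prior turn'),
    'follow-up question',
)
```

The call produces three messages: one request holding a two-item user part plus the system part, the synthetic response, and a final request with the follow-up.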

From inside a tool or hook

Use RunContext.enqueue when you have a RunContext in scope:

enqueue_from_tool.py

from pydantic_ai import Agent, RunContext
from pydantic_ai.messages import SystemPromptPart

agent = Agent('anthropic:claude-opus-4-7')


@agent.tool
def trigger_alert(ctx: RunContext[None]) -> str:
    ctx.enqueue('Alert: production is degraded, prioritize triage.')
    return 'alert raised'


@agent.tool
def enter_incident_mode(ctx: RunContext[None]) -> str:
    # Enqueue a `SystemPromptPart` to adjust the agent's standing instructions mid-run.
    ctx.enqueue(SystemPromptPart(content='You are now in incident mode: be terse and action-oriented.'))
    return 'incident mode enabled'

The 'asap' message is appended to the agent’s message history and is visible to the model on the next request, alongside any tool returns from the same step. A SystemPromptPart is delivered the same way; on providers that hoist system prompts (e.g. Anthropic, Google) a non-leading one is sent as a <system>-tagged user-role message, so it keeps its mid-conversation position rather than being lifted to the top.

From external code driving `agent.iter()`

Use AgentRun.enqueue when you’re driving a run from outside (e.g. forwarding events from a webhook, chat platform, or job queue):

enqueue_from_agent_run.py

from pydantic_ai import Agent
from pydantic_graph import End

agent = Agent('anthropic:claude-opus-4-7')


async def main():
    async with agent.iter('Summarize the latest deploy report') as agent_run:
        # An external system pushes a follow-up while the agent is working.
        # When the agent would otherwise finish, the message redirects it
        # into a fresh model request so it can incorporate the new context.
        agent_run.enqueue(
            'A new error was just reported — include it in the summary.',
            priority='when_idle',
        )
        node = agent_run.next_node
        while not isinstance(node, End):
            node = await agent_run.next(node)

The example drives the run with agent.iter() + AgentRun.next() because 'when_idle' messages are only drained when the agent would otherwise reach an End — that drain happens in after_node_run, which doesn’t fire inside a bare async for node in agent_run: loop. 'asap' messages are drained in before_model_request (which fires either way) and also at the same end-of-run point if anything arrived during the final step. Reaching the end of a bare async for loop with undrained pending messages raises UndrainedPendingMessagesError, since those messages would otherwise be silently lost.

End-of-run redirects need Agent.run or explicit AgentRun.next() driving — they aren’t drained inside a bare async for node in agent_run: loop (which raises UndrainedPendingMessagesError if it ends with undrained messages). Messages delivered into a before_model_request work in either case.
Inside a Temporal workflow, tools run in activities and don’t share state with the workflow, so ctx.enqueue from a tool doesn’t currently propagate back to the run. Enqueue from the workflow context (e.g. via AgentRun.enqueue) instead.
Each end-of-run redirect opens a new model request. If something keeps enqueueing on every step (e.g. a tool that always enqueues, or a system-prompt callback that re-enqueues on each reinjection), the run will loop indefinitely. Set UsageLimits on the run as a safety net.
enqueue is designed to be called from the same event loop that drives the agent run. Inside the run that’s automatic: async tools, sync tools (which Pydantic AI auto-wraps in a thread executor), and capability hooks all enqueue safely because the drain only iterates between graph nodes, never concurrently with a tool body. If you’re forwarding events from a different thread or loop (e.g. a webhook handler), marshal the call onto the agent’s loop first — e.g. loop.call_soon_threadsafe(agent_run.enqueue, msg). The drain isn’t atomic against concurrent cross-thread appends.
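
The cross-thread hand-off mentioned in the last note can be tried out with the standard library alone. In this sketch the enqueue function and the pending list are hypothetical stand-ins for agent_run.enqueue and the run's internal queue; only loop.call_soon_threadsafe is the real mechanism being shown.

```python
from __future__ import annotations

import asyncio
import threading

pending: list[str] = []


def enqueue(message: str) -> None:
    """Hypothetical stand-in for agent_run.enqueue."""
    pending.append(message)


async def drive_run() -> list[str]:
    loop = asyncio.get_running_loop()
    delivered = asyncio.Event()

    def on_loop(message: str) -> None:
        # Runs on the event loop thread, so it cannot race with a drain
        # that also runs on that loop.
        enqueue(message)
        delivered.set()

    def webhook_handler() -> None:
        # Runs on a separate thread: hand the call over to the loop
        # instead of calling enqueue directly.
        loop.call_soon_threadsafe(on_loop, 'A new error was just reported.')

    worker = threading.Thread(target=webhook_handler)
    worker.start()
    await delivered.wait()
    worker.join()
    return list(pending)


received = asyncio.run(drive_run())
```

The worker thread never touches the queue itself; it only schedules a callback, and the append happens on the loop that drives the run.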

Processing Message History

Sometimes you may want to modify the message history before it’s sent to the model. This could be for privacy reasons (filtering out sensitive information), to save on token costs, to give the LLM less context, or to apply custom processing logic.

Pydantic AI provides the ProcessHistory capability that allows you to intercept and modify the message history before each model request.

Usage

Each ProcessHistory wraps a callable that takes a list of ModelMessage and returns a modified list of the same type.

Each processor is applied in sequence, and processors can be either synchronous or asynchronous.

simple_history_processor.py

from pydantic_ai import (
    Agent,
    ModelMessage,
    ModelRequest,
    ModelResponse,
    TextPart,
    UserPromptPart,
)
from pydantic_ai.capabilities import ProcessHistory


def filter_responses(messages: list[ModelMessage]) -> list[ModelMessage]:
    """Remove all ModelResponse messages, keeping only ModelRequest messages."""
    return [msg for msg in messages if isinstance(msg, ModelRequest)]

# Create agent with history processor
agent = Agent('openai:gpt-5.2', capabilities=[ProcessHistory(filter_responses)])

# Example: Create some conversation history
message_history = [
    ModelRequest(parts=[UserPromptPart(content='What is 2+2?')]),
    ModelResponse(parts=[TextPart(content='2+2 equals 4')]),  # This will be filtered out
]

# When you run the agent, the history processor will filter out ModelResponse messages
# result = agent.run_sync('What about 3+3?', message_history=message_history)

Keep Only Recent Messages

You can use a history processor to keep only the most recent messages:

keep_recent_messages.py

from pydantic_ai import Agent, ModelMessage
from pydantic_ai.capabilities import ProcessHistory


async def keep_recent_messages(messages: list[ModelMessage]) -> list[ModelMessage]:
    """Keep only the last 5 messages to manage token usage."""
    return messages[-5:] if len(messages) > 5 else messages

agent = Agent('openai:gpt-5.2', capabilities=[ProcessHistory(keep_recent_messages)])

# Example: Even with a long conversation history, only the last 5 messages are sent to the model
long_conversation_history: list[ModelMessage] = []  # Your long conversation history here
# result = agent.run_sync('What did we discuss?', message_history=long_conversation_history)

`RunContext` parameter

History processors can optionally accept a RunContext parameter to access additional information about the current run, such as dependencies, model information, and usage statistics:

context_aware_processor.py

from pydantic_ai import Agent, ModelMessage, RunContext
from pydantic_ai.capabilities import ProcessHistory


def context_aware_processor(
    ctx: RunContext[None],
    messages: list[ModelMessage],
) -> list[ModelMessage]:
    # Access current usage
    current_tokens = ctx.usage.total_tokens

    # Filter messages based on context
    if current_tokens > 1000:
        return messages[-3:]  # Keep only recent messages when token usage is high
    return messages

agent = Agent('openai:gpt-5.2', capabilities=[ProcessHistory(context_aware_processor)])

This allows for more sophisticated message processing based on the current state of the agent run.

Summarize Old Messages

Use an LLM to summarize older messages to preserve context while reducing tokens.

summarize_old_messages.py

from pydantic_ai import Agent, ModelMessage
from pydantic_ai.capabilities import ProcessHistory

# Use a cheaper model to summarize old messages.
summarize_agent = Agent(
    'openai:gpt-5-mini',
    instructions="""
Summarize this conversation, omitting small talk and unrelated topics.
Focus on the technical discussion and next steps.
""",
)


async def summarize_old_messages(messages: list[ModelMessage]) -> list[ModelMessage]:
    # Summarize the oldest 10 messages
    if len(messages) > 10:
        oldest_messages = messages[:10]
        summary = await summarize_agent.run(message_history=oldest_messages)
        # Return the last message and the summary
        return summary.new_messages() + messages[-1:]

    return messages


agent = Agent('openai:gpt-5.2', capabilities=[ProcessHistory(summarize_old_messages)])

Testing History Processors

You can test what messages are actually sent to the model provider using FunctionModel:

test_history_processor.py

import pytest

from pydantic_ai import (
    Agent,
    ModelMessage,
    ModelRequest,
    ModelResponse,
    TextPart,
    UserPromptPart,
)
from pydantic_ai.capabilities import ProcessHistory
from pydantic_ai.models.function import AgentInfo, FunctionModel


@pytest.fixture
def received_messages() -> list[ModelMessage]:
    return []


@pytest.fixture
def function_model(received_messages: list[ModelMessage]) -> FunctionModel:
    def capture_model_function(messages: list[ModelMessage], info: AgentInfo) -> ModelResponse:
        # Capture the messages that the provider actually receives
        received_messages.clear()
        received_messages.extend(messages)
        return ModelResponse(parts=[TextPart(content='Provider response')])

    return FunctionModel(capture_model_function)


def test_history_processor(function_model: FunctionModel, received_messages: list[ModelMessage]):
    def filter_responses(messages: list[ModelMessage]) -> list[ModelMessage]:
        return [msg for msg in messages if isinstance(msg, ModelRequest)]

    agent = Agent(function_model, capabilities=[ProcessHistory(filter_responses)])

    message_history = [
        ModelRequest(parts=[UserPromptPart(content='Question 1')]),
        ModelResponse(parts=[TextPart(content='Answer 1')]),
    ]

    agent.run_sync('Question 2', message_history=message_history)
    assert received_messages == [
        ModelRequest(parts=[UserPromptPart(content='Question 1')]),
        ModelRequest(parts=[UserPromptPart(content='Question 2')]),
    ]

Multiple Processors

You can also use multiple processors:

multiple_history_processors.py

from pydantic_ai import Agent, ModelMessage, ModelRequest
from pydantic_ai.capabilities import ProcessHistory


def filter_responses(messages: list[ModelMessage]) -> list[ModelMessage]:
    return [msg for msg in messages if isinstance(msg, ModelRequest)]


def summarize_old_messages(messages: list[ModelMessage]) -> list[ModelMessage]:
    return messages[-5:]


agent = Agent(
    'openai:gpt-5.2',
    capabilities=[ProcessHistory(filter_responses), ProcessHistory(summarize_old_messages)],
)

In this case, the filter_responses processor will be applied first, and the summarize_old_messages processor will be applied second.
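
Why the order matters can be seen in a standalone sketch of sequential application. It is not how the library wires processors internally: apply_processors is a hypothetical helper, the messages are plain strings instead of ModelMessage objects, and the optional RunContext argument is left out.

```python
from __future__ import annotations

import asyncio
import inspect
from typing import Awaitable, Callable, List, Union

Processor = Callable[[List[str]], Union[List[str], Awaitable[List[str]]]]


async def apply_processors(messages: list[str], processors: list[Processor]) -> list[str]:
    """Feed each processor the output of the previous one, awaiting async ones."""
    for processor in processors:
        result = processor(messages)
        if inspect.isawaitable(result):
            result = await result
        messages = result
    return messages


def filter_requests(messages: list[str]) -> list[str]:
    # Synchronous processor: keep only the request-like entries.
    return [m for m in messages if m.startswith('request:')]


async def keep_last_two(messages: list[str]) -> list[str]:
    # Asynchronous processor: keep only the two most recent entries.
    return messages[-2:]


history = ['request:Q1', 'response:A1', 'request:Q2', 'response:A2', 'request:Q3']

filtered_then_trimmed = asyncio.run(apply_processors(history, [filter_requests, keep_last_two]))
trimmed_then_filtered = asyncio.run(apply_processors(history, [keep_last_two, filter_requests]))
```

Filtering first leaves the last two requests, while trimming first leaves only one, so the two orderings send different histories to the model.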

Examples

For a more complete example of using messages in conversations, see the chat app example.
