Pydantic AI v2: capabilities, a leaner core, and the Harness

Pydantic AI v2 is here, and your agents have never been more capable. We shipped Pydantic AI v1 last September and have put out more than a hundred releases since, without once breaking your code. The inner loop of an agent is settled by now: call the model, run a tool, feed the result back. The real leverage is in the layer around it: not just the instructions and tools you give an agent, but the hooks that rewrite what the model sees mid-run, context management, steering, and loading the right tools just in time. v2 turns that whole layer into one thing you compose: the capability.

One primitive: the capability

A capability bundles an agent's instructions, tools, lifecycle hooks, and model settings into a single, composable unit, so a whole extension (a memory system, a guardrail, a coding toolkit) can reach every layer of the agent through one concept. It is the unit of agent behavior that lives in the loop, and you attach one the same way you attach any other:

from pydantic_ai import Agent
from pydantic_ai.capabilities import Capability, Thinking, ToolSearch, WebSearch
from pydantic_ai.mcp import MCPToolset
from pydantic_ai_harness import CodeMode

agent = Agent(
    'anthropic:claude-opus-4-7',
    instructions='Research thoroughly and cite your sources.',
    capabilities=[
        Thinking(effort='high'),  # extended thinking, unified across providers
        CodeMode(),               # one run_code call replaces N tool calls, sandboxed by Monty
        WebSearch(),              # native where the provider supports it, local fallback otherwise
        ToolSearch(),             # discover tools on demand instead of listing hundreds upfront
        Capability(
            id='github',
            description='Look up GitHub issues, pull requests, and code.',
            instructions='Use the GitHub tools when a question is about a repository.',
            toolset=MCPToolset('https://mcp.example.com/github'),
            defer_loading=True,  # stays out of the prompt until the model loads it on demand
        ),
    ],
)

Some of these are just model settings, like Thinking. Some wrap a native tool, like WebSearch, which runs natively where the provider supports it and falls back to a local implementation otherwise. The powerful ones use hooks to read and rewrite what the model sees on every step, including its tools, its instructions, and its message history. Code mode and tool search are built on exactly the same public hooks your own capabilities would use, so the batteries we ship double as worked examples.
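To make the hook idea concrete, here is a toy model of it in plain Python. It deliberately does not import Pydantic AI: `ModelRequest`, `ToyCapability`, `prepare_request`, and `keep_last_two_messages` are hypothetical names invented for this sketch, and the real hooks API may be shaped quite differently. It only illustrates how several capabilities could each contribute instructions and tools, then rewrite the request the model is about to see.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class ModelRequest:
    """Toy stand-in for what the model sees on one step (not a Pydantic AI type)."""
    instructions: tuple[str, ...] = ()
    tools: tuple[str, ...] = ()
    messages: tuple[str, ...] = ()


Hook = Callable[[ModelRequest], ModelRequest]


@dataclass
class ToyCapability:
    """Hypothetical bundle of instructions, tools, and one per-step hook."""
    instructions: Optional[str] = None
    tools: tuple[str, ...] = ()
    before_request: Optional[Hook] = None


def prepare_request(base: ModelRequest, capabilities: list[ToyCapability]) -> ModelRequest:
    """Apply each capability in order: add its instructions and tools, then run its hook."""
    request = base
    for cap in capabilities:
        if cap.instructions:
            request = replace(request, instructions=request.instructions + (cap.instructions,))
        request = replace(request, tools=request.tools + cap.tools)
        if cap.before_request is not None:
            request = cap.before_request(request)
    return request


def keep_last_two_messages(request: ModelRequest) -> ModelRequest:
    """Example hook: a crude form of context management that drops older messages."""
    return replace(request, messages=request.messages[-2:])


caps = [
    ToyCapability(instructions='Cite your sources.', tools=('web_search',)),
    ToyCapability(before_request=keep_last_two_messages),
]
request = prepare_request(ModelRequest(messages=('m1', 'm2', 'm3')), caps)
```

In this sketch the first capability only adds an instruction and a tool, while the second only rewrites message history, yet both plug into the same loop through the same shape.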

The GitHub entry shows a richer shape: a Capability you build inline from an id, a description, some instructions, and a toolset (here an MCP server). Marked defer_loading=True, it stays out of the prompt until the model needs it: the model sees only the one-line description in a compact catalog, then loads the whole bundle, instructions and tools together, in a single step when it decides to.
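The deferred-loading behavior can be pictured with a small, self-contained simulation. This is a sketch of the idea only, not Pydantic AI's implementation: `DeferredBundle`, `PromptState`, `build_prompt`, and the two tool names are assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DeferredBundle:
    """Hypothetical stand-in for a deferred capability (not a Pydantic AI class)."""
    id: str
    description: str
    instructions: str
    tool_names: list[str]


@dataclass
class PromptState:
    """What the model would see on its next step in this toy model."""
    instructions: list[str] = field(default_factory=list)
    tools: list[str] = field(default_factory=list)
    catalog: dict[str, str] = field(default_factory=dict)


def build_prompt(bundles: list[DeferredBundle], loaded: set[str]) -> PromptState:
    """Expose loaded bundles in full; list the rest as one-line catalog entries."""
    state = PromptState()
    for bundle in bundles:
        if bundle.id in loaded:
            # Instructions and tools arrive together, in a single step.
            state.instructions.append(bundle.instructions)
            state.tools.extend(bundle.tool_names)
        else:
            # Only the short description costs prompt space.
            state.catalog[bundle.id] = bundle.description
    return state


github = DeferredBundle(
    id='github',
    description='Look up GitHub issues, pull requests, and code.',
    instructions='Use the GitHub tools when a question is about a repository.',
    tool_names=['search_issues', 'get_pull_request'],  # hypothetical tool names
)

before = build_prompt([github], loaded=set())
after = build_prompt([github], loaded={'github'})
```

Before loading, the prompt carries a single catalog line and no GitHub tools; after loading, the catalog entry is gone and the instructions and tools appear together.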

The capability is also why so much has landed lately. In recent releases we have turned more and more of the framework into capabilities: instrumentation, deferred tool calls resolved in the loop, server-side compaction for OpenAI and Anthropic, capabilities built dynamically per run, on-demand loading so a deferred capability stays out of the prompt until the model needs it, a pending message queue for steering a run mid-flight, and even durable execution, which is moving onto the same capability layer (in progress, with a runtime extension point tracked for after v2).

Because capabilities are serializable, an agent can be loaded from a spec file, and the surface is small enough that an LLM can write one: point a coding agent at the capabilities docs and it builds most of what you need. That points at something we are excited about, though it is not a promise yet: with Monty, our safe Python subset, an agent could propose its own declarative tweaks, such as adding a hook that trims an oversized tool result before it fills the context window. And because instrumentation is now a capability too, the traces you already send to Logfire close the loop: an agent that reads its own runs could spot the clearly wrong things, such as a pair of contradictory instructions or a tool whose description doesn't match what it does, and suggest the fix. We have already started turning that loop into something real in Logfire.
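As a rough illustration of the kind of tweak meant here, the function below trims an oversized tool result while keeping its head and tail. It is plain Python with no Pydantic AI imports; the name `trim_tool_result`, the `max_chars` parameter, and its default are assumptions for this sketch, and how such a function would be registered as a hook is left out because that depends on the real capability API.

```python
def trim_tool_result(result: str, max_chars: int = 2000) -> str:
    """Keep roughly max_chars characters of a tool result, split between head and tail.

    The returned text is slightly longer than max_chars because a marker line
    recording how much was dropped is inserted in the middle.
    """
    if max_chars < 2:
        raise ValueError('max_chars must be at least 2')
    if len(result) <= max_chars:
        return result  # small results pass through untouched

    head = max_chars // 2
    tail = max_chars - head
    omitted = len(result) - max_chars
    return f'{result[:head]}\n[... {omitted} characters trimmed ...]\n{result[-tail:]}'


short_result = 'ok'
long_result = 'a' * 3000 + 'b' * 3000
trimmed = trim_tool_result(long_result, max_chars=100)
```

Keeping both ends is a design choice rather than a requirement: tool output often carries a summary at the top and an error or status line at the bottom, so dropping the middle tends to lose the least.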

The Harness and a leaner core

Some capabilities ship with Pydantic AI itself; more come from the first-party Pydantic AI Harness, the batteries for your agent (memory, guardrails, context management, file system access, code mode, and more); and others are third-party or your own. Plenty already come from the community: VStorm and others ship capabilities that we endorse, link to from the Harness, and are working to upstream. The Harness is where we are spending June: a wave of new capabilities, plus a headless coding agent built on Pydantic AI that we are dogfooding across Pydantic's own repositories.

The split is deliberate. Core stays small and stable, shipping the loop, the providers, the capability and hooks API, and only the capabilities that need deep provider support or are fundamental to every agent. Everything else lives in the Harness, where it can move fast, and a capability can graduate into core once it proves broadly essential. v2 leans into that: uv add pydantic-ai still includes OpenAI, Anthropic, and Google by default, but the long tail of providers (bedrock, groq, mistral, and friends) is now opt-in, so you install only what you use. The full Upgrade Guide covers every behavior change, split into what a deprecation warning already caught and what to check by hand.

A word on the version policy

One deliberate change comes with v2: the no-breaking-changes window between major versions moves from six months to three. This is not us caring less about stability. The field moves fast enough that committing further out means committing to decisions that fit today's world rather than the one three months from now. Everything else stands: no breaking changes within a major version, and deprecations always land before removals, exactly as you saw in the run-up to this release, where the latest v1 already warns about most of what v2 changes.

Try it

uv add pydantic-ai

Try it on something real, keep the Upgrade Guide handy if you are coming from v1, and tell us what you build (or what breaks) on GitHub or in Slack. We can't wait to see it.
