Image, Audio, Video & Document Input
If you have a direct URL for the image, you can use ImageUrl:
```python
from pydantic_ai import Agent, ImageUrl

agent = Agent(model='openai:gpt-5.2')
result = agent.run_sync(
    [
        'What company is this logo from?',
        ImageUrl(url='https://iili.io/3Hs4FMg.png'),
    ]
)
print(result.output)
#> This is the logo for Pydantic, a data validation and settings management library in Python.
```
If you have the image locally, you can also use BinaryContent:
```python
import httpx

from pydantic_ai import Agent, BinaryContent

image_response = httpx.get('https://iili.io/3Hs4FMg.png')  # Pydantic logo
agent = Agent(model='openai:gpt-5.2')
result = agent.run_sync(
    [
        'What company is this logo from?',
        BinaryContent(data=image_response.content, media_type='image/png'),  # (1)
    ]
)
print(result.output)
#> This is the logo for Pydantic, a data validation and settings management library in Python.
```

1. To ensure the example is runnable we download this image from the web, but you can also use Path.read_bytes() to read a local file's contents.
You can provide audio input using either AudioUrl or BinaryContent. The process is analogous to the examples above.
You can provide video input using either VideoUrl or BinaryContent. The process is analogous to the examples above.
You can provide document input using either DocumentUrl or BinaryContent. The process is similar to the examples above.
If you have a direct URL for the document, you can use DocumentUrl:
```python
from pydantic_ai import Agent, DocumentUrl

agent = Agent(model='anthropic:claude-sonnet-4-6')
result = agent.run_sync(
    [
        'What is the main content of this document?',
        DocumentUrl(url='https://storage.googleapis.com/cloud-samples-data/generative-ai/pdf/2403.05530.pdf'),
    ]
)
print(result.output)
#> This document is the technical report introducing Gemini 1.5, Google's latest large language model...
```
The supported document formats vary by model.
You can also use BinaryContent to pass document data directly:
```python
from pathlib import Path

from pydantic_ai import Agent, BinaryContent

pdf_path = Path('document.pdf')
agent = Agent(model='anthropic:claude-sonnet-4-6')
result = agent.run_sync(
    [
        'What is the main content of this document?',
        BinaryContent(data=pdf_path.read_bytes(), media_type='application/pdf'),
    ]
)
print(result.output)
#> The document discusses...
```
You can use TextContent to provide text input with additional metadata:
```python
from pydantic_ai import Agent, TextContent

agent = Agent(model='openai:gpt-5.2')
result = agent.run_sync(
    [
        'Summarize the key points from this text.',
        TextContent(
            content=(
                'Pydantic AI is a Python agent framework. '
                'It supports text, image, audio, video, and document input.'
            ),
            metadata={'source': 'pydantic_ai_inputs.txt'},
        ),
    ]
)
```
This is equivalent to passing the text as a str, but allows you to include additional metadata that can be accessed programmatically in your agent logic.
When using one of ImageUrl, AudioUrl, VideoUrl or DocumentUrl, Pydantic AI will default to sending the URL to the model provider, so the file is downloaded on their side.
Support for file URLs varies depending on type and provider:
| Model | Send URL directly | Download and send bytes | Unsupported |
|---|---|---|---|
| OpenAIChatModel | ImageUrl | AudioUrl, DocumentUrl | VideoUrl |
| OpenAIResponsesModel | ImageUrl, AudioUrl, DocumentUrl | — | VideoUrl |
| AnthropicModel | ImageUrl, DocumentUrl (PDF) | DocumentUrl (text/plain) | AudioUrl, VideoUrl |
| GoogleModel (Vertex) | All URL types | — | — |
| GoogleModel (GLA) | YouTube, Files API | All other URLs | — |
| XaiModel | ImageUrl | DocumentUrl | AudioUrl, VideoUrl |
| MistralModel | ImageUrl, DocumentUrl (PDF) | — | AudioUrl, VideoUrl, DocumentUrl (non-PDF) |
| BedrockConverseModel | S3 URLs (s3://) | ImageUrl, DocumentUrl, VideoUrl | AudioUrl |
| OpenRouterModel | ImageUrl, DocumentUrl, VideoUrl | AudioUrl | — |
A model API may be unable to download a file (e.g., because of crawling or access restrictions) even if it supports file URLs. For example, GoogleModel on Vertex AI limits YouTube video URLs to one URL per request. In such cases, you can instruct Pydantic AI to download the file content locally and send that instead of the URL by setting force_download on the URL object:
```python
from pydantic_ai import AudioUrl, DocumentUrl, ImageUrl, VideoUrl

ImageUrl(url='https://example.com/image.png', force_download=True)
AudioUrl(url='https://example.com/audio.mp3', force_download=True)
VideoUrl(url='https://example.com/video.mp4', force_download=True)
DocumentUrl(url='https://example.com/doc.pdf', force_download=True)
```
Some model providers have their own file storage APIs where you can upload files and reference them by ID or URL.
Use UploadedFile to reference files that have been uploaded to a provider’s file storage API.
| Model | Support |
|---|---|
| AnthropicModel | ✅ via Anthropic Files API |
| OpenAIChatModel | ✅ via OpenAI Files API |
| OpenAIResponsesModel | ✅ via OpenAI Files API |
| GoogleModel | ✅ via Google Files API |
| BedrockConverseModel | ✅ via S3 URLs (s3://bucket/key) |
| XaiModel | ✅ via xAI Files API |
| Other models | ❌ Not supported |
When using UploadedFile you must set the provider_name. Uploaded files are specific to the system they are uploaded to and are not transferable across providers. Trying to use a message that contains an UploadedFile with a different provider will result in an error.
If you want to introduce portability into your agent logic to allow the same prompt history to work with different provider backends, you can use a history processor to remove or rewrite UploadedFile parts from messages before sending them to a provider that does not support them. Be aware that stripping out UploadedFile instances might confuse the model, especially if references to those files remain in the text.
The media_type parameter is optional for UploadedFile. If not specified, Pydantic AI will attempt to infer it from the file_id:
- If file_id is a URL or path with a recognizable file extension (e.g., .pdf, .png), the media type is inferred automatically
- For opaque file IDs (e.g., 'file-abc123'), the media type defaults to 'application/octet-stream'
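The inference rule resembles what Python's standard mimetypes module does. This sketch illustrates the behaviour described above; it is not Pydantic AI's actual implementation:

```python
import mimetypes


def infer_media_type(file_id: str) -> str:
    """Guess a media type from the file ID's extension, falling back to a generic binary type."""
    guessed, _ = mimetypes.guess_type(file_id)
    return guessed or 'application/octet-stream'


print(infer_media_type('document.pdf'))  # application/pdf
print(infer_media_type('image.png'))     # image/png
print(infer_media_type('file-abc123'))   # application/octet-stream
```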
Follow the Anthropic Files API docs to upload files. You can access the underlying Anthropic client via provider.client.
```python
import asyncio

from pydantic_ai import Agent, ModelSettings, UploadedFile
from pydantic_ai.models.anthropic import AnthropicModel
from pydantic_ai.providers.anthropic import AnthropicProvider


async def main():
    provider = AnthropicProvider()
    model = AnthropicModel('claude-sonnet-4-5', provider=provider)

    # Upload a file using the provider's client (Anthropic client)
    with open('document.pdf', 'rb') as f:
        uploaded_file = await provider.client.beta.files.upload(file=f)

    # Reference the uploaded file, including the required beta header
    agent = Agent(model)
    result = await agent.run(
        [
            'Summarize this document',
            UploadedFile(file_id=uploaded_file.id, provider_name=model.system),
        ],
        model_settings=ModelSettings(extra_headers={'anthropic-beta': 'files-api-2025-04-14'}),
    )
    print(result.output)
    #> The document discusses the main topics and key findings...


asyncio.run(main())
```
Follow the OpenAI Files API docs to upload files. You can access the underlying OpenAI client via provider.client.
```python
import asyncio

from pydantic_ai import Agent, UploadedFile
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.providers.openai import OpenAIProvider


async def main():
    provider = OpenAIProvider()
    model = OpenAIChatModel('gpt-5', provider=provider)

    # Upload a file using the provider's client (OpenAI client)
    with open('document.pdf', 'rb') as f:
        uploaded_file = await provider.client.files.create(file=f, purpose='user_data')

    # Reference the uploaded file
    agent = Agent(model)
    result = await agent.run(
        [
            'Summarize this document',
            UploadedFile(file_id=uploaded_file.id, provider_name=model.system),
        ]
    )
    print(result.output)
    #> The document discusses the main topics and key findings...


asyncio.run(main())
```
Follow the Google Files API docs to upload files. You can access the underlying Google GenAI client via provider.client.
```python
import asyncio

from pydantic_ai import Agent, UploadedFile
from pydantic_ai.models.google import GoogleModel
from pydantic_ai.providers.google import GoogleProvider


async def main():
    provider = GoogleProvider()
    model = GoogleModel('gemini-2.5-flash', provider=provider)

    # Upload a file using the provider's client (Google GenAI client)
    with open('document.pdf', 'rb') as f:
        file = await provider.client.aio.files.upload(file=f)
    assert file.uri is not None

    # Reference the uploaded file by URI (media_type is optional for Google)
    agent = Agent(model)
    result = await agent.run(
        [
            'Summarize this document',
            UploadedFile(file_id=file.uri, media_type=file.mime_type, provider_name=model.system),
        ]
    )
    print(result.output)
    #> The document discusses the main topics and key findings...


asyncio.run(main())
```
For Bedrock, files must be uploaded to S3 separately (e.g., using boto3). The assumed role must have s3:GetObject permission on the bucket.
```python
import asyncio

from pydantic_ai import Agent, UploadedFile
from pydantic_ai.models.bedrock import BedrockConverseModel


async def main():
    model = BedrockConverseModel('us.anthropic.claude-sonnet-4-20250514-v1:0')
    agent = Agent(model)
    result = await agent.run(
        [
            'Summarize this document',
            UploadedFile(
                file_id='s3://my-bucket/document.pdf',
                provider_name=model.system,  # 'bedrock'
                media_type='application/pdf',  # Optional for .pdf, but recommended
            ),
        ]
    )
    print(result.output)
    #> The document discusses the main topics and key findings...


asyncio.run(main())
```
Follow the xAI Files API docs to upload files. You can access the underlying xAI client via provider.client.
```python
import asyncio

from pydantic_ai import Agent, UploadedFile
from pydantic_ai.models.xai import XaiModel
from pydantic_ai.providers.xai import XaiProvider


async def main():
    provider = XaiProvider()
    model = XaiModel('grok-4-fast', provider=provider)

    # Upload a file using the provider's client (xAI client)
    with open('document.pdf', 'rb') as f:
        uploaded_file = await provider.client.files.upload(f, filename='document.pdf')

    # Reference the uploaded file
    agent = Agent(model)
    result = await agent.run(
        [
            'Summarize this document',
            UploadedFile(file_id=uploaded_file.id, provider_name=model.system),
        ]
    )
    print(result.output)
    #> The document discusses the main topics and key findings...


asyncio.run(main())
```