pydantic_ai.embeddings

EmbeddingSettings

Bases: TypedDict

Common settings for configuring embedding models.

These settings apply across multiple embedding model providers. Not all settings are supported by all models - check the specific model’s documentation for details.

Provider-specific settings classes (e.g., OpenAIEmbeddingSettings, CohereEmbeddingSettings) extend this with additional provider-prefixed options.

Attributes

dimensions

The number of dimensions for the output embeddings.

Supported by:

OpenAI
Cohere
Google
Sentence Transformers
Bedrock
VoyageAI

Type: int

truncate

Whether to truncate inputs that exceed the model’s context length.

Defaults to False. If True, inputs that are too long will be truncated. If False, an error will be raised for inputs that exceed the context length.

For more control over truncation, you can use max_input_tokens() and count_tokens() to implement your own truncation logic.

Provider-specific truncation settings (e.g., cohere_truncate, bedrock_cohere_truncate) take precedence if specified.

Supported by:

Cohere
Bedrock (Cohere and Nova models)
VoyageAI

Type: bool

extra_headers

Extra headers to send to the model.

Supported by:

OpenAI
Cohere

Type: dict[str, str]

extra_body

Extra body to send to the model.

Supported by:

OpenAI
Cohere

Type: object

EmbeddingModel

Bases: ABC

Abstract base class for embedding models.

Implement this class to create a custom embedding model. For most use cases, use one of the built-in implementations:

Attributes

settings

Get the default settings for this model.

Type: EmbeddingSettings | None

base_url

The base URL for the provider API, if available.

Type: str | None

model_name

The name of the embedding model.

Type: str

system

The embedding model provider/system identifier (e.g., ‘openai’, ‘cohere’).

Type: str

Methods

init

def __init__(settings: EmbeddingSettings | None = None) -> None

Initialize the model with optional settings.

Returns

None

Parameters

settings : EmbeddingSettings | None Default: None

Model-specific settings that will be used as defaults for this model.

embed

@abstractmethod

@async

def embed(
    inputs: str | Sequence[str],
    input_type: EmbedInputType,
    settings: EmbeddingSettings | None = None,
) -> EmbeddingResult

Generate embeddings for the given inputs.

Returns

EmbeddingResult — An EmbeddingResult containing EmbeddingResult — the embeddings and metadata.

Parameters

inputs : str | Sequence[str]

A single string or sequence of strings to embed.

input_type : EmbedInputType

Whether the inputs are queries or documents.

settings : EmbeddingSettings | None Default: None

Optional settings to override the model’s defaults.

prepare_embed

def prepare_embed(
    inputs: str | Sequence[str],
    settings: EmbeddingSettings | None = None,
) -> tuple[list[str], EmbeddingSettings]

Prepare the inputs and settings for embedding.

This method normalizes inputs to a list and merges settings. Subclasses should call this at the start of their embed() implementation.

Returns

tuple[list[str], EmbeddingSettings] — A tuple of (normalized inputs list, merged settings).

Parameters

inputs : str | Sequence[str]

A single string or sequence of strings.

settings : EmbeddingSettings | None Default: None

Optional settings to merge with defaults.

max_input_tokens

@async

def max_input_tokens() -> int | None

Get the maximum number of tokens that can be input to the model.

Returns

int | None — The maximum token count, or None if unknown.

count_tokens

@async

def count_tokens(text: str) -> int

Count the number of tokens in the given text.

Returns

int — The number of tokens.

Parameters

text : str

The text to tokenize and count.

Raises

NotImplementedError — If the model doesn’t support token counting.
UserError — If the model or tokenizer is not supported.

WrapperEmbeddingModel

Bases: EmbeddingModel

Base class for embedding models that wrap another model.

Use this as a base class to create custom embedding model wrappers that modify behavior (e.g., caching, logging, rate limiting) while delegating to an underlying model.

By default, all methods are passed through to the wrapped model. Override specific methods to customize behavior.

Attributes

wrapped

The underlying embedding model being wrapped.

Type: EmbeddingModel Default: infer_embedding_model(wrapped) if isinstance(wrapped, str) else wrapped

settings

Get the settings from the wrapped embedding model.

Type: EmbeddingSettings | None

Methods

init

def __init__(wrapped: EmbeddingModel | str)

Initialize the wrapper with an embedding model.

Parameters

wrapped : EmbeddingModel | str

The model to wrap. Can be an EmbeddingModel instance or a model name string (e.g., 'openai:text-embedding-3-small').

EmbeddingResult

The result of an embedding operation.

This class contains the generated embeddings along with metadata about the operation, including the original inputs, model information, usage statistics, and timing.

Example:

from pydantic_ai import Embedder

embedder = Embedder('openai:text-embedding-3-small')


async def main():
    result = await embedder.embed_query('What is AI?')

    # Access embeddings by index
    print(len(result.embeddings[0]))
    #> 1536

    # Access embeddings by original input text
    print(result['What is AI?'] == result.embeddings[0])
    #> True

    # Check usage
    print(f'Tokens used: {result.usage.input_tokens}')
    #> Tokens used: 3

Attributes

embeddings

The computed embedding vectors, one per input text.

Each embedding is a sequence of floats representing the text in vector space.

Type: Sequence[Sequence[float]]

inputs

The original input texts that were embedded.

Type: Sequence[str]

input_type

Whether the inputs were embedded as queries or documents.

Type: EmbedInputType

model_name

The name of the model that generated these embeddings.

Type: str

provider_name

The name of the provider (e.g., ‘openai’, ‘cohere’).

Type: str

timestamp

When the embedding request was made.

Type: datetime Default: field(default_factory=_now_utc)

usage

Token usage statistics for this request.

Type: RequestUsage Default: field(default_factory=RequestUsage)

provider_details

Provider-specific details from the response.

Type: dict[str, Any] | None Default: None

provider_response_id

Unique identifier for this response from the provider, if available.

Type: str | None Default: None

Methods

getitem

def __getitem__(item: int | str) -> Sequence[float]

Get the embedding for an input by index or by the original input text.

Returns

Sequence[float] — The embedding vector for the specified input.

Parameters

item : int | str

Either an integer index or the original input string.

Raises

IndexError — If the index is out of range.
ValueError — If the string is not found in the inputs.

cost

def cost() -> genai_types.PriceCalculation

Calculate the cost of the embedding request.

Uses genai-prices for pricing data.

Returns

genai_types.PriceCalculation — A price calculation object with total_price, input_price, and other cost details.

Raises

LookupError — If pricing data is not available for this model/provider.

TestEmbeddingModel

Bases: EmbeddingModel

A mock embedding model for testing.

This model returns deterministic embeddings (all 1.0 values) and tracks the settings used in the last call via the last_settings attribute.

Example:

from pydantic_ai import Embedder
from pydantic_ai.embeddings import TestEmbeddingModel

test_model = TestEmbeddingModel()
embedder = Embedder('openai:text-embedding-3-small')


async def main():
    with embedder.override(model=test_model):
        await embedder.embed_query('test')
        assert test_model.last_settings is not None

Attributes

last_settings

The settings used in the most recent embed call.

Type: EmbeddingSettings | None Default: None

model_name

The embedding model name.

Type: str

system

The embedding model provider.

Type: str

Methods

init

def __init__(
    model_name: str = 'test',
    provider_name: str = 'test',
    dimensions: int = 8,
    settings: EmbeddingSettings | None = None,
)

Initialize the test embedding model.

Parameters

model_name : str Default: 'test'

The model name to report in results.

provider_name : str Default: 'test'

The provider name to report in results.

dimensions : int Default: 8

The number of dimensions for the generated embeddings.

settings : EmbeddingSettings | None Default: None

Optional default settings for the model.

InstrumentedEmbeddingModel

Bases: WrapperEmbeddingModel

Embedding model which wraps another model so that requests are instrumented with OpenTelemetry.

See the Debugging and Monitoring guide for more info.

Attributes

instrumentation_settings

Instrumentation settings for this model.

Type: InstrumentationSettings Default: options or InstrumentationSettings()

Embedder

High-level interface for generating text embeddings.

The Embedder class provides a convenient way to generate vector embeddings from text using various embedding model providers. It handles model inference, settings management, and optional OpenTelemetry instrumentation.

Example:

from pydantic_ai import Embedder

embedder = Embedder('openai:text-embedding-3-small')


async def main():
    result = await embedder.embed_query('What is machine learning?')
    print(result.embeddings[0][:5])  # First 5 dimensions
    #> [1.0, 1.0, 1.0, 1.0, 1.0]

Attributes

instrument

Options to automatically instrument with OpenTelemetry.

Set to True to use default instrumentation settings, which will use Logfire if it’s configured. Set to an instance of InstrumentationSettings to customize. If this isn’t set, then the last value set by Embedder.instrument_all() will be used, which defaults to False. See the Debugging and Monitoring guide for more info.

Type: InstrumentationSettings | bool | None Default: instrument

model

The embedding model used by this embedder.

Type: EmbeddingModel | KnownEmbeddingModelName | str

Methods

init

def __init__(
    model: EmbeddingModel | KnownEmbeddingModelName | str,
    settings: EmbeddingSettings | None = None,
    defer_model_check: bool = True,
    instrument: InstrumentationSettings | bool | None = None,
) -> None

Initialize an Embedder.

Returns

None

Parameters

model : EmbeddingModel | KnownEmbeddingModelName | str

The embedding model to use. Can be specified as:

A model name string in the format 'provider:model-name' (e.g., 'openai:text-embedding-3-small')
An EmbeddingModel instance

settings : EmbeddingSettings | None Default: None

Optional EmbeddingSettings to use as defaults for all embed calls.

defer_model_check : bool Default: True

Whether to defer model validation until first use. Set to False to validate the model immediately on construction.

instrument : InstrumentationSettings | bool | None Default: None

OpenTelemetry instrumentation settings. Set to True to enable with defaults, or pass an InstrumentationSettings instance to customize. If None, uses the value from Embedder.instrument_all().

instrument_all

@staticmethod

def instrument_all(instrument: InstrumentationSettings | bool = True) -> None

Set the default instrumentation options for all embedders where instrument is not explicitly set.

This is useful for enabling instrumentation globally without modifying each embedder individually.

Returns

None

Parameters

instrument : InstrumentationSettings | bool Default: True

Instrumentation settings to use as the default. Set to True for default settings, False to disable, or pass an InstrumentationSettings instance to customize.

override

def override(
    model: EmbeddingModel | KnownEmbeddingModelName | str | _utils.Unset = _utils.UNSET,
) -> Iterator[None]

Context manager to temporarily override the embedding model.

Useful for testing or dynamically switching models.

Example:

from pydantic_ai import Embedder

embedder = Embedder('openai:text-embedding-3-small')


async def main():
    # Temporarily use a different model
    with embedder.override(model='openai:text-embedding-3-large'):
        result = await embedder.embed_query('test')
        print(len(result.embeddings[0]))  # 3072 dimensions for large model
        #> 3072

Returns

Iterator[None]

Parameters

model : EmbeddingModel | KnownEmbeddingModelName | str | _utils.Unset Default: _utils.UNSET

The embedding model to use within this context.

embed_query

@async

def embed_query(
    query: str | Sequence[str],
    settings: EmbeddingSettings | None = None,
) -> EmbeddingResult

Embed one or more query texts.

Use this method when embedding search queries that will be compared against document embeddings. Some models optimize embeddings differently based on whether the input is a query or document.

Returns

EmbeddingResult — An EmbeddingResult containing the embeddings EmbeddingResult — and metadata about the operation.

Parameters

query : str | Sequence[str]

A single query string or sequence of query strings to embed.

settings : EmbeddingSettings | None Default: None

Optional settings to override the embedder’s default settings for this call.

embed_documents

@async

def embed_documents(
    documents: str | Sequence[str],
    settings: EmbeddingSettings | None = None,
) -> EmbeddingResult

Embed one or more document texts.

Use this method when embedding documents that will be stored and later searched against. Some models optimize embeddings differently based on whether the input is a query or document.

Returns

EmbeddingResult — An EmbeddingResult containing the embeddings EmbeddingResult — and metadata about the operation.

Parameters

documents : str | Sequence[str]

A single document string or sequence of document strings to embed.

settings : EmbeddingSettings | None Default: None

Optional settings to override the embedder’s default settings for this call.

embed

@async

def embed(
    inputs: str | Sequence[str],
    input_type: EmbedInputType,
    settings: EmbeddingSettings | None = None,
) -> EmbeddingResult

Embed text inputs with explicit input type specification.

This is the low-level embedding method. For most use cases, prefer embed_query() or embed_documents().

Returns

EmbeddingResult — An EmbeddingResult containing the embeddings EmbeddingResult — and metadata about the operation.

Parameters

inputs : str | Sequence[str]

A single string or sequence of strings to embed.

input_type : EmbedInputType

The type of input, either 'query' or 'document'.

settings : EmbeddingSettings | None Default: None

Optional settings to override the embedder’s default settings for this call.

max_input_tokens

@async

def max_input_tokens() -> int | None

Get the maximum number of tokens the model can accept as input.

Returns

int | None — The maximum token count, or None if the limit is unknown for this model.

count_tokens

@async

def count_tokens(text: str) -> int

Count the number of tokens in the given text.

Returns

int — The number of tokens in the text.

Parameters

text : str

The text to tokenize and count.

Raises

NotImplementedError — If the model doesn’t support token counting.
UserError — If the model or tokenizer is not supported.

embed_query_sync

def embed_query_sync(
    query: str | Sequence[str],
    settings: EmbeddingSettings | None = None,
) -> EmbeddingResult

Synchronous version of embed_query().

Returns

EmbeddingResult

embed_documents_sync

def embed_documents_sync(
    documents: str | Sequence[str],
    settings: EmbeddingSettings | None = None,
) -> EmbeddingResult

Synchronous version of embed_documents().

Returns

EmbeddingResult

embed_sync

def embed_sync(
    inputs: str | Sequence[str],
    input_type: EmbedInputType,
    settings: EmbeddingSettings | None = None,
) -> EmbeddingResult

Synchronous version of embed().

Returns

EmbeddingResult

max_input_tokens_sync

def max_input_tokens_sync() -> int | None

Synchronous version of max_input_tokens().

Returns

int | None

count_tokens_sync

def count_tokens_sync(text: str) -> int

Synchronous version of count_tokens().

Returns

int

instrument_embedding_model

def instrument_embedding_model(
    model: EmbeddingModel,
    instrument: InstrumentationSettings | bool,
) -> EmbeddingModel

Instrument an embedding model with OpenTelemetry/logfire.

Returns

EmbeddingModel

merge_embedding_settings

def merge_embedding_settings(
    base: EmbeddingSettings | None,
    overrides: EmbeddingSettings | None,
) -> EmbeddingSettings | None

Merge two sets of embedding settings, with overrides taking precedence.

Returns

EmbeddingSettings | None — Merged settings, or None if both inputs are None.

Parameters

base : EmbeddingSettings | None

Base settings (typically from the embedder or model).

overrides : EmbeddingSettings | None

Settings that should override the base (typically per-call settings).

infer_embedding_model

def infer_embedding_model(
    model: EmbeddingModel | KnownEmbeddingModelName | str,
    provider_factory: Callable[[str], Provider[Any]] = infer_provider,
) -> EmbeddingModel

Infer the model from the name.

Returns

EmbeddingModel

KnownEmbeddingModelName

Known model names that can be used with the model parameter of Embedder.

KnownEmbeddingModelName is provided as a concise way to specify an embedding model.

Default: TypeAliasType('KnownEmbeddingModelName', Literal['google-gla:gemini-embedding-001', 'google-gla:gemini-embedding-2-preview', 'google-vertex:gemini-embedding-001', 'google-vertex:gemini-embedding-2-preview', 'google-vertex:text-embedding-005', 'google-vertex:text-multilingual-embedding-002', 'openai:text-embedding-ada-002', 'openai:text-embedding-3-small', 'openai:text-embedding-3-large', 'cohere:embed-v4.0', 'cohere:embed-english-v3.0', 'cohere:embed-english-light-v3.0', 'cohere:embed-multilingual-v3.0', 'cohere:embed-multilingual-light-v3.0', 'voyageai:voyage-4-large', 'voyageai:voyage-4', 'voyageai:voyage-4-lite', 'voyageai:voyage-3-large', 'voyageai:voyage-3.5', 'voyageai:voyage-3.5-lite', 'voyageai:voyage-code-3', 'voyageai:voyage-finance-2', 'voyageai:voyage-law-2', 'voyageai:voyage-code-2', 'bedrock:amazon.titan-embed-text-v1', 'bedrock:amazon.titan-embed-text-v2:0', 'bedrock:cohere.embed-english-v3', 'bedrock:cohere.embed-multilingual-v3', 'bedrock:cohere.embed-v4:0', 'bedrock:amazon.nova-2-multimodal-embeddings-v1:0'])

EmbeddingModel

Bases: ABC

Abstract base class for embedding models.

Implement this class to create a custom embedding model. For most use cases, use one of the built-in implementations:

Attributes

settings

Get the default settings for this model.

Type: EmbeddingSettings | None

base_url

The base URL for the provider API, if available.

Type: str | None

model_name

The name of the embedding model.

Type: str

system

The embedding model provider/system identifier (e.g., ‘openai’, ‘cohere’).

Type: str

Methods

init

def __init__(settings: EmbeddingSettings | None = None) -> None

Initialize the model with optional settings.

Returns

None

Parameters

settings : EmbeddingSettings | None Default: None

Model-specific settings that will be used as defaults for this model.

embed

@abstractmethod

@async

def embed(
    inputs: str | Sequence[str],
    input_type: EmbedInputType,
    settings: EmbeddingSettings | None = None,
) -> EmbeddingResult

Generate embeddings for the given inputs.

Returns

EmbeddingResult — An EmbeddingResult containing EmbeddingResult — the embeddings and metadata.

Parameters

inputs : str | Sequence[str]

A single string or sequence of strings to embed.

input_type : EmbedInputType

Whether the inputs are queries or documents.

settings : EmbeddingSettings | None Default: None

Optional settings to override the model’s defaults.

prepare_embed

def prepare_embed(
    inputs: str | Sequence[str],
    settings: EmbeddingSettings | None = None,
) -> tuple[list[str], EmbeddingSettings]

Prepare the inputs and settings for embedding.

This method normalizes inputs to a list and merges settings. Subclasses should call this at the start of their embed() implementation.

Returns

tuple[list[str], EmbeddingSettings] — A tuple of (normalized inputs list, merged settings).

Parameters

inputs : str | Sequence[str]

A single string or sequence of strings.

settings : EmbeddingSettings | None Default: None

Optional settings to merge with defaults.

max_input_tokens

@async

def max_input_tokens() -> int | None

Get the maximum number of tokens that can be input to the model.

Returns

int | None — The maximum token count, or None if unknown.

count_tokens

@async

def count_tokens(text: str) -> int

Count the number of tokens in the given text.

Returns

int — The number of tokens.

Parameters

text : str

The text to tokenize and count.

Raises

NotImplementedError — If the model doesn’t support token counting.
UserError — If the model or tokenizer is not supported.

EmbeddingResult

The result of an embedding operation.

This class contains the generated embeddings along with metadata about the operation, including the original inputs, model information, usage statistics, and timing.

Example:

from pydantic_ai import Embedder

embedder = Embedder('openai:text-embedding-3-small')


async def main():
    result = await embedder.embed_query('What is AI?')

    # Access embeddings by index
    print(len(result.embeddings[0]))
    #> 1536

    # Access embeddings by original input text
    print(result['What is AI?'] == result.embeddings[0])
    #> True

    # Check usage
    print(f'Tokens used: {result.usage.input_tokens}')
    #> Tokens used: 3

Attributes

embeddings

The computed embedding vectors, one per input text.

Each embedding is a sequence of floats representing the text in vector space.

Type: Sequence[Sequence[float]]

inputs

The original input texts that were embedded.

Type: Sequence[str]

input_type

Whether the inputs were embedded as queries or documents.

Type: EmbedInputType

model_name

The name of the model that generated these embeddings.

Type: str

provider_name

The name of the provider (e.g., ‘openai’, ‘cohere’).

Type: str

timestamp

When the embedding request was made.

Type: datetime Default: field(default_factory=_now_utc)

usage

Token usage statistics for this request.

Type: RequestUsage Default: field(default_factory=RequestUsage)

provider_details

Provider-specific details from the response.

Type: dict[str, Any] | None Default: None

provider_response_id

Unique identifier for this response from the provider, if available.

Type: str | None Default: None

Methods

getitem

def __getitem__(item: int | str) -> Sequence[float]

Get the embedding for an input by index or by the original input text.

Returns

Sequence[float] — The embedding vector for the specified input.

Parameters

item : int | str

Either an integer index or the original input string.

Raises

IndexError — If the index is out of range.
ValueError — If the string is not found in the inputs.

cost

def cost() -> genai_types.PriceCalculation

Calculate the cost of the embedding request.

Uses genai-prices for pricing data.

Returns

genai_types.PriceCalculation — A price calculation object with total_price, input_price, and other cost details.

Raises

LookupError — If pricing data is not available for this model/provider.

EmbedInputType

The type of input to the embedding model.

'query': Text that will be used as a search query
'document': Text that will be stored and searched against

Some embedding models optimize differently for queries vs documents.

Default: Literal['query', 'document']

EmbeddingSettings

Bases: TypedDict

Common settings for configuring embedding models.

These settings apply across multiple embedding model providers. Not all settings are supported by all models - check the specific model’s documentation for details.

Provider-specific settings classes (e.g., OpenAIEmbeddingSettings, CohereEmbeddingSettings) extend this with additional provider-prefixed options.

Attributes

dimensions

The number of dimensions for the output embeddings.

Supported by:

OpenAI
Cohere
Google
Sentence Transformers
Bedrock
VoyageAI

Type: int

truncate

Whether to truncate inputs that exceed the model’s context length.

Defaults to False. If True, inputs that are too long will be truncated. If False, an error will be raised for inputs that exceed the context length.

For more control over truncation, you can use max_input_tokens() and count_tokens() to implement your own truncation logic.

Provider-specific truncation settings (e.g., cohere_truncate, bedrock_cohere_truncate) take precedence if specified.

Supported by:

Cohere
Bedrock (Cohere and Nova models)
VoyageAI

Type: bool

extra_headers

Extra headers to send to the model.

Supported by:

OpenAI
Cohere

Type: dict[str, str]

extra_body

Extra body to send to the model.

Supported by:

OpenAI
Cohere

Type: object

merge_embedding_settings

def merge_embedding_settings(
    base: EmbeddingSettings | None,
    overrides: EmbeddingSettings | None,
) -> EmbeddingSettings | None

Merge two sets of embedding settings, with overrides taking precedence.

Returns

EmbeddingSettings | None — Merged settings, or None if both inputs are None.

Parameters

base : EmbeddingSettings | None

Base settings (typically from the embedder or model).

overrides : EmbeddingSettings | None

Settings that should override the base (typically per-call settings).

OpenAIEmbeddingSettings

Bases: EmbeddingSettings

Settings used for an OpenAI embedding model request.

All fields from EmbeddingSettings are supported.

OpenAIEmbeddingModel

Bases: EmbeddingModel

OpenAI embedding model implementation.

This model works with OpenAI’s embeddings API and any OpenAI-compatible providers.

Example:

from pydantic_ai.embeddings.openai import OpenAIEmbeddingModel
from pydantic_ai.providers.openai import OpenAIProvider

# Using OpenAI directly
model = OpenAIEmbeddingModel('text-embedding-3-small')

# Using an OpenAI-compatible provider
model = OpenAIEmbeddingModel(
    'text-embedding-3-small',
    provider=OpenAIProvider(base_url='https://my-provider.com/v1'),
)

Attributes

model_name

The embedding model name.

Type: OpenAIEmbeddingModelName

system

The embedding model provider.

Type: str

Methods

init

def __init__(
    model_name: OpenAIEmbeddingModelName,
    provider: OpenAIEmbeddingsCompatibleProvider | Literal['openai'] | Provider[AsyncOpenAI] = 'openai',
    settings: EmbeddingSettings | None = None,
)

Initialize an OpenAI embedding model.

Parameters

model_name : OpenAIEmbeddingModelName

The name of the OpenAI model to use. See OpenAI’s embedding models for available options.

provider : OpenAIEmbeddingsCompatibleProvider | Literal[‘openai’] | Provider[AsyncOpenAI] Default: 'openai'

The provider to use for authentication and API access. Can be:

'openai' (default): Uses the standard OpenAI API
A provider name string (e.g., 'azure', 'deepseek')
A Provider instance for custom configuration

See OpenAI-compatible providers for a list of supported providers.

settings : EmbeddingSettings | None Default: None

Model-specific EmbeddingSettings to use as defaults for this model.

OpenAIEmbeddingModelName

Possible OpenAI embeddings model names.

See the OpenAI embeddings documentation for available models.

Default: str | LatestOpenAIEmbeddingModelNames

CohereEmbeddingSettings

Bases: EmbeddingSettings

Settings used for a Cohere embedding model request.

All fields from EmbeddingSettings are supported, plus Cohere-specific settings prefixed with cohere_.

Attributes

cohere_max_tokens

The maximum number of tokens to embed.

Type: int

cohere_input_type

The Cohere-specific input type for the embedding.

Overrides the standard input_type argument. Options include: 'search_query', 'search_document', 'classification', 'clustering', and 'image'.

Type: CohereEmbedInputType

cohere_truncate

The truncation strategy to use:

'NONE' (default): Raise an error if input exceeds max tokens.
'END': Truncate the end of the input text.
'START': Truncate the start of the input text.

Note: This setting overrides the standard truncate boolean setting when specified.

Type: V2EmbedRequestTruncate

CohereEmbeddingModel

Bases: EmbeddingModel

Cohere embedding model implementation.

This model works with Cohere’s embeddings API, which offers multilingual support and various model sizes.

Example:

from pydantic_ai.embeddings.cohere import CohereEmbeddingModel

model = CohereEmbeddingModel('embed-v4.0')

Attributes

base_url

The base URL for the provider API, if available.

Type: str

model_name

The embedding model name.

Type: CohereEmbeddingModelName

system

The embedding model provider.

Type: str

Methods

init

def __init__(
    model_name: CohereEmbeddingModelName,
    provider: Literal['cohere'] | Provider[AsyncClientV2] = 'cohere',
    settings: EmbeddingSettings | None = None,
)

Initialize a Cohere embedding model.

Parameters

model_name : CohereEmbeddingModelName

The name of the Cohere model to use. See Cohere Embed documentation for available models.

provider : Literal[‘cohere’] | Provider[AsyncClientV2] Default: 'cohere'

The provider to use for authentication and API access. Can be:

'cohere' (default): Uses the standard Cohere API
A CohereProvider instance for custom configuration

settings : EmbeddingSettings | None Default: None

Model-specific EmbeddingSettings to use as defaults for this model.

LatestCohereEmbeddingModelNames

Latest Cohere embeddings models.

See the Cohere Embed documentation for available models and their capabilities.

Default: Literal['embed-v4.0', 'embed-english-v3.0', 'embed-english-light-v3.0', 'embed-multilingual-v3.0', 'embed-multilingual-light-v3.0']

CohereEmbeddingModelName

Possible Cohere embeddings model names.

Default: str | LatestCohereEmbeddingModelNames

GoogleEmbeddingSettings

Bases: EmbeddingSettings

Settings used for a Google embedding model request.

All fields from EmbeddingSettings are supported, plus Google-specific settings prefixed with google_.

Attributes

google_task_type

The task type for the embedding.

Overrides the automatic task type selection based on input_type. See Google’s task type documentation for available options.

Type: str

google_title

Optional title for the content being embedded.

Only applicable when task_type is RETRIEVAL_DOCUMENT.

Type: str

GoogleEmbeddingModel

Bases: EmbeddingModel

Google embedding model implementation.

This model works with Google’s embeddings API via the google-genai SDK, supporting both the Gemini API (Google AI Studio) and Vertex AI.

Example:

from pydantic_ai.embeddings.google import GoogleEmbeddingModel
from pydantic_ai.providers.google import GoogleProvider

# Using Gemini API (requires GOOGLE_API_KEY env var)
model = GoogleEmbeddingModel('gemini-embedding-001')

# Using Vertex AI
model = GoogleEmbeddingModel(
    'gemini-embedding-001',
    provider=GoogleProvider(vertexai=True, project='my-project', location='us-central1'),
)

Attributes

model_name

The embedding model name.

Type: GoogleEmbeddingModelName

system

The embedding model provider.

Type: str

Methods

init

def __init__(
    model_name: GoogleEmbeddingModelName,
    provider: Literal['google-gla', 'google-vertex'] | Provider[Client] = 'google-gla',
    settings: EmbeddingSettings | None = None,
)

Initialize a Google embedding model.

Parameters

model_name : GoogleEmbeddingModelName

The name of the Google model to use. See Google Embeddings documentation for available models.

provider : Literal[‘google-gla’, ‘google-vertex’] | Provider[Client] Default: 'google-gla'

The provider to use for authentication and API access. Can be:

'google-gla' (default): Uses the Gemini API (Google AI Studio)
'google-vertex': Uses Vertex AI
A GoogleProvider instance for custom configuration

settings : EmbeddingSettings | None Default: None

Model-specific EmbeddingSettings to use as defaults for this model.

LatestGoogleGLAEmbeddingModelNames

Latest Google Gemini API (GLA) embedding models.

See the Google Embeddings documentation for available models and their capabilities.

Default: Literal['gemini-embedding-001', 'gemini-embedding-2-preview']

LatestGoogleVertexEmbeddingModelNames

Latest Google Vertex AI embedding models.

See the Vertex AI Embeddings documentation for available models and their capabilities.

Default: Literal['gemini-embedding-001', 'gemini-embedding-2-preview', 'text-embedding-005', 'text-multilingual-embedding-002']

LatestGoogleEmbeddingModelNames

All latest Google embedding models (union of GLA and Vertex AI models).

Default: LatestGoogleGLAEmbeddingModelNames | LatestGoogleVertexEmbeddingModelNames

GoogleEmbeddingModelName

Possible Google embeddings model names.

Default: str | LatestGoogleEmbeddingModelNames

BedrockEmbeddingSettings

Bases: EmbeddingSettings

Settings used for a Bedrock embedding model request.

All fields from EmbeddingSettings are supported, plus Bedrock-specific settings prefixed with bedrock_.

All settings are optional - if not specified, model defaults are used.

Note on dimensions parameter support:

Titan v1 (amazon.titan-embed-text-v1): Not supported (fixed: 1536)
Titan v2 (amazon.titan-embed-text-v2:0): Supported (default: 1024, accepts 256/384/1024)
Cohere v3 (cohere.embed-english-v3, cohere.embed-multilingual-v3): Not supported (fixed: 1024)
Cohere v4 (cohere.embed-v4:0): Supported (default: 1536, accepts 256/512/1024/1536)
Nova (amazon.nova-2-multimodal-embeddings-v1:0): Supported (default: 3072, accepts 256/384/1024/3072)

Unsupported settings are silently ignored.

Note on truncate parameter support:

Titan models (amazon.titan-embed-text-v1, amazon.titan-embed-text-v2:0): Not supported
Cohere models (all versions): Supported (default: False, maps to 'END' when True)
Nova (amazon.nova-2-multimodal-embeddings-v1:0): Supported (default: False, maps to 'END' when True)

For fine-grained truncation control, use model-specific settings: bedrock_cohere_truncate or bedrock_nova_truncate.

Attributes

bedrock_titan_normalize

Whether to normalize embedding vectors for Titan models.

Supported by: amazon.titan-embed-text-v2:0 (default: True)

Not supported by: amazon.titan-embed-text-v1 (silently ignored)

When enabled, vectors are normalized for direct cosine similarity calculations.

Type: bool

bedrock_cohere_max_tokens

The maximum number of tokens to embed for Cohere models.

Supported by: cohere.embed-v4:0 (default: 128000)

Not supported by: cohere.embed-english-v3, cohere.embed-multilingual-v3 (silently ignored)

Type: int

bedrock_cohere_input_type

The input type for Cohere models.

Supported by: All Cohere models (cohere.embed-english-v3, cohere.embed-multilingual-v3, cohere.embed-v4:0)

By default, embed_query() uses 'search_query' and embed_documents() uses 'search_document'. Also accepts 'classification' or 'clustering'.

Type: Literal[‘search_document’, ‘search_query’, ‘classification’, ‘clustering’]

bedrock_cohere_truncate

The truncation strategy for Cohere models. Overrides base truncate setting.

Supported by: All Cohere models (cohere.embed-english-v3, cohere.embed-multilingual-v3, cohere.embed-v4:0)

Default: 'NONE'

'NONE': Raise an error if input exceeds max tokens.
'START': Truncate the start of the input.
'END': Truncate the end of the input.

Type: Literal[‘NONE’, ‘START’, ‘END’]

bedrock_nova_truncate

The truncation strategy for Nova models. Overrides base truncate setting.

Supported by: amazon.nova-2-multimodal-embeddings-v1:0

Default: 'NONE'

'NONE': Raise an error if input exceeds max tokens.
'START': Truncate the start of the input.
'END': Truncate the end of the input.

Type: Literal[‘NONE’, ‘START’, ‘END’]

bedrock_nova_embedding_purpose

The embedding purpose for Nova models.

Supported by: amazon.nova-2-multimodal-embeddings-v1:0

By default, embed_query() uses 'GENERIC_RETRIEVAL' and embed_documents() uses 'GENERIC_INDEX'. Also accepts 'TEXT_RETRIEVAL', 'CLASSIFICATION', or 'CLUSTERING'.

Note: Multimodal-specific purposes ('IMAGE_RETRIEVAL', 'VIDEO_RETRIEVAL', 'DOCUMENT_RETRIEVAL', 'AUDIO_RETRIEVAL') are not supported as this embedding client only accepts text input.

Type: Literal[‘GENERIC_INDEX’, ‘GENERIC_RETRIEVAL’, ‘TEXT_RETRIEVAL’, ‘CLASSIFICATION’, ‘CLUSTERING’]

bedrock_inference_profile

An inference profile ARN to use as the modelId in API requests.

When set, this value is used as the modelId in invoke_model API calls instead of the base model_name. This allows you to pass the base model name (e.g. 'amazon.titan-embed-text-v2:0') as model_name for detecting model capabilities, while routing requests through an inference profile for cost tracking or cross-region inference.

Type: str

bedrock_max_concurrency

Maximum number of concurrent requests for models that don’t support batch embedding.

Applies to: amazon.titan-embed-text-v1, amazon.titan-embed-text-v2:0, amazon.nova-2-multimodal-embeddings-v1:0

When embedding multiple texts with models that only support single-text requests, this controls how many requests run in parallel. Defaults to 5.

Type: int

BedrockEmbeddingModel

Bases: EmbeddingModel

Bedrock embedding model implementation.

This model works with AWS Bedrock’s embedding models including Amazon Titan Embeddings and Cohere Embed models.

Example:

from pydantic_ai.embeddings.bedrock import BedrockEmbeddingModel
from pydantic_ai.providers.bedrock import BedrockProvider

# Using default AWS credentials
model = BedrockEmbeddingModel('amazon.titan-embed-text-v2:0')

# Using explicit credentials
model = BedrockEmbeddingModel(
    'cohere.embed-english-v3',
    provider=BedrockProvider(
        region_name='us-east-1',
        aws_access_key_id='...',
        aws_secret_access_key='...',
    ),
)

Attributes

base_url

The base URL for the provider API.

Type: str

model_name

The embedding model name.

Type: BedrockEmbeddingModelName

system

The embedding model provider.

Type: str

Methods

init

def __init__(
    model_name: BedrockEmbeddingModelName,
    provider: Literal['bedrock'] | Provider[BaseClient] = 'bedrock',
    settings: EmbeddingSettings | None = None,
)

Initialize a Bedrock embedding model.

Parameters

model_name : BedrockEmbeddingModelName

The name of the Bedrock embedding model to use. See Bedrock embedding models for available options.

provider : Literal[‘bedrock’] | Provider[BaseClient] Default: 'bedrock'

The provider to use for authentication and API access. Can be:

'bedrock' (default): Uses default AWS credentials
A BedrockProvider instance for custom configuration

settings : EmbeddingSettings | None Default: None

Model-specific EmbeddingSettings to use as defaults for this model.

max_input_tokens

@async

def max_input_tokens() -> int | None

Get the maximum number of tokens that can be input to the model.

Returns

int | None

LatestBedrockEmbeddingModelNames

Latest Bedrock embedding model names.

See the Bedrock docs for available embedding models.

Default: Literal['amazon.titan-embed-text-v1', 'amazon.titan-embed-text-v2:0', 'cohere.embed-english-v3', 'cohere.embed-multilingual-v3', 'cohere.embed-v4:0', 'amazon.nova-2-multimodal-embeddings-v1:0']

BedrockEmbeddingModelName

Possible Bedrock embedding model names.

Default: str | LatestBedrockEmbeddingModelNames

VoyageAIEmbeddingSettings

Bases: EmbeddingSettings

Settings used for a VoyageAI embedding model request.

All fields from EmbeddingSettings are supported, plus VoyageAI-specific settings prefixed with voyageai_.

Attributes

voyageai_input_type

The VoyageAI-specific input type for the embedding.

Overrides the standard input_type argument. Options include: 'query', 'document', or 'none' for direct embedding without prefix.

Type: VoyageAIEmbedInputType

VoyageAIEmbeddingModel

Bases: EmbeddingModel

VoyageAI embedding model implementation.

VoyageAI provides state-of-the-art embedding models optimized for retrieval, with specialized models for code, finance, and legal domains.

Example:

from pydantic_ai.embeddings.voyageai import VoyageAIEmbeddingModel

model = VoyageAIEmbeddingModel('voyage-3.5')

Attributes

base_url

The base URL for the provider API.

Type: str

model_name

The embedding model name.

Type: VoyageAIEmbeddingModelName

system

The embedding model provider.

Type: str

Methods

init

def __init__(
    model_name: VoyageAIEmbeddingModelName,
    provider: Literal['voyageai'] | Provider[AsyncClient] = 'voyageai',
    settings: EmbeddingSettings | None = None,
)

Initialize a VoyageAI embedding model.

Parameters

model_name : VoyageAIEmbeddingModelName

The name of the VoyageAI model to use. See VoyageAI models for available options.

provider : Literal[‘voyageai’] | Provider[AsyncClient] Default: 'voyageai'

The provider to use for authentication and API access. Can be:

'voyageai' (default): Uses the standard VoyageAI API
A VoyageAIProvider instance for custom configuration

settings : EmbeddingSettings | None Default: None

Model-specific EmbeddingSettings to use as defaults for this model.

LatestVoyageAIEmbeddingModelNames

Latest VoyageAI embedding models.

See VoyageAI Embeddings for available models and their capabilities.

Default: Literal['voyage-4-large', 'voyage-4', 'voyage-4-lite', 'voyage-3-large', 'voyage-3.5', 'voyage-3.5-lite', 'voyage-code-3', 'voyage-finance-2', 'voyage-law-2', 'voyage-code-2']

VoyageAIEmbeddingModelName

Possible VoyageAI embedding model names.

Default: str | LatestVoyageAIEmbeddingModelNames

VoyageAIEmbedInputType

VoyageAI embedding input types.

'query': For search queries; prepends retrieval-optimized prefix.
'document': For documents; prepends document retrieval prefix.
'none': Direct embedding without any prefix.

Default: Literal['query', 'document', 'none']

SentenceTransformersEmbeddingSettings

Bases: EmbeddingSettings

Settings used for a Sentence-Transformers embedding model request.

All fields from EmbeddingSettings are supported, plus Sentence-Transformers-specific settings prefixed with sentence_transformers_.

Attributes

sentence_transformers_device

Device to run inference on.

Examples: 'cpu', 'cuda', 'cuda:0', 'mps' (Apple Silicon).

Type: str

sentence_transformers_normalize_embeddings

Whether to L2-normalize embeddings.

When True, all embeddings will have unit length, which is useful for cosine similarity calculations.

Type: bool

sentence_transformers_batch_size

Batch size to use during encoding.

Larger batches may be faster but require more memory.

Type: int

SentenceTransformerEmbeddingModel

Bases: EmbeddingModel

Local embedding model using the sentence-transformers library.

This model runs embeddings locally on your machine, which is useful for:

Privacy-sensitive applications where data shouldn’t leave your infrastructure
Reducing API costs for high-volume embedding workloads
Offline or air-gapped environments

Models are downloaded from Hugging Face on first use. See the Sentence-Transformers documentation for available models.

Example:

from sentence_transformers import SentenceTransformer

from pydantic_ai.embeddings.sentence_transformers import (
    SentenceTransformerEmbeddingModel,
)

# Using a model name (downloads from Hugging Face)
model = SentenceTransformerEmbeddingModel('all-MiniLM-L6-v2')

# Using an existing SentenceTransformer instance
st_model = SentenceTransformer('all-MiniLM-L6-v2')
model = SentenceTransformerEmbeddingModel(st_model)

Attributes

base_url

No base URL — runs locally.

Type: str | None

model_name

The embedding model name.

Type: str

system

The embedding model provider/system identifier.

Type: str

Methods

init

def __init__(
    model: SentenceTransformer | str,
    settings: EmbeddingSettings | None = None,
) -> None

Initialize a Sentence-Transformers embedding model.

Returns

None

Parameters

model : SentenceTransformer | str

The model to use. Can be:

A model name from Hugging Face (e.g., 'all-MiniLM-L6-v2')
A local path to a saved model
An existing SentenceTransformer instance

settings : EmbeddingSettings | None Default: None

Model-specific SentenceTransformersEmbeddingSettings to use as defaults for this model.

TestEmbeddingModel

Bases: EmbeddingModel

A mock embedding model for testing.

This model returns deterministic embeddings (all 1.0 values) and tracks the settings used in the last call via the last_settings attribute.

Example:

from pydantic_ai import Embedder
from pydantic_ai.embeddings import TestEmbeddingModel

test_model = TestEmbeddingModel()
embedder = Embedder('openai:text-embedding-3-small')


async def main():
    with embedder.override(model=test_model):
        await embedder.embed_query('test')
        assert test_model.last_settings is not None

Attributes

last_settings

The settings used in the most recent embed call.

Type: EmbeddingSettings | None Default: None

model_name

The embedding model name.

Type: str

system

The embedding model provider.

Type: str

Methods

init

def __init__(
    model_name: str = 'test',
    provider_name: str = 'test',
    dimensions: int = 8,
    settings: EmbeddingSettings | None = None,
)

Initialize the test embedding model.

Parameters

model_name : str Default: 'test'

The model name to report in results.

provider_name : str Default: 'test'

The provider name to report in results.

dimensions : int Default: 8

The number of dimensions for the generated embeddings.

settings : EmbeddingSettings | None Default: None

Optional default settings for the model.

WrapperEmbeddingModel

Bases: EmbeddingModel

Base class for embedding models that wrap another model.

Use this as a base class to create custom embedding model wrappers that modify behavior (e.g., caching, logging, rate limiting) while delegating to an underlying model.

By default, all methods are passed through to the wrapped model. Override specific methods to customize behavior.

Attributes

wrapped

The underlying embedding model being wrapped.

Type: EmbeddingModel Default: infer_embedding_model(wrapped) if isinstance(wrapped, str) else wrapped

settings

Get the settings from the wrapped embedding model.

Type: EmbeddingSettings | None

Methods

init

def __init__(wrapped: EmbeddingModel | str)

Initialize the wrapper with an embedding model.

Parameters

wrapped : EmbeddingModel | str

The model to wrap. Can be an EmbeddingModel instance or a model name string (e.g., 'openai:text-embedding-3-small').

InstrumentedEmbeddingModel

Bases: WrapperEmbeddingModel

Embedding model which wraps another model so that requests are instrumented with OpenTelemetry.

See the Debugging and Monitoring guide for more info.

Attributes

instrumentation_settings

Instrumentation settings for this model.

Type: InstrumentationSettings Default: options or InstrumentationSettings()

instrument_embedding_model

def instrument_embedding_model(
    model: EmbeddingModel,
    instrument: InstrumentationSettings | bool,
) -> EmbeddingModel

Instrument an embedding model with OpenTelemetry/logfire.

Returns

EmbeddingModel