pydantic_evals.generation
Utilities for generating example datasets for pydantic_evals.
This module provides functions for generating sample datasets for testing and examples, using LLMs to create realistic test data with proper structure.
async def generate_dataset(
    dataset_type: type[Dataset[InputsT, OutputT, MetadataT]],
    path: Path | str | None = None,
    custom_evaluator_types: Sequence[type[Evaluator[InputsT, OutputT, MetadataT]]] = (),
    model: models.Model | models.KnownModelName = 'openai:gpt-5.2',
    n_examples: int = 3,
    extra_instructions: str | None = None,
) -> Dataset[InputsT, OutputT, MetadataT]
Use an LLM to generate a dataset of test cases, each consisting of inputs, expected output, and metadata.

This function creates a properly structured dataset with the specified input, output, and metadata types. It uses an LLM to attempt to generate realistic test cases that conform to the types' schemas.

Parameters:
    dataset_type : type[Dataset[InputsT, OutputT, MetadataT]]
        The type of dataset to generate, with the desired input, output, and metadata types.
    path : Path | str | None
        Default: None
        Optional path to save the generated dataset. If provided, the dataset will be saved to this location.
    custom_evaluator_types : Sequence[type[Evaluator[InputsT, OutputT, MetadataT]]]
        Default: ()
        Optional sequence of custom evaluator classes to include in the schema.
    model : models.Model | models.KnownModelName
        Default: 'openai:gpt-5.2'
        The Pydantic AI model to use for generation.
    n_examples : int
        Default: 3
        Number of examples to generate.
    extra_instructions : str | None
        Default: None
        Optional additional instructions to provide to the LLM.

Returns:
    Dataset[InputsT, OutputT, MetadataT]
        A properly structured Dataset object with generated test cases.

Raises:
    ValidationError
        If the LLM's response cannot be parsed as a valid dataset.
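A minimal usage sketch, assuming pydantic and pydantic_evals are installed and an API key for the chosen model is configured. The `QuestionInputs`, `AnswerOutput`, and `QuestionMetadata` models, the file path, and the instruction string are illustrative, not part of the library:

```python
# Illustrative example of generating a small evaluation dataset.
# The three BaseModel classes below are hypothetical schemas for this sketch.
import asyncio

from pydantic import BaseModel


class QuestionInputs(BaseModel):
    question: str  # the question posed to the task under evaluation


class AnswerOutput(BaseModel):
    answer: str  # the expected answer for that question


class QuestionMetadata(BaseModel):
    difficulty: str  # e.g. 'easy', 'medium', 'hard'


async def main() -> None:
    # Imported here so the schema definitions above stand alone.
    from pydantic_evals import Dataset
    from pydantic_evals.generation import generate_dataset

    dataset = await generate_dataset(
        dataset_type=Dataset[QuestionInputs, AnswerOutput, QuestionMetadata],
        path='questions_dataset.yaml',  # also writes the dataset to this file
        n_examples=3,
        extra_instructions='Generate short arithmetic questions.',
    )
    for case in dataset.cases:
        print(case.inputs.question, '->', case.expected_output)


if __name__ == '__main__':
    asyncio.run(main())
```

Because the return value is a fully typed `Dataset`, the generated cases can be passed straight to an evaluation run or saved and hand-edited later.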
Type variables:
    InputsT = TypeVar('InputsT', default=Any)
        Generic type for the inputs to the task being evaluated.
    OutputT = TypeVar('OutputT', default=Any)
        Generic type for the expected output of the task being evaluated.
    MetadataT = TypeVar('MetadataT', default=Any)
        Generic type for the metadata associated with the task being evaluated.