Pydantic V2 Plan


Updated late 10 Jul 2022, see pydantic#4226.

Update 30 Dec 2022: The new release deadline for Pydantic V2 is the end of Q1 2023, see pydantic#4887 for more details, further updates will be posted on that issue.


I’ve spoken to quite a few people about pydantic V2, and mentioned it in passing to even more.

I owe people a proper explanation of the plan for V2:

  • What we will add
  • What we will remove
  • What we will change
  • How I’m intending to go about completing it and getting it released
  • Some idea of timeframe 😨

Here goes…


Enormous thanks to Eric Jolibois, Laurence Watson, Sebastián Ramírez, Adrian Garcia Badaracco, Tom Hamilton Stubber, Zac Hatfield-Dodds, Tom & Hasan Ramezani for reviewing this blog post, putting up with (and correcting) my horrible typos and making great suggestions that have made this post and Pydantic V2 materially better.


Plan & Timeframe

I’m currently taking a kind of sabbatical after leaving my last job to get pydantic V2 released. Why? I ask myself that question quite often. I’m very proud of how much pydantic is used, but I’m less proud of its internals. Since it’s something people seem to care about and use quite a lot (26m downloads a month, used by 72k public repos, 10k stars), I want it to be as good as possible.

While Iโ€™m on the subject of why, how and my odd sabbatical: if you work for a large company who use pydantic a lot, you might encourage the company to sponsor me a meaningful amount, like Salesforce did (if your organisation is not open to donations, I can also offer consulting services). This is not charity, recruitment or marketing - the argument should be about how much the company will save if pydantic is 10x faster, more stable and more powerful - it would be worth paying me 10% of that to make it happen.

Before pydantic V2 can be released, we need to release pydantic V1.10. There are lots of changes in the main branch of pydantic contributed by the community, and it’s only fair to provide a release including those changes. Many of them will remain unchanged for V2; the rest will act as a requirement to make sure pydantic V2 includes the capabilities they implemented.

The basic road map for me is as follows:

  1. Implement a few more features in pydantic-core, and release a first version, see below
  2. Work on getting pydantic V1.10 out - basically merge all open PRs that are finished
  3. Release pydantic V1.10
  4. Delete all stale PRs which didn’t make it into V1.10, apologise profusely to their authors who put their valuable time into pydantic only to have their PRs closed 🙏 (and explain when and how they can rebase and recreate the PR)
  5. Rename master to main, seems like a good time to do this
  6. Change the main branch of pydantic to target V2
  7. Start tearing pydantic code apart and see how many existing tests can be made to pass
  8. Rinse, repeat
  9. Release pydantic V2 🎉

The plan is to have all this done by the end of October, definitely by the end of the year.

Breaking Changes & Compatibility 🙏

While we’ll do our best to avoid breaking changes, some things will break.

As per the greatest pun in modern TV history.

You can’t make a Tomelette without breaking some Greggs.

Where possible, if breaking changes are unavoidable, we’ll try to provide warnings or errors to make sure those changes are obvious to developers.

Motivation & pydantic-core

Since pydantic’s initial release, with the help of wonderful contributors Eric Jolibois, Sebastián Ramírez, David Montague and many others, the package and its usage have grown enormously. The core logic however has remained mostly unchanged since the initial experiment. It’s old, it smells, it needs to be rebuilt.

The release of version 2 is an opportunity to rebuild pydantic and correct many things that don’t make sense - to make pydantic amazing 🚀.

The core validation logic of pydantic V2 will be performed by a separate package, pydantic-core, which I’ve been building over the last few months. pydantic-core is written in Rust using the excellent pyo3 library, which provides Rust bindings for Python.

The motivation for building pydantic-core in Rust is as follows:

  1. Performance, see below
  2. Recursion and code separation - with no stack and little-to-no overhead for extra function calls, Rust allows pydantic-core to be implemented as a tree of small validators which call each other, making code easier to understand and extend without harming performance
  3. Safety and complexity - pydantic-core is a fairly complex piece of code which has to draw distinctions between many different errors. Rust is great in situations like this; it should minimise bugs (🤞) and allow the codebase to be extended for a long time to come

pydantic-core is usable now, albeit with an unintuitive API; if you’re interested, please give it a try.

pydantic-core provides validators for common data types, see a list here. Other, less commonly used data types will be supported via validator functions implemented in pydantic, in Python.

See pydantic-core#153 for a summary of what needs to be completed before its first release.

Headlines

Here are some of the biggest changes expected in V2.

Performance 👍

As a result of the move to Rust for the validation logic (and significant improvements in how validation objects are structured) pydantic V2 will be significantly faster than pydantic V1.

Looking at the pydantic-core benchmarks today, pydantic V2 is between 4x and 50x faster than pydantic V1.9.1.

In general, pydantic V2 is about 17x faster than V1 when validating a model containing a range of common fields.

Strict Mode 👍

People have long complained about pydantic for coercing data instead of throwing an error. E.g. input to an int field could be 123 or the string "123", which would be converted to 123. While this is very useful in many scenarios (think: URL parameters, environment variables, user input), there are some situations where it’s not desirable.

pydantic-core comes with “strict mode” built in. With this, only the exact data type is allowed, e.g. passing "123" to an int field would result in a validation error.

This will allow pydantic V2 to offer a strict switch which can be set on either a model or a field.
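
As a rough illustration of the difference between the two modes, here is a minimal sketch in plain Python. The validate_int helper and its strict flag are hypothetical names invented for this example; this is not pydantic-core's implementation or API.

```python
def validate_int(value, *, strict=False):
    """Hypothetical helper illustrating lax vs. strict validation of an int."""
    # An exact int is accepted in both modes.
    if type(value) is int:
        return value
    if strict:
        # Strict mode: anything other than an exact int is an error.
        raise ValueError(f"expected int, got {type(value).__name__}")
    if isinstance(value, str):
        # Lax mode: a string such as "123" is coerced; int() raises
        # ValueError itself if the string is not a valid integer.
        return int(value)
    raise ValueError(f"cannot interpret {value!r} as an int")


lax_result = validate_int("123")                 # coerced in lax mode
strict_result = validate_int(123, strict=True)   # already an int
```

Calling validate_int("123", strict=True) in this sketch raises ValueError, which mirrors the validation error described above.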

Formalised Conversion Table 👍

As well as complaints about coercion, another legitimate complaint was inconsistency around data conversion.

In pydantic V2, the following principle will govern when data should be converted in “lax mode” (strict=False):

If the input data has a SINGLE and INTUITIVE representation, in the field’s type, AND no data is lost during the conversion, then the data will be converted; otherwise a validation error is raised. There is one exception to this rule: string fields - virtually all data has an intuitive representation as a string (e.g. repr() and str()), therefore a custom rule is required: only str, bytes and bytearray are valid as inputs to string fields.

Some examples of what that means in practice:

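| Field Type | Input | Single & Intuitive Representation | All Data Preserved | Result |
|---|---|---|---|---|
| int | "123" | Yes | Yes | Convert |
| int | 123.0 | Yes | Yes | Convert |
| int | 123.1 | Yes | No | Error |
| date | "2020-01-01" | Yes | Yes | Convert |
| date | "2020-01-01T00:00:00" | Yes | Yes | Convert |
| date | "2020-01-01T12:00:00" | Yes | No | Error |
| int | b"1" | No | Yes | Error |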

(For the last case converting bytes to an int could reasonably mean int(bytes_data.decode()) or int.from_bytes(b'1', 'big/little'), hence an error)

In addition to the general rule, we’ll provide a conversion table which defines exactly what data will be allowed to which field types. See the table below for a start on this.
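
The examples above can be reproduced with a small sketch of the "single, intuitive and lossless" rule. The lax_int and lax_date functions are hypothetical and written purely for illustration; they use the standard library's datetime.fromisoformat for parsing, which is an assumption of this sketch and not necessarily what pydantic-core does.

```python
from datetime import date, datetime, time


def lax_int(value):
    """Hypothetical lax-mode int conversion following the rule above."""
    if type(value) is int:
        return value
    if isinstance(value, float):
        # 123.0 converts cleanly; 123.1 would lose its fractional part.
        if value.is_integer():
            return int(value)
        raise ValueError("data would be lost converting float to int")
    if isinstance(value, str):
        return int(value)  # raises ValueError for non-integer strings
    # bytes has no single intuitive integer representation, so it is rejected.
    raise ValueError(f"no single intuitive int for {type(value).__name__}")


def lax_date(value):
    """Hypothetical lax-mode date conversion following the rule above."""
    if isinstance(value, date) and not isinstance(value, datetime):
        return value
    if isinstance(value, str):
        parsed = datetime.fromisoformat(value)
        # A non-midnight time would be silently dropped, so refuse it.
        if parsed.time() != time(0, 0):
            raise ValueError("data would be lost converting datetime to date")
        return parsed.date()
    raise ValueError(f"cannot interpret {value!r} as a date")
```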

Built-in JSON support 👍

pydantic-core can parse JSON directly into a model or output type. This both improves performance and avoids issues with strictness - e.g. if you have a strict model with a datetime field, the input must be a datetime object, but clearly that makes no sense when parsing JSON, which has no datetime type. The same goes for bytes and many other types.

Pydantic V2 will therefore allow some conversion when validating JSON directly, even in strict mode (e.g. ISO8601 string -> datetime, str -> bytes) even though this would not be allowed when validating a python object.
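
To illustrate why the input source matters, here is a minimal sketch. The strict_datetime function and its from_json flag are hypothetical; json.loads stands in for pydantic-core's own JSON parsing only to keep the example self-contained.

```python
import json
from datetime import datetime


def strict_datetime(value, *, from_json=False):
    """Hypothetical strict-mode datetime check that knows its input source."""
    if isinstance(value, datetime):
        return value
    if from_json and isinstance(value, str):
        # JSON has no datetime type, so an ISO 8601 string is the only
        # way a datetime can arrive; allow it even in strict mode.
        return datetime.fromisoformat(value)
    raise ValueError("strict mode: expected a datetime object")


raw = json.loads('{"created": "2022-07-10T12:30:00"}')
created = strict_datetime(raw["created"], from_json=True)
```

Passing the same string without from_json=True raises ValueError in this sketch, matching the behaviour described for Python objects.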

In future direct validation of JSON will also allow:

  • parsing in a separate thread while starting validation in the main thread
  • line numbers from JSON to be included in the validation errors

(These features will not be included in V2, but instead will hopefully be added later.)

Validation without a Model 👍

In pydantic V1 the core of all validation was a pydantic model; this led to a significant performance penalty and extra complexity when the output data type was not a model.

pydantic-core operates on a tree of validators with no “model” type required at the base of that tree. It can therefore validate a single string or datetime value, a TypedDict or a Model equally easily.
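
The idea of a tree of validators can be sketched in a few lines of plain Python. All class names below are hypothetical and chosen for this illustration only; pydantic-core implements its validators in Rust and its real structure will differ.

```python
class IntValidator:
    def validate(self, value):
        if type(value) is not int:
            raise ValueError(f"expected int, got {value!r}")
        return value


class StrValidator:
    def validate(self, value):
        if not isinstance(value, str):
            raise ValueError(f"expected str, got {value!r}")
        return value


class ListValidator:
    def __init__(self, item_validator):
        self.item_validator = item_validator

    def validate(self, value):
        if not isinstance(value, list):
            raise ValueError(f"expected list, got {value!r}")
        # Each item is delegated to the child validator.
        return [self.item_validator.validate(item) for item in value]


class TypedDictValidator:
    def __init__(self, fields):
        self.fields = fields  # mapping of key -> child validator

    def validate(self, value):
        if not isinstance(value, dict):
            raise ValueError(f"expected dict, got {value!r}")
        missing = [key for key in self.fields if key not in value]
        if missing:
            raise ValueError(f"missing keys: {missing}")
        return {k: v.validate(value[k]) for k, v in self.fields.items()}


# No model at the root: a bare list validator works on its own...
scores = ListValidator(IntValidator()).validate([1, 2, 3])

# ...and a TypedDict-like validator is built from the same pieces.
user_validator = TypedDictValidator(
    {"name": StrValidator(), "scores": ListValidator(IntValidator())}
)
user = user_validator.validate({"name": "Ada", "scores": [1, 2]})
```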

This feature will provide significant additional performance improvements in scenarios like:

  • Adding validation to dataclasses
  • Validating URL arguments, query strings, headers, etc. in FastAPI
  • Adding validation to TypedDict
  • Function argument validation
  • Adding validation to your custom classes, decorators…

In effect - anywhere where you don’t care about a traditional model class instance.

We’ll need to add standalone methods for generating JSON Schema and dumping these objects to JSON, etc.

Required vs. Nullable Cleanup 👍

Pydantic previously had a somewhat confused idea about “required” vs. “nullable”. This mostly resulted from my misgivings about marking a field as Optional[int] when a value must be provided but is allowed to be None - I didn’t like using the word “optional” in relation to a field which was not optional.

In pydantic V2, pydantic will move to match dataclasses, thus:

Required vs. Nullable
from pydantic import BaseModel

class Foo(BaseModel):
    f1: str  # required, cannot be None
    f2: str | None  # required, can be None - same as Optional[str] / Union[str, None]
    f3: str | None = None  # not required, can be None
    f4: str = 'Foobar'  # not required, but cannot be None
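
For comparison, this is how the standard library's dataclasses already treat the same four cases. It is only a sketch of the semantics being matched: dataclasses do not validate types, so the "cannot be None" part is not enforced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Foo:
    f1: str                    # required
    f2: Optional[str]          # required, even though None is a valid value
    f3: Optional[str] = None   # not required, defaults to None
    f4: str = "Foobar"         # not required, defaults to a string


foo = Foo(f1="a", f2=None)

# Omitting f2 is an error, despite its Optional[...] annotation.
try:
    Foo(f1="a")
except TypeError as exc:
    missing_error = str(exc)
```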

Validator Function Improvements 👍 👍 👍

This is one of the changes in pydantic V2 that I’m most excited about. I’ve been talking about something like this for a long time, see pydantic#1984, but couldn’t find a way to do this until now.

Fields which use a function for validation can be any of the following types:

  • function before mode - where the function is called before the inner validator is called
  • function after mode - where the function is called after the inner validator is called
  • plain mode - where there’s no inner validator
  • wrap mode - where the function takes a reference to a function which calls the inner validator, and can therefore modify the input before inner validation, modify the output after inner validation, conditionally not call the inner validator or catch errors from the inner validator and return a default value, or change the error

An example of how a wrap validator might look:

Wrap mode validator function
from datetime import datetime
from pydantic import BaseModel, ValidationError, validator

class MyModel(BaseModel):
    timestamp: datetime

    @validator('timestamp', mode='wrap')
    def validate_timestamp(cls, v, handler):
        if v == 'now':
            # we don't want to bother with further validation, 
            # just return the new value
            return datetime.now()
        try:
            return handler(v)
        except ValidationError:
            # validation failed, in this case we want to 
            # return a default value
            return datetime(2000, 1, 1)

As well as being powerful, this provides a great “escape hatch” when pydantic validation doesn’t do what you need.
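
The four modes listed above can be mimicked with ordinary function composition. The sketch below is plain Python with hypothetical helper names (before, after, plain, wrap, inner_int and so on); it only illustrates the calling order of each mode and is not how pydantic-core wires validators together.

```python
def inner_int(value):
    """Stand-in for the inner validator: lax conversion to int."""
    return int(value)


def before(func, inner):
    # func runs first, then its result is passed to the inner validator.
    return lambda value: inner(func(value))


def after(func, inner):
    # The inner validator runs first, then func sees the validated value.
    return lambda value: func(inner(value))


def plain(func):
    # No inner validator at all: func is the whole validation.
    return func


def wrap(func, inner):
    # func receives the raw value plus a handler that calls the inner validator.
    return lambda value: func(value, inner)


def strip_commas(value):
    return value.replace(",", "") if isinstance(value, str) else value


def must_be_positive(value):
    if value <= 0:
        raise ValueError("must be positive")
    return value


def zero_on_error(value, handler):
    try:
        return handler(value)
    except ValueError:
        return 0  # fall back to a default instead of failing


parse_thousands = before(strip_commas, inner_int)
positive_int = after(must_be_positive, inner_int)
as_text = plain(str)
forgiving_int = wrap(zero_on_error, inner_int)
```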

More powerful alias(es) 👍

pydantic-core can support alias “paths” as well as simple string aliases to flatten data as it’s validated.

Best demonstrated with an example:

Alias paths
from pydantic import BaseModel, Field


class Foo(BaseModel):
    bar: str = Field(aliases=[['baz', 2, 'qux']])


data = {
    'baz': [
        {'qux': 'a'},
        {'qux': 'b'},
        {'qux': 'c'},
        {'qux': 'd'},
    ]
}

foo = Foo(**data)
assert foo.bar == 'c'

aliases is a list of lists because multiple paths can be provided; if so, they’re tried in turn until a value is found.

Tagged unions will use the same logic as aliases meaning nested attributes can be used to select a schema to validate against.
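
A minimal sketch of how alias paths could be resolved against nested input follows. The follow_path and resolve_aliases helpers are hypothetical names for this illustration; the real lookup happens inside pydantic-core.

```python
_MISSING = object()  # sentinel so that None remains a valid value


def follow_path(data, path):
    """Walk one alias path (a list of dict keys and list indexes)."""
    current = data
    for key in path:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            return _MISSING
    return current


def resolve_aliases(data, paths):
    """Try each path in turn and return the first value found."""
    for path in paths:
        value = follow_path(data, path)
        if value is not _MISSING:
            return value
    raise KeyError(f"none of the alias paths matched: {paths}")


data = {"baz": [{"qux": "a"}, {"qux": "b"}, {"qux": "c"}, {"qux": "d"}]}

bar = resolve_aliases(data, [["baz", 2, "qux"]])
fallback = resolve_aliases(data, [["missing"], ["baz", 0, "qux"]])
```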

Improvements to Dumping/Serialization/Export 👍 😕

(I haven’t worked on this yet, so these ideas are only provisional)

There has long been a debate about how to handle converting data when extracting it from a model. One of the features people have long requested is the ability to convert data to JSON compliant types while converting a model to a dict.

My plan is to move data export into pydantic-core, with that, one implementation can support all export modes without compromising (and hopefully significantly improving) performance.

I see four different export/serialisation scenarios:

  1. Extracting the field values of a model with no conversion, effectively model.__dict__ but with the current filtering logic provided by .dict()
  2. Extracting the field values of a model recursively (effectively what .dict() does now) - sub-models are converted to dicts, but other fields remain unchanged.
  3. Extracting data and converting at the same time (e.g. to JSON compliant types)
  4. Serialising data straight to JSON

I think all 4 modes can be supported in a single implementation, with a kind of “3.5” mode where a python function is used to convert the data as the user wishes.
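
As a very rough sketch of how a single implementation could cover the first three scenarios, consider the following. The Model class and dump function are hypothetical stand-ins invented for this example, the mode names are borrowed from the provisional model_dump signature later in this post, and containers such as lists are deliberately ignored to keep it short.

```python
from datetime import date, datetime


class Model:
    """Hypothetical stand-in for a model: keyword arguments become fields."""

    def __init__(self, **fields):
        self.__dict__.update(fields)


def dump(value, mode="unchanged"):
    if isinstance(value, Model):
        if mode == "unchanged":
            # Scenario 1: a shallow copy of the fields, no conversion.
            return dict(vars(value))
        # Scenarios 2 and 3: recurse so that sub-models become dicts.
        return {k: dump(v, mode) for k, v in vars(value).items()}
    if mode == "json-compliant" and isinstance(value, (date, datetime)):
        # Scenario 3: convert to a JSON-compliant type while extracting.
        return value.isoformat()
    return value


event = Model(name="launch", when=date(2022, 7, 10), venue=Model(city="London"))

unchanged = dump(event)
as_dicts = dump(event, mode="dicts")
json_ready = dump(event, mode="json-compliant")
```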

The current include and exclude logic is extremely complicated, but hopefully it won’t be too hard to translate it to Rust.

We should also add support for validate_alias and dump_alias as well as the standard alias to allow for customising field keys.

Validation Context 👍

Pydantic V2 will add a new optional context argument to model_validate and model_validate_json which will allow you to pass information not available when creating a model to validators. See pydantic#1549 for motivation.

Here’s an example of how context might be used:

Context during Validation
from pydantic import BaseModel, EmailStr, validator

class User(BaseModel):
    email: EmailStr
    home_country: str
    
    @validator('home_country')
    def check_home_country(cls, v, context):
        if v not in context['countries']:
            raise ValueError('invalid country choice')
        return v

async def add_user(post_data: bytes):
    countries = set(await db_connection.fetch_all('select code from country'))
    user = User.model_validate_json(post_data, context={'countries': countries})
    ...

Model Namespace Cleanup 👍

For years I’ve wanted to clean up the model namespace, see pydantic#1001. This would avoid confusing gotchas when field names clash with methods on a model; it would also make it safer to add more methods to a model without risking new clashes.

After much deliberation (and even giving a lightning talk at the python language summit about alternatives, see this discussion), I’ve decided to go with the simplest and clearest approach, at the expense of a bit more typing:

All methods on models will start with model_, fields’ names will not be allowed to start with "model" (aliases can be used if required).

This will mean BaseModel will have roughly the following signature.

New BaseModel methods
class BaseModel:
    model_fields: List[FieldInfo]
    """previously `__fields__`, although the format will change a lot"""
    @classmethod
    def model_validate(cls, data: Any, *, context=None) -> Self:  # (1)
        """
        previously `parse_obj()`, validate data
        """
    @classmethod
    def model_validate_json(
        cls,
        data: str | bytes | bytearray,
        *,
        context=None
    ) -> Self:
        """
        previously `parse_raw(..., content_type='application/json')`
        validate data from JSON
        """
    @classmethod
    def model_is_instance(cls, data: Any, *, context=None) -> bool: # (2)
        """
        new, check if data is valid for the model
        """
    @classmethod
    def model_is_instance_json(
        cls,
        data: str | bytes | bytearray,
        *,
        context=None
    ) -> bool:
        """
        Same as `model_is_instance`, but from JSON
        """
    def model_dump(
        self,
        include: ... = None,
        exclude: ... = None,
        by_alias: bool = False,
        exclude_unset: bool = False,
        exclude_defaults: bool = False,
        exclude_none: bool = False,
        mode: Literal['unchanged', 'dicts', 'json-compliant'] = 'unchanged',
        converter: Callable[[Any], Any] | None = None
    ) -> Any:
        """
        previously `dict()`, as before
        with new `mode` argument
        """
    def model_dump_json(self, ...) -> str:
        """
        previously `json()`, arguments as above
        effectively equivalent to `json.dumps(self.model_dump(..., mode='json-compliant'))`,
        but more performant
        """
    def model_json_schema(self, ...) -> dict[str, Any]:
        """
        previously `schema()`, arguments roughly as before
        JSON schema as a dict
        """
    def model_update_forward_refs(self) -> None:
        """
        previously `update_forward_refs()`, update forward references
        """
    @classmethod
    def model_construct(
        cls,
        _fields_set: set[str] | None = None,
        **values: Any
    ) -> Self:
        """
        previously `construct()`, arguments roughly as before
        construct a model with no validation
        """
    @classmethod
    def model_customize_schema(cls, schema: dict[str, Any]) -> dict[str, Any]:
        """
        new, way to customize validation,
        e.g. if you wanted to alter how the model validates certain types,
        or add validation for a specific type without custom types or
        decorated validators
        """
    class ModelConfig:
        """
        previously `Config`, configuration class for models
        """
  1. see Validation Context for more information on context
  2. see is_instance checks

The following methods will be removed:

  • .parse_file() - was a mistake, should never have been in pydantic
  • .parse_raw() - partially replaced by .model_validate_json(), the other functionality was a mistake
  • .from_orm() - the functionality has been moved to config, see other improvements below
  • .schema_json() - mostly since it causes confusion between pydantic validation schema and JSON schema, and can be replaced with just json.dumps(m.model_json_schema())
  • .copy() - instead we’ll implement __copy__ and let people use the copy module (this removes some functionality from copy(), but there are bugs and ambiguities with that functionality anyway)

Strict API & API documentation 👍

When preparing for pydantic V2, we’ll make a strict distinction between the public API and private functions & classes. Private objects will be clearly identified as private via a _internal sub package to discourage use.

The public API will have API documentation. I’ve recently been working with the wonderful mkdocstrings package for both dirty-equals and watchfiles documentation. I intend to use mkdocstrings to generate complete API documentation for V2.

This wouldn’t replace the current example-based somewhat informal documentation style but instead will augment it.

Error descriptions 👍

The way line errors (the individual errors within a ValidationError) are built has become much more sophisticated in pydantic-core.

There’s a well-defined set of error codes and messages.

More will be added when other types are validated via pure python validators in pydantic.

I would like to add a dedicated section to the documentation with extra information for each type of error.

This would be another key in a line error: documentation, which would link to the appropriate section in the docs.

Thus, errors might look like:

Line Errors Example
[
    {
        'kind': 'greater_than_equal',
        'loc': ['age'],
        'message': 'Value must be greater than or equal to 18',
        'input_value': 11,
        'context': {'ge': 18},
        'documentation': 'https://pydantic.dev/errors/#greater_than_equal',
    },
    {
        'kind': 'bool_parsing',
        'loc': ['is_developer'],
        'message': 'Value must be a valid boolean, unable to interpret input',
        'input_value': 'foobar',
        'documentation': 'https://pydantic.dev/errors/#bool_parsing',
    },
]

I own the pydantic.dev domain and will use it for at least these errors so that even if the docs URL changes, the error will still link to the correct documentation. If developers don’t want to show these errors to users, they can always process the errors list and filter out items from each error they don’t need or want.
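
Filtering the errors list before showing it to users could be as simple as the sketch below. The public_errors function and its default choice of hidden keys are hypothetical; the error dicts are copied from the example above.

```python
def public_errors(errors, hidden=("input_value", "documentation")):
    """Return copies of the line errors without the keys named in `hidden`."""
    return [
        {key: value for key, value in error.items() if key not in hidden}
        for error in errors
    ]


errors = [
    {
        "kind": "greater_than_equal",
        "loc": ["age"],
        "message": "Value must be greater than or equal to 18",
        "input_value": 11,
        "context": {"ge": 18},
        "documentation": "https://pydantic.dev/errors/#greater_than_equal",
    },
    {
        "kind": "bool_parsing",
        "loc": ["is_developer"],
        "message": "Value must be a valid boolean, unable to interpret input",
        "input_value": "foobar",
        "documentation": "https://pydantic.dev/errors/#bool_parsing",
    },
]

cleaned = public_errors(errors)
```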

No pure python implementation 😦

Since pydantic-core is written in Rust, and I have absolutely no intention of rewriting it in python, pydantic V2 will only work where a binary package can be installed.

pydantic-core will provide binaries in PyPI for (at least):

  • Linux: x86_64, aarch64, i686, armv7l, musl-x86_64 & musl-aarch64
  • MacOS: x86_64 & arm64 (except python 3.7)
  • Windows: amd64 & win32
  • Web Assembly: wasm32 (pydantic-core is already compiled for wasm32 using emscripten and unit tests pass, except where cpython itself has problems)

Binaries for pypy are a work in progress and will be added if possible, see pydantic-core#154.

Other binaries can be added provided they can be (cross-)compiled on github actions. If no binary is available from PyPI, pydantic-core can be compiled from source if Rust stable is available.

The only place where I know this will cause problems is Raspberry Pi, which is a mess when it comes to packages written in Rust for Python. Effectively, until that’s fixed you’ll likely have to install pydantic with pip install -i https://pypi.org/simple/ pydantic.

Pydantic becomes a pure python package 👍

Pydantic V1.X is a pure python code base but is compiled with cython to provide some performance improvements. Since the “hot” code is moved to pydantic-core, pydantic itself can go back to being a pure python package.

This should significantly reduce the size of the pydantic package and make unit tests of pydantic much faster. In addition:

  • some constraints on pydantic code can be removed once it no-longer has to be compilable with cython
  • debugging will be easier as you’ll be able to drop straight into the pydantic codebase as you can with other, pure python packages

Some pieces of edge logic could get a little slower as they’re no longer compiled.

is_instance like checks 👍

Strict mode also means it makes sense to provide an is_instance method on models which effectively runs validation then throws away the result, while avoiding the (admittedly small) overhead of creating and raising an error or returning the validation result.

To be clear, this isn’t a real isinstance call, rather it is equivalent to

is_instance
class BaseModel:
    ...
    @classmethod
    def model_is_instance(cls, data: Any) -> bool:
        try:
            cls(**data)
        except ValidationError:
            return False
        else:
            return True

I’m dropping the word “parse” and just using “validate” 😐

Partly due to the issues with the lack of strict mode, I’ve gone back and forth between using the terms “parse” and “validate” for what pydantic does.

While pydantic is not simply a validation library (and I’m sure some would argue validation is not strictly what it does), most people use the word “validation”.

It’s time to stop fighting that, and use consistent names.

The word “parse” will no longer be used except when talking about JSON parsing, see model methods above.

Changes to custom field types 😐

Since the core structure of validators has changed from “a list of validators to call one after another” to “a tree of validators which call each other”, the __get_validators__ way of defining custom field types no longer makes sense.

Instead, we’ll look for the attribute __pydantic_validation_schema__ which must be a pydantic-core compliant schema for validating data to this field type (the function item can be a string, if so a function of that name will be taken from the class, see 'validate' below).

Here’s an example of how a custom field type could be defined:

New custom field types
from pydantic import ValidationSchema

class Foobar:
    def __init__(self, value: str):
        self.value = value

    __pydantic_validation_schema__: ValidationSchema = {
        'type': 'function',
        'mode': 'after',
        'function': 'validate',
        'schema': {'type': 'str'}
    }

    @classmethod
    def validate(cls, value):
        if 'foobar' in value:
            return Foobar(value)
        else:
            raise ValueError('expected foobar')

What’s going on here: __pydantic_validation_schema__ defines a schema which effectively says:

Validate input data as a string, then call the validate function with that string, use the returned value as the final result of validation.

ValidationSchema is just an alias to pydantic_core.Schema which is a type defining the schema for validation schemas.

We can probably provide one or more helper functions to make __pydantic_validation_schema__ easier to generate.
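
To make the reading of that schema concrete, here is a toy interpreter for just the two schema types used above. The run_schema function is entirely hypothetical and handles only 'str' and 'function' in 'after' mode; the Foobar class is repeated without the ValidationSchema annotation so that the sketch runs on its own.

```python
class Foobar:
    def __init__(self, value: str):
        self.value = value

    __pydantic_validation_schema__ = {
        'type': 'function',
        'mode': 'after',
        'function': 'validate',
        'schema': {'type': 'str'},
    }

    @classmethod
    def validate(cls, value):
        if 'foobar' in value:
            return Foobar(value)
        else:
            raise ValueError('expected foobar')


def run_schema(schema, value, owner):
    """Toy interpreter: NOT pydantic-core, just the two cases shown here."""
    if schema['type'] == 'str':
        if not isinstance(value, str):
            raise ValueError(f'expected str, got {value!r}')
        return value
    if schema['type'] == 'function' and schema['mode'] == 'after':
        func = schema['function']
        if isinstance(func, str):
            # A string names a function to be looked up on the class.
            func = getattr(owner, func)
        # "after" mode: validate with the inner schema, then call the function.
        inner_result = run_schema(schema['schema'], value, owner)
        return func(inner_result)
    raise NotImplementedError(schema['type'])


result = run_schema(Foobar.__pydantic_validation_schema__, 'my foobar', Foobar)
```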

Other Improvements 👍

Some other things which will also change, IMHO for the better:

  1. Recursive models with cyclic references - although recursive models were supported by pydantic V1, data with cyclic references caused recursion errors, in pydantic-core cyclic references are correctly detected and a validation error is raised
  2. The reason I’ve been so keen to get pydantic-core to compile and run with wasm is that I want all examples in the docs of pydantic V2 to be editable and runnable in the browser
  3. Full support for TypedDict, including total=False - e.g. omitted keys, providing validation schema to a TypedDict field/item will use Annotated, e.g. Annotated[str, Field(strict=True)]
  4. from_orm has become from_attributes and is now defined at schema generation time (either via model config or field config)
  5. input_value has been added to each line error in a ValidationError, making errors easier to understand and allowing more comprehensive details of errors to be provided to end users, pydantic#784
  6. on_error logic in a schema which allows either a default value to be used in the event of an error, or that value to be omitted (in the case of a total=False TypedDict), pydantic-core#151
  7. datetime, date, time & timedelta validation is improved, see the speedate Rust library I built specifically for this purpose for more details
  8. Powerful “priority” system for optionally merging or overriding config in sub-models for nested schemas
  9. Pydantic will support annotated-types, so you can do stuff like Annotated[set[int], Len(0, 10)] or Name = Annotated[str, Len(1, 1024)]
  10. A single decorator for general usage - we should add a validate decorator which can be used:
      • on functions (replacing validate_arguments)
      • on dataclasses, pydantic.dataclasses.dataclass will become an alias of this
      • on TypedDicts
      • on any supported type, e.g. Union[...], Dict[str, Thing]
      • on custom field types - e.g. anything with a __pydantic_validation_schema__ attribute
  11. Easier validation error creation - I’ve often found myself wanting to raise ValidationErrors outside models, particularly in FastAPI (here is one method I’ve used); we should provide utilities to generate these errors
  12. Improve the performance of __eq__ on models
  13. Computed fields - these have been an idea for a long time in pydantic; we should get them right
  14. Model validation that avoids instances of subclasses leaking data (particularly important for FastAPI), see pydantic-core#155
  15. We’ll now follow semver properly and avoid breaking changes between minor versions; as a result, major versions will become more common
  16. Improve generics to use M(BaseModel, Generic[T]) instead of M(GenericModel, Generic[T]) - e.g. GenericModel can be removed; this results from no longer needing to compile pydantic code with cython
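
Item 1 in the list above mentions detecting cyclic references in data. One common way to do this, shown here as a sketch and not necessarily the approach pydantic-core takes, is to track the identities of the containers currently being validated. The validate_nested_list function is a hypothetical example limited to nested lists.

```python
def validate_nested_list(value, _seen=None):
    """Copy a (possibly nested) list, raising ValueError on a cycle."""
    seen = set() if _seen is None else _seen
    if not isinstance(value, list):
        return value
    if id(value) in seen:
        # This list is already being validated further up the call stack.
        raise ValueError("cyclic reference detected")
    seen.add(id(value))
    try:
        return [validate_nested_list(item, seen) for item in value]
    finally:
        # Forget the list once done, so shared (non-cyclic) references
        # to the same object elsewhere in the data remain valid.
        seen.discard(id(value))


cyclic = [1]
cyclic.append(cyclic)  # the list now contains itself

shared = [1]
shared_twice = validate_nested_list([shared, shared])
```

Without the seen set, validating cyclic would recurse until Python raised RecursionError, which is the V1 behaviour the list item describes.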

Removed Features & Limitations 😦

The emoji here is just for variation; I’m not frowning about any of this. These changes are either good IMHO (will make pydantic cleaner, easier to learn and easier to maintain) or irrelevant to 99.9+% of users.

  1. __root__ custom root models are no longer necessary since validation on any supported data type is allowed without a model
  2. .parse_file() and .parse_raw(), partially replaced with .model_validate_json(), see model methods
  3. .schema_json() & .copy(), see model methods
  4. TypeErrors are no longer considered as validation errors, but rather as internal errors; this is to better catch errors in argument names in function validators.
  5. Subclasses of builtin types like str, bytes and int are coerced to their parent builtin type, this is a limitation of how pydantic-core converts these types to Rust types during validation, if you have a specific need to keep the type, you can use wrap validators or custom type validation as described above
  6. integers are represented in Rust code as i64, meaning if you want to use ints where abs(v) > 2^63 − 1 (9,223,372,036,854,775,807), you’ll need to use a wrap validator and your own logic
  7. Settings Management ??? - I definitely don’t want to remove the functionality, but it’s something of a historical curiosity that it lives within pydantic; perhaps it should move to a separate package, perhaps installable alongside pydantic with pip install pydantic[settings]?
  8. The following Config properties will be removed:
    • fields - it’s very old (it pre-dates Field), can be removed
    • allow_mutation will be removed, instead frozen will be used
    • error_msg_templates, it’s not properly documented anyway, error messages can be customized with external logic if required
    • getter_dict - pydantic-core has hardcoded from_attributes logic
    • json_loads - again this is hard coded in pydantic-core
    • json_dumps - possibly
    • json_encoders - see the export “mode” discussion above
    • underscore_attrs_are_private we should just choose a sensible default
    • smart_union - all unions are now “smart”
  9. dict(model) functionality should be removed; there’s a much clearer distinction now than in 2017, when I implemented this, between a model and a dict
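
The integer limit in item 6 is easy to check for in Python, where ints are arbitrary precision. The sketch below uses the condition exactly as stated in that item; I64_MAX and needs_custom_handling are hypothetical names for this example.

```python
I64_MAX = 2**63 - 1  # 9,223,372,036,854,775,807


def needs_custom_handling(value: int) -> bool:
    """True when abs(value) exceeds the limit quoted in item 6."""
    return abs(value) > I64_MAX


small = needs_custom_handling(10**18)
large = needs_custom_handling(10**19)
```

A wrap validator, as described earlier in this post, could use a check like this to decide whether to call the inner validator or handle the value itself.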

Features Remaining 😐

The following features will remain (mostly) unchanged:

  • JSONSchema, internally this will need to change a lot, but hopefully the external interface will remain unchanged
  • dataclass support, again internals might change, but not the external interface
  • validate_arguments, might be renamed, but otherwise remain
  • hypothesis plugin, might be able to improve this as part of the general cleanup

Questions ❓

I hope the explanation above is useful. I’m sure people will have questions and feedback; I’m aware I’ve skipped over some features with limited detail (this post is already fairly long 😴).

To allow feedback without being overwhelmed, I’ve created a “Pydantic V2” category for discussions on github - please feel free to create a discussion if you have any questions or suggestions. We will endeavour to read and respond to everyone.


Implementation Details :nerd:

(This is yet to be built, so these are nascent ideas which might change)

At the center of pydantic v2 will be a PydanticValidator class which looks roughly like this (note: this is just pseudo-code, it’s not even valid python and is only supposed to be used to demonstrate the idea):

PydanticValidator
# type identifying data which has been validated,
# as per pydantic-core, this can include "fields_set" data
ValidData = ...

# any type we can perform validation for
AnyOutputType = ...

class PydanticValidator:
    def __init__(self, output_type: AnyOutputType, config: Config):
        ...
    def validate(self, input_data: Any) -> ValidData:
        ...
    def validate_json(self, input_data: str | bytes | bytearray) -> ValidData:
        ...
    def is_instance(self, input_data: Any) -> bool:
        ...
    def is_instance_json(self, input_data: str | bytes | bytearray) -> bool:
        ...
    def json_schema(self) -> dict:
        ...
    def dump(
        self,
        data: ValidData,
        include: ... = None,
        exclude: ... = None,
        by_alias: bool = False,
        exclude_unset: bool = False,
        exclude_defaults: bool = False,
        exclude_none: bool = False,
        mode: Literal['unchanged', 'dicts', 'json-compliant'] = 'unchanged',
        converter: Callable[[Any], Any] | None = None
    ) -> Any:
        ...
    def dump_json(self, ...) -> str:
        ...

This could be used directly, but more commonly will be used by the following:

  • BaseModel
  • the validate decorator described above
  • pydantic.dataclasses.dataclass (which might be an alias of validate)
  • generics
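
To make the shape of that interface more concrete, here is a small runnable sketch of a validator object that a front-end could hold and delegate to. It is only an illustration under stated assumptions: ToyValidator is a hypothetical name, it performs a plain strict isinstance() check rather than anything pydantic-core does, and treating is_instance as "would validate succeed" is a guess at the intended semantics.

```python
import json
from typing import Any, Union


class ToyValidator:
    """Toy stand-in for the validator idea; not pydantic's implementation."""

    def __init__(self, output_type: type):
        self.output_type = output_type

    def validate(self, input_data: Any) -> Any:
        # bool is a subclass of int in Python, so reject it explicitly
        # unless bool is the type being validated.
        if isinstance(input_data, bool) and self.output_type is not bool:
            raise ValueError(f'expected {self.output_type.__name__}, got bool')
        if not isinstance(input_data, self.output_type):
            raise ValueError(
                f'expected {self.output_type.__name__}, '
                f'got {type(input_data).__name__}'
            )
        return input_data

    def validate_json(self, input_data: Union[str, bytes, bytearray]) -> Any:
        # json.JSONDecodeError subclasses ValueError, so callers only
        # need to handle one exception type.
        return self.validate(json.loads(input_data))

    def is_instance(self, input_data: Any) -> bool:
        try:
            self.validate(input_data)
        except ValueError:
            return False
        return True

    def is_instance_json(self, input_data: Union[str, bytes, bytearray]) -> bool:
        try:
            self.validate_json(input_data)
        except ValueError:
            return False
        return True


int_validator = ToyValidator(int)
value = int_validator.validate_json(b'123')  # parsed from JSON, then validated
```

A model class or decorator built on top of this would construct one validator per type up front and call validate() or validate_json() for each input, which is the division of labour the list above describes.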

The aim will be to get pydantic V2 to a place where the vast majority of tests continue to pass unchanged, thereby guaranteeing (as much as possible) that the external interface to pydantic and its behaviour are unchanged.

Conversion Table

The table below provisionally defines which input value types are allowed for which field types.

An updated and complete version of this table will be included in the docs for V2.

| Field Type | Input | Mode | Input Source | Conditions |
|---|---|---|---|---|
| str | str | both | python, JSON | - |
| str | bytes | lax | python | assumes UTF-8, error on unicode decoding error |
| str | bytearray | lax | python | assumes UTF-8, error on unicode decoding error |
| bytes | bytes | both | python | - |
| bytes | str | both | JSON | - |
| bytes | str | lax | python | - |
| bytes | bytearray | lax | python | - |
| int | int | strict | python, JSON | max abs value 2^64 - i64 is used internally, bool explicitly forbidden |
| int | int | lax | python, JSON | i64 |
| int | float | lax | python, JSON | i64, must be exact int, e.g. f % 1 == 0, nan, inf raise errors |
| int | Decimal | lax | python, JSON | i64, must be exact int, e.g. f % 1 == 0 |
| int | bool | lax | python, JSON | - |
| int | str | lax | python, JSON | i64, must be numeric only, e.g. [0-9]+ |
| float | float | strict | python, JSON | bool explicitly forbidden |
| float | float | lax | python, JSON | - |
| float | int | lax | python, JSON | - |
| float | str | lax | python, JSON | must match [0-9]+(\.[0-9]+)? |
| float | Decimal | lax | python | - |
| float | bool | lax | python, JSON | - |
| bool | bool | both | python, JSON | - |
| bool | int | lax | python, JSON | allowed: 0, 1 |
| bool | float | lax | python, JSON | allowed: 0, 1 |
| bool | Decimal | lax | python, JSON | allowed: 0, 1 |
| bool | str | lax | python, JSON | allowed: 'f', 'n', 'no', 'off', 'false', 't', 'y', 'on', 'yes', 'true' |
| None | None | both | python, JSON | - |
| date | date | both | python | - |
| date | datetime | lax | python | must be exact date, eg. no H, M, S, f |
| date | str | both | JSON | format YYYY-MM-DD |
| date | str | lax | python | format YYYY-MM-DD |
| date | bytes | lax | python | format YYYY-MM-DD (UTF-8) |
| date | int | lax | python, JSON | interpreted as seconds or ms from epoch, see speedate, must be exact date |
| date | float | lax | python, JSON | interpreted as seconds or ms from epoch, see speedate, must be exact date |
| datetime | datetime | both | python | - |
| datetime | date | lax | python | - |
| datetime | str | both | JSON | format YYYY-MM-DDTHH:MM:SS.f etc. see speedate |
| datetime | str | lax | python | format YYYY-MM-DDTHH:MM:SS.f etc. see speedate |
| datetime | bytes | lax | python | format YYYY-MM-DDTHH:MM:SS.f etc. see speedate, (UTF-8) |
| datetime | int | lax | python, JSON | interpreted as seconds or ms from epoch, see speedate |
| datetime | float | lax | python, JSON | interpreted as seconds or ms from epoch, see speedate |
| time | time | both | python | - |
| time | str | both | JSON | format HH:MM:SS.FFFFFF etc. see speedate |
| time | str | lax | python | format HH:MM:SS.FFFFFF etc. see speedate |
| time | bytes | lax | python | format HH:MM:SS.FFFFFF etc. see speedate, (UTF-8) |
| time | int | lax | python, JSON | interpreted as seconds, range 0 - 86399 |
| time | float | lax | python, JSON | interpreted as seconds, range 0 - 86399.9* |
| time | Decimal | lax | python, JSON | interpreted as seconds, range 0 - 86399.9* |
| timedelta | timedelta | both | python | - |
| timedelta | str | both | JSON | format ISO8601 etc. see speedate |
| timedelta | str | lax | python | format ISO8601 etc. see speedate |
| timedelta | bytes | lax | python | format ISO8601 etc. see speedate, (UTF-8) |
| timedelta | int | lax | python, JSON | interpreted as seconds |
| timedelta | float | lax | python, JSON | interpreted as seconds |
| timedelta | Decimal | lax | python, JSON | interpreted as seconds |
| dict | dict | both | python | - |
| dict | Object | both | JSON | - |
| dict | mapping | lax | python | must implement the mapping interface and have an items() method |
| TypedDict | dict | both | python | - |
| TypedDict | Object | both | JSON | - |
| TypedDict | Any | both | python | builtins not allowed, uses getattr, requires from_attributes=True |
| TypedDict | mapping | lax | python | must implement the mapping interface and have an items() method |
| list | list | both | python | - |
| list | Array | both | JSON | - |
| list | tuple | lax | python | - |
| list | set | lax | python | - |
| list | frozenset | lax | python | - |
| list | dict_keys | lax | python | - |
| tuple | tuple | both | python | - |
| tuple | Array | both | JSON | - |
| tuple | list | lax | python | - |
| tuple | set | lax | python | - |
| tuple | frozenset | lax | python | - |
| tuple | dict_keys | lax | python | - |
| set | set | both | python | - |
| set | Array | both | JSON | - |
| set | list | lax | python | - |
| set | tuple | lax | python | - |
| set | frozenset | lax | python | - |
| set | dict_keys | lax | python | - |
| frozenset | frozenset | both | python | - |
| frozenset | Array | both | JSON | - |
| frozenset | list | lax | python | - |
| frozenset | tuple | lax | python | - |
| frozenset | set | lax | python | - |
| frozenset | dict_keys | lax | python | - |
| is_instance | Any | both | python | isinstance() check returns True |
| is_instance | - | both | JSON | never valid |
| callable | Any | both | python | callable() check returns True |
| callable | - | both | JSON | never valid |
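
As a rough illustration of how the int rows of the table read in practice, the sketch below implements them in plain Python. It is a reading of the table, not pydantic-core's logic: coerce_int is a hypothetical helper, the choice of TypeError versus ValueError is arbitrary, and the range check assumes the signed 64-bit bounds implied by "i64".

```python
import math
import re
from decimal import Decimal

# Assumed bounds for "i64" (signed 64-bit integer).
I64_MIN, I64_MAX = -2**63, 2**63 - 1


def coerce_int(value, *, strict=False):
    """Hypothetical helper following the int rows of the conversion table."""
    if isinstance(value, bool):
        # bool is a subclass of int, so it has to be checked first.
        if strict:
            raise TypeError('bool is explicitly forbidden in strict mode')
        result = int(value)
    elif isinstance(value, int):
        result = value
    elif strict:
        raise TypeError(f'strict mode only accepts int, got {type(value).__name__}')
    elif isinstance(value, float):
        # nan and inf raise errors; otherwise the float must be an exact int.
        if math.isnan(value) or math.isinf(value) or value % 1 != 0:
            raise ValueError(f'{value!r} is not an exact integer')
        result = int(value)
    elif isinstance(value, Decimal):
        if not value.is_finite() or value % 1 != 0:
            raise ValueError(f'{value!r} is not an exact integer')
        result = int(value)
    elif isinstance(value, str):
        # "numeric only", i.e. the whole string must match [0-9]+
        if not re.fullmatch(r'[0-9]+', value):
            raise ValueError(f'{value!r} is not numeric only')
        result = int(value)
    else:
        raise TypeError(f'cannot coerce {type(value).__name__} to int')

    if not I64_MIN <= result <= I64_MAX:
        raise ValueError('value does not fit in i64')
    return result


coerce_int('42')   # 42
coerce_int(3.0)    # 3
coerce_int(True)   # 1 (lax mode only)
```

Note that a literal reading of [0-9]+ rejects signed strings such as '-1'; whether that is intended is not stated in the table.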

The ModelClass validator (used to create instances of a class) uses the TypedDict validator, then creates an instance with __dict__ and __fields_set__ set, so the same rules apply as for TypedDict.
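
The two-step process described here can be sketched in plain Python as follows. This is a minimal illustration only: validate_fields, build_instance and User are hypothetical names, the field check is a bare isinstance() test standing in for the TypedDict validator, and the use of __slots__ to keep __fields_set__ out of __dict__ is an assumption made for the example.

```python
from typing import Any, Dict, Set, Tuple


def validate_fields(
    data: Dict[str, Any], fields: Dict[str, type]
) -> Tuple[Dict[str, Any], Set[str]]:
    """Toy TypedDict-style validation returning (values, fields_set)."""
    values = {}
    for name, expected in fields.items():
        if name not in data:
            raise ValueError(f'missing field: {name}')
        if not isinstance(data[name], expected):
            raise ValueError(f'field {name}: expected {expected.__name__}')
        values[name] = data[name]
    return values, set(values)


class User:
    # '__dict__' keeps normal attribute storage available, while the
    # '__fields_set__' slot lives outside of it.
    __slots__ = ('__dict__', '__fields_set__')

    id: int
    name: str


def build_instance(cls, data: Dict[str, Any]):
    # Step 1: validate the raw data as if it were a TypedDict.
    values, fields_set = validate_fields(data, cls.__annotations__)
    # Step 2: create the instance without calling __init__ and attach
    # the validated values and the set of provided fields directly.
    instance = cls.__new__(cls)
    object.__setattr__(instance, '__dict__', values)
    object.__setattr__(instance, '__fields_set__', fields_set)
    return instance


user = build_instance(User, {'id': 1, 'name': 'Ada'})
```

Because the instance is populated directly from already-validated data, no per-field work happens during construction, which is why the TypedDict rules carry over unchanged.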