Pydantic v2.7 is now available! This release is our biggest since v2.0, with a focus on performance improvements and highly requested new features. This release also featured the work of over 30 new contributors! In this post, we'll cover the highlights of the release.
You can see the full changelog here.
New Features
Partial JSON parsing
Pydantic's JSON parser now offers support for partial JSON parsing. This capability allows the parser to read input until it encounters invalid syntax, making a best-effort attempt to return a JSON object that accurately represents the valid portion of the input. Exposed via the from_json method, this feature is especially valuable for processing streaming outputs from Large Language Models (LLMs), which often generate partial JSON objects that traditional parsers cannot handle without errors.
In the past, parsing such a response would fail with a JSON parsing error. Now, you can enable partial JSON parsing to parse the response, and then validate the parsed object against a Pydantic model with model_validate.
Here's a simple example:
from pydantic_core import from_json

partial_json_data = '["aa", "bb", "c'  # (1)!

try:
    result = from_json(partial_json_data, allow_partial=False)
except ValueError as e:
    print(e)  # (2)!
    #> EOF while parsing a string at line 1 column 15

result = from_json(partial_json_data, allow_partial=True)
print(result)  # (3)!
#> ['aa', 'bb']
- The JSON list is incomplete - it's missing a closing "].
- When allow_partial is set to False (the default), a parsing error occurs.
- When allow_partial is set to True, part of the input is deserialized successfully.
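Putting this together with model validation, here's a quick sketch (the Dog model and the truncated JSON below are illustrative, not from the original example):
from pydantic import BaseModel
from pydantic_core import from_json

class Dog(BaseModel):
    breed: str
    name: str
    friends: list

# a truncated LLM response: the "age" key was cut off mid-stream
partial_dog_json = '{"breed": "lab", "name": "fluffy", "friends": ["buddy", "spot", "rufus"], "age'

dog = Dog.model_validate(from_json(partial_dog_json, allow_partial=True))
print(repr(dog))
#> Dog(breed='lab', name='fluffy', friends=['buddy', 'spot', 'rufus'])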
You can learn more about integrating Pydantic with your LLM work from some of our blog posts.
For more information, check out the docs for this new feature!
Generic Secret base type
Pydantic offers support for the SecretStr and SecretBytes types, which are used to represent sensitive data.
We've extended this support with a generic Secret base type, which can be used to create custom secret types.
For example, you could create a SecretSalary type that wraps an integer salary value and customizes the display of the secret value like so:
from pydantic import BaseModel, Secret

class SecretSalary(Secret[int]):
    def _display(self) -> str:
        return '$******'

class Employee(BaseModel):
    name: str
    salary: SecretSalary

employee = Employee(name='John Doe', salary=100_000)

print(repr(employee))
#> Employee(name='John Doe', salary=SecretSalary('$******'))

print(employee.salary)
#> $******

print(employee.salary.get_secret_value())
#> 100000
If you're satisfied with a more generalized repr output, you can use an even more concise version, where the Secret type is parametrized directly, with no need for a subclass:
from pydantic import Secret, TypeAdapter

ta = TypeAdapter(Secret[int])

my_secret_int = ta.validate_python(123)

print(my_secret_int)
#> **********

print(my_secret_int.get_secret_value())
#> 123
This feature is incredibly extensible and can be used to create custom secret types for a wide variety of base types.
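As a further sketch (the SecretApiKey and Settings names below are made up for illustration, and this assumes the default masked display of the generic Secret base type), you can pair a custom secret type with a field serializer when you do need the underlying value in JSON output:
from pydantic import BaseModel, Secret, field_serializer

class SecretApiKey(Secret[str]):
    pass

class Settings(BaseModel):
    api_key: SecretApiKey

    # reveal the real value only when dumping to JSON
    @field_serializer('api_key', when_used='json')
    def reveal_api_key(self, value: SecretApiKey) -> str:
        return value.get_secret_value()

settings = Settings(api_key='sk-abc123')

print(settings.api_key)
#> **********

print(settings.model_dump_json())
#> {"api_key":"sk-abc123"}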
Explore the usage docs to learn more!
deprecated fields
One of the most highly requested features in Pydantic (ever) is the ability to mark fields as deprecated. Thanks to the hard work of @Viicos, this feature has been realized!
Marking a field as deprecated will result in:
- A runtime deprecation warning emitted when accessing the field
- The deprecated parameter being set to true in the generated JSON schema
The deprecated parameter can be set to any of the following:
- A string, which will be used as the deprecation message.
- An instance of the warnings.deprecated decorator (or the typing_extensions backport).
- A boolean, which will be used to mark the field as deprecated with a default 'deprecated' deprecation message.
Here's a simple example:
from pydantic import BaseModel, Field

class Model(BaseModel):
    deprecated_field: int = Field(deprecated=True)

print(Model.model_json_schema()['properties']['deprecated_field'])
#> {'deprecated': True, 'title': 'Deprecated Field', 'type': 'integer'}
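To sketch the runtime side as well (the field names and message below are illustrative), a string can be passed as the deprecation message, and accessing the field emits a warning:
import warnings

from pydantic import BaseModel, Field

class Model(BaseModel):
    deprecated_field: int = Field(
        deprecated='deprecated_field is deprecated, use new_field instead'
    )

instance = Model(deprecated_field=1)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    instance.deprecated_field  # accessing the field emits a DeprecationWarning
    print(caught[0].category.__name__)
    #> DeprecationWarning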
The docs for this feature delve into more details about the various ways to mark and customize deprecated fields.
serialize_as_any runtime setting
In v1, Pydantic used duck-typing serialization by default. In an attempt to improve security, Pydantic v2 switched away from this approach.
In Pydantic v2.7, we've reintroduced serialization with duck typing as an opt-in feature via a new serialize_as_any runtime flag.
This opt-in behavior was already available in previous v2.x versions via the SerializeAsAny annotation, but that required annotating each field individually. The new serialize_as_any flag lets you enable duck-typing serialization for all fields in a model with a single setting.
Here's an example showcasing the basic usage of the setting:
from pydantic import BaseModel, TypeAdapter

class User(BaseModel):
    name: str

class UserLogin(User):
    password: str

ta = TypeAdapter(User)

user_login = UserLogin(name='John Doe', password='some secret')

print(ta.dump_python(user_login, serialize_as_any=False))  # (1)!
#> {'name': 'John Doe'}

print(ta.dump_python(user_login, serialize_as_any=True))  # (2)!
#> {'name': 'John Doe', 'password': 'some secret'}
- This is the default behavior - fields not present in the schema are not serialized.
- With serialize_as_any set to True, fields not present in the schema are serialized.
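For comparison, here's a rough sketch of the pre-existing per-field approach using the SerializeAsAny annotation (the OuterModel wrapper below is made up for illustration):
from pydantic import BaseModel, SerializeAsAny

class User(BaseModel):
    name: str

class UserLogin(User):
    password: str

class OuterModel(BaseModel):
    as_any: SerializeAsAny[User]  # duck-typed serialization for this field only
    as_user: User  # regular schema-based serialization

user_login = UserLogin(name='John Doe', password='some secret')
outer = OuterModel(as_any=user_login, as_user=user_login)

print(outer.model_dump())
#> {'as_any': {'name': 'John Doe', 'password': 'some secret'}, 'as_user': {'name': 'John Doe'}}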
We've upgraded the documentation for serialization with duck typing. This section, in particular, covers the new serialize_as_any runtime flag.
Pass context to serialization
Pydantic previously supported a context object in validation, but not in serialization. With the help of @ornariece, we've added support for using a context object during serialization as well.
Here's a simple example, where we use a unit provided in the context to convert a distance field:
from pydantic import BaseModel, SerializationInfo, field_serializer

class Measurement(BaseModel):
    distance: float  # in meters

    @field_serializer('distance')
    def convert_units(self, v: float, info: SerializationInfo):
        context = info.context
        if context and 'unit' in context:
            if context['unit'] == 'km':
                v /= 1000  # convert to kilometers
            elif context['unit'] == 'cm':
                v *= 100  # convert to centimeters
        return v

measurement = Measurement(distance=500)

print(measurement.model_dump())  # no context
#> {'distance': 500.0}

print(measurement.model_dump(context={'unit': 'km'}))  # with context
#> {'distance': 0.5}

print(measurement.model_dump(context={'unit': 'cm'}))  # with context
#> {'distance': 50000.0}
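The context argument is also accepted when dumping to JSON. Here's a short sketch (the Price model and exchange rate below are made up for illustration):
from pydantic import BaseModel, SerializationInfo, field_serializer

class Price(BaseModel):
    amount: float  # stored in USD

    @field_serializer('amount')
    def apply_exchange_rate(self, v: float, info: SerializationInfo) -> float:
        # multiply by an exchange rate passed in via the context, if provided
        rate = (info.context or {}).get('rate', 1.0)
        return v * rate

price = Price(amount=10.0)

print(price.model_dump_json(context={'rate': 0.5}))
#> {"amount":5.0}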
This feature is powerful as it further extends Pydantic's flexibility and customization capabilities when it comes to serialization.
See the documentation for more information.
Performance Improvements
PyO3 0.21
Pydantic uses PyO3 to connect our core Rust code to Python. This under-the-hood upgrade brings a significant performance improvement to Pydantic, as seen in these benchmarks.
For detailed information on the improvements and changes in PyO3 0.21, check out this blog post from David Hewitt, a Rust 🤝 Python expert!
SIMD integer and string JSON parsing on aarch64
Pydantic now uses SIMD instructions for integer and string JSON parsing on aarch64 (ARM) platforms.
Faster enum validation and serialization
enum validation and serialization logic was moved to pydantic-core, which is written in Rust. This migration results in a ~4x speedup for enum validation and serialization.
Fast path for ASCII Python string creation in JSON
jiter, Pydantic's JSON parser, now has a fast path for creating ASCII Python strings. This change results in a ~15% performance improvement for Python string parsing.
Caching Python strings
Pydantic's JSON parser offers support for configuring how Python strings are cached during JSON parsing and validation. Memory usage increases slightly when caching strings, but it can improve performance significantly, especially in cases where certain strings are repeated frequently.
The cache_strings setting (in model config or as an argument to from_json) can take any of the following values, as sketched in the example after this list:
- True or 'all' (the default): cache all strings
- 'keys': cache only dictionary keys
- False or 'none': no caching
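Here's a minimal sketch of both spellings (the Event model and sample data below are made up for illustration):
from pydantic import BaseModel, ConfigDict
from pydantic_core import from_json

class Event(BaseModel):
    # cache only dictionary keys while validating JSON for this model
    model_config = ConfigDict(cache_strings='keys')

    name: str
    tags: list[str]

event = Event.model_validate_json('{"name": "launch", "tags": ["go", "go", "go"]}')
print(event)
#> name='launch' tags=['go', 'go', 'go']

# the same knob on the standalone JSON parser
data = from_json('{"name": "launch", "tags": ["go", "go", "go"]}', cache_strings=False)
print(data)
#> {'name': 'launch', 'tags': ['go', 'go', 'go']}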
Learn more about this feature here.
Conclusion
With these new features and performance improvements, Pydantic v2.7 is the fastest and most feature-rich version of Pydantic yet. If you have any questions or feedback, please open a GitHub discussion. If you encounter any bugs, please open a GitHub issue.