At [Hyperlint](https://hyperlint.com/), we've made it [our mission](https://hyperlint.com/blog/the-hyperlint-mission) to help developer teams take the toil out of developer content.
We do that by providing an [AI-powered GitHub bot](https://github.com/apps/hyperlint-ai) and related utilities that review technical documentation for accuracy and SEO optimization. We do everything from catching typos to automatically updating documentation based on upstream API or CLI changes.
We deeply integrate with our users' workflows and want to provide a seamless experience. To deliver that, it's imperative for us to know exactly what's going on behind the scenes.
In this post, we'll dive into why we chose [Pydantic Logfire](https://pydantic.dev/logfire) as our observability provider.
## Our need for powerful observability
[Hyperlint's AI documentation review](https://docs.hyperlint.com/ai-reviewer) runs checks on thousands of pull requests to ensure our users' documentation is top-notch and up-to-date.
With so many developers and technical writers relying on us to review every PR, every time, we needed a monitoring solution that could keep up with our demanding workload and alert us to any issues quickly.
## Enter Pydantic Logfire: simple, straightforward, and performant
After evaluating various options, we chose Pydantic Logfire for its simplicity, performance, and powerful features. Here's why Logfire stood out, in a nutshell:
1. [Effortless integration](#effortless-integration)
2. [Debugging made easy](#debugging-made-easy)
3. [Great support for AI](#great-support-for-ai)
4. [Astonishingly responsive team](#astonishingly-responsive-team)
Let's review each of these in detail.
### Effortless integration
Pydantic Logfire seamlessly integrates with our existing logging tools. We run a lot of Python and we already heavily leverage [Pydantic](https://docs.pydantic.dev/latest/).
We use [Loguru](https://loguru.readthedocs.io/en/stable/index.html) as our logging tool of choice, and [Logfire's out-of-the-box integration](https://logfire.pydantic.dev/docs/integrations/loguru/) made flipping it on across our app trivial. We saw logs in Logfire within seconds of enabling the integration.
We also leverage other integrations with our [databases (like Postgres, MongoDB, etc.)](https://logfire.pydantic.dev/docs/integrations/), [web servers (like Flask and FastAPI)](https://logfire.pydantic.dev/docs/integrations/), and AI providers like [OpenAI and Anthropic](https://logfire.pydantic.dev/docs/integrations/).
The ability to integrate with our existing logging tool and log structured objects with Pydantic gave us so much value, right off the bat.
We now [log spans using the `instrument` function](https://logfire.pydantic.dev/docs/guides/onboarding-checklist/add-manual-tracing/#convenient-function-spans-with-logfireinstrument) along with other associated metadata, to ensure we have the right context in our observability pipeline.
```python
import logfire


@logfire.instrument("Execute review")
async def review(arg1: str, arg2: int):
    ...
```
We were up and running in no time and getting value the same day.
### Debugging made easy
[Logfire's intuitive interface](https://logfire.pydantic.dev/docs/guides/web-ui/live/) and comprehensive logging capabilities have significantly simplified our debugging process.
When events occur in the system, we use the Logfire UI to quickly determine issues or performance bottlenecks. We can understand performance at various levels of our system.
![Logfire Interface](/assets/blog/hyperlint/duck-db.png)
Using Logfire helped us cut our time to review by more than 80% from initial versions because it helped us identify critical bottlenecks and performance issues with almost no effort.
On top of that, [Logfire's alerts](https://logfire.pydantic.dev/docs/guides/web-ui/alerts/) integrate with Slack, making it easy for us to see issues in real time.
**Exception tracing made magical**
In particular, [the exception tracing experience is magical in the UI](https://logfire.pydantic.dev/docs/guides/web-ui/live/).
Logfire highlights issues and exceptions, allowing us to quickly trace problematic code paths as they come up. We can even trace across services.
For example, in the trace below, we encountered an error during search indexing and were able to quickly find the root cause, which *prevented* any customer issues.
![Search Index Error](/assets/blog/hyperlint/search-index.png)
This experience cuts down debugging time and allows us to focus on our users.
### Great support for AI
Logfire's native integration with [OpenAI](https://logfire.pydantic.dev/docs/integrations/openai/) and [Anthropic](https://logfire.pydantic.dev/docs/integrations/anthropic/) lets us understand not just the metrics and logs of our application, but also the AI-related evaluations that we perform. That keeps our system simple and covers all of our needs.
### Astonishingly responsive team
We started using Logfire very early on in their preview. We've been astonished by the responsiveness of the team, always eager to help debug issues or provide recommendations. Culturally at Hyperlint, we're focused on our end users and making them successful.
We love to see that in our partners and Logfire demonstrated that time and time again.
## Deep dive: the power of context for AI apps
While Logfire's AI support might seem like a small feature, for us it's a standout feature.
Pydantic Logfire helps us maintain context throughout our observability pipeline - AI or not. When dealing with complex documentation review processes and AI-driven analyses, context is essential and Logfire provides that all in a single platform.
[Pydantic Logfire](https://pydantic.dev/logfire) allows us to:
- [Trace the entire lifecycle of a documentation review or change](https://logfire.pydantic.dev/docs/guides/web-ui/live/), from initial PR creation, through any AI analysis that we perform, to final approval.
- Identify patterns and trends across multiple PRs and repositories.
- [Get alerts about problematic areas or inconsistencies](https://logfire.pydantic.dev/docs/guides/web-ui/alerts/).
This holistic view has been invaluable in improving our service and providing more accurate, helpful reviews to our users.
Using Logfire has re-affirmed our beliefs in how to build impactful AI apps and systems:
- **Good AI monitoring is just good monitoring** - since we are a small team, we are focused on high leverage tools that can do multiple things. Logfire allows us to focus on our users, not on our infrastructure.
- **[Compound AI Systems](https://bair.berkeley.edu/blog/2024/02/18/compound-ai-systems/) are the way forward** - AI is just a small part of what we do to help our users. We have manual checks, third party tools, and various AI providers. We believe that systems that compose different tools together are key to building great AI apps, and Logfire helps us do that.
- **Seamless monitoring with easy data access** - We can visualize OpenAI chats in the UI, but we can also export them using Logfire's Query Endpoint to analyze logs alongside production data.
It's critical when using AI to look at your data, and Logfire couldn't make that easier.
## Logfire is a game-changer for Hyperlint
Choosing Logfire as our observability provider has been a game-changer for Hyperlint. Its simplicity, performance, and powerful features have allowed us to:
- Maintain high reliability and performance for [Hyperlint's AI documentation reviewer](https://docs.hyperlint.com/ai-reviewer).
- Quickly identify and resolve issues, minimizing downtime.
- Develop custom evaluation tools that integrate seamlessly with our existing workflows.
- Gain deeper insights into the [AI-driven documentation processes](https://hyperlint.com/product#styleguide) for our customers.
Pydantic Logfire has proven to be the right solution for us at Hyperlint, enabling us to focus on what we do best: [helping teams create and maintain high-quality technical documentation](https://hyperlint.com/). We've been users from the beginning and are happy paying customers.
Give Logfire a try today - we love it and we're sure you will too!
At Pydantic, open source isn't just a part of our story — it is our story.
As well as developing and maintaining open source libraries, we're using and contributing to many open source projects as we build [Pydantic Logfire](/logfire), our observability platform and primary commercial product.
Today, we're happy to announce our commitment to the [Open Source Pledge](https://opensourcepledge.com/) where we commit to investing at least $2,000 per developer per year in open source projects and maintainers.
Spending money to maintain your supply chain shouldn't require an initiative and a glossy website, but for too long the open source ecosystem has been taken for granted. We're proud to be part of the movement to change that.
## Our Open Source DNA
Pydantic started as a side project. Today, it's downloaded more than TypeScript and used by millions of developers in companies of all sizes. It has changed my life by allowing me to start this company and build the observability tool I've dreamed of for years.
## Our commitment
We want maintaining Open Source to make financial sense as well as being intellectually rewarding for more developers.
At the risk of sounding grandiose, my ultimate hope is that investing in Open Source today has the stratospheric return on investment that investing in universities did 800 years ago.
Through the **Pydantic Open Source Fund** Pydantic is committing to spending at least $2,000 per developer per year to support open source projects and maintainers.
With 11 devs as of October 2024, that's currently $22,000 per year.
That's in addition to the roughly $250,000 we've spent in the last year developing, documenting and maintaining Open Source libraries like [Pydantic](https://github.com/pydantic/pydantic), [pydantic-core](https://github.com/pydantic/pydantic-core), [Jiter](https://github.com/pydantic/jiter) and [FastUI](https://github.com/pydantic/fastui). Here I'm not including "Open Source for commercial purposes" like the [Logfire Python SDK](https://github.com/pydantic/logfire) or the [logfire demo](https://github.com/pydantic/logfire-demo).
## How we choose projects to support
The Pydantic team maintains numerous major open source projects including
[PyO3](https://docs.rs/pyo3/latest/pyo3/) (David Hewitt),
[uvicorn](https://www.uvicorn.org/) (Marcelo Trylesinski),
[starlette](https://www.starlette.io/) (Adrian Garcia Badaracco & Marcelo Trylesinski),
and [virtuoso](https://virtuoso.dev/) (Petyo Ivanov),
but to avoid conflicts of interest we won't include support for these OSS developers in our fund.
We'll choose projects to support that we use in the following order of priority:
1. Projects we rely on for our commercial products
2. Projects we rely on for our open source libraries
(Of course projects we rely on for both will be highest priority)
After that we'll select projects based on the following priorities:
1. How critical the project is to what we're doing
2. How in-need the project is of financial support
3. How impactful our funding will be in improving the project in ways we care about
4. How aligned the project is with our values
## Projects we're supporting
Projects and maintainers we're sponsoring this year:
- [arrow-rs](https://github.com/apache/arrow-rs) Official Rust implementation of Apache Arrow — $12,000
- [encode](https://github.com/encode) Httpx, Starlette, Uvicorn — $2,400
- [messense](https://github.com/messense) PyO3, maturin, maturin-action — $1,200
- [squidfunk](https://github.com/squidfunk) Material for MkDocs — $1,200
- [pawamoy](https://github.com/pawamoy) mkdocstrings, griffe — $1,200
- [15r10nk](https://github.com/15r10nk) pytest inline-snapshots, executing — $1,200
- [pytest](https://github.com/pytest-dev) pytest, pytest-asyncio — $1,200
- [dvarrazzo](https://github.com/dvarrazzo) psycopg — $1,200
- [tokio](https://github.com/tokio-rs) tokio — $1,200
- [dtolnay](https://github.com/dtolnay) anyhow, rust-toolchain — $1,200
Total: $24,000 – $2,181 per developer
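As a quick sanity check on the figures above, the sketch below re-adds the listed amounts and divides by the 11 developers mentioned earlier. The dictionary only restates the list; the variable names are illustrative.

```python
# Amounts in USD per year, restated from the list above.
sponsorships = {
    "arrow-rs": 12_000,
    "encode": 2_400,
    "messense": 1_200,
    "squidfunk": 1_200,
    "pawamoy": 1_200,
    "15r10nk": 1_200,
    "pytest": 1_200,
    "dvarrazzo": 1_200,
    "tokio": 1_200,
    "dtolnay": 1_200,
}
developers = 11  # headcount as of October 2024, per the post

total = sum(sponsorships.values())
per_developer = total // developers  # rounded down to whole dollars
pledge_minimum = 2_000 * developers  # the Open Source Pledge floor

print(f"Total: ${total:,}")
print(f"Per developer: ${per_developer:,}")
print(f"Pledge minimum: ${pledge_minimum:,}")
```

The total comes out $2,000 above the pledge minimum for a team of 11.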
## Join us in supporting Open Source
To the maintainers and contributors who keep the open source ecosystem thriving: THANK YOU. Your work makes ours possible, and we're honored to support you through this fund.
Mechanical and structural engineering companies have always spent a large proportion of their revenue on their supply chain (Claude suggests 75% to 85% of the cost of goods sold goes on suppliers, I couldn't immediately find a better source).
The extraordinary adoption of software over the last few decades has led software engineering companies to assume they can get away without investing in their supply chain.
But I think that's wrong. I think software has eaten the world in spite of (not because of) the lack of investment in the open source supply chain.
High profile security incidents are just the most visible evidence of the problem. The bigger and more insidious side effect of the lack of investment is all the projects that have died or never even been started because those who benefit from open source are so reluctant to pay for it in a meaningful way.
You or your company should stop being such tight-fisted bastards and pay the people who have helped you get rich, it might make you even richer, it might also make the world a better place.
---
**P.S.:** We're hiring:
* [Platform (DevOps) engineer](https://pydantic.dev/jobs/platform)
* [Rust / Database developer](https://pydantic.dev/jobs/rust) to work on our database, based on Apache DataFusion
If you think what we're working on sounds interesting, please get [in touch](/contact).
Many of you reading our recent [Logfire launch and Series A Announcement](/articles/logfire-announcement) may be wondering:
> Wait, aren't you the team behind Pydantic, the data validation library? Why are you venturing into observability?
Fair question. Let me try to explain.
## Frustrations with existing tools
I've been frustrated by existing logging and monitoring tools for years. Most of these tools are built to serve the needs of large enterprises, and the resulting complexity often outweighs the insights they provide.
In many ways, observability feels like it's stuck where the rest of infra was 15 years ago. The waves of innovation that have radically simplified the process of hosting a web application have largely passed observability by.
The recent surge of "Observability for AI" tools isn't much better. Yes, observing LLM calls is important, even disproportionately so, but those LLM calls are ultimately just one part of your application. Why introduce a completely new tool for that, when we could have a single platform that effectively handles both AI-specific monitoring and traditional observability?
## Developer first Observability
What we need is a general purpose observability platform with first class support for AI — but most importantly, one that developers actually want to use. Developers are the ones interacting with observability tools the most, yet many platforms seem to forget this.
That's where our background building Pydantic comes into play. Pydantic didn't succeed because it was the first, or the fastest. It became ubiquitous because developers loved using it. We've carried that same focus on developer experience into Logfire, which, in the observability landscape, apparently makes us unusual.
To back this point up, it's tempting to list all of Logfire's features — but that [already exists](https://logfire.pydantic.dev/docs/why-logfire/). Instead, I want to dive a little deeper into a few key choices we've made, as I think they are representative of the difference between Logfire and other observability tools.
## The Logfire SDK
Maintaining good SDKs is a significant investment of both time and resources. Most observability startups have shifted to relying on [OpenTelemetry](https://opentelemetry.io/docs/what-is-opentelemetry/) (OTel), which supports multiple languages at a lower cost by avoiding the need to develop and maintain custom SDKs. While this makes business sense, the victim is the developer stuck struggling with low-level, verbose APIs that are frequently unpleasant to work with.
Because of this, for Logfire, relying solely on OTel's Python libraries was never an option.
Instead, we built a beautiful SDK that wraps OTel but provides a much nicer API, and includes features the bare OTel libraries will never offer.
```python
import logfire
from fastapi import FastAPI

# this is generally all you need to set up logfire
logfire.configure()

# send a zero-duration span AKA a log
logfire.info("hello {name}", name="world")

# send a span with a duration
with logfire.span("my span"):
    do_some_work()  # placeholder for your own code

# instrument a FastAPI app
app = FastAPI()
logfire.instrument_fastapi(app)
```
To contrast that with raw OTel, [here's](https://gist.github.com/samuelcolvin/73d6536166236cad2bf04044fd0ee0f1) a working example
of the same code using the Logfire SDK and the OTel SDK directly (including 36 lines of OTel boilerplate!).
Learn more about our SDK [in the docs](https://logfire.pydantic.dev/docs/).
So far we only have a Logfire-specific SDK for Python, although you can send data to Logfire from any [language with an OTel SDK](https://opentelemetry.io/docs/languages/) today. But we plan to build Logfire SDKs for other languages soon, likely starting with our preferred stack of TypeScript, Python, and Rust.
## SQL
The Logfire platform lets you write arbitrary SQL to query your data; you can use it to find attributes for a specific span, define alert conditions, or build complex aggregations for dashboards.
```SQL
SELECT attributes->'result'->>'name' AS name,
EXTRACT(YEAR FROM (attributes->'result'->>'dob')::date) AS "birth year"
FROM records
WHERE attributes->'result'->>'country_code' = 'USA';
```
Allowing direct SQL access imposes real technical constraints on the databases we can use, and comes with big engineering challenges, which is why no other observability company supports it. But for developers, this flexibility is invaluable — and we think the trade-off is well worth it.
Again, like maintaining an SDK, this is a decision that would only be made in a company composed of people who write code most days.
## Traces as Logs
One of the most innovative parts of Logfire is our live view:
![Logfire Platform — Live View](/assets/blog/logfire-ga/traces-as-logs.png)
(Logfire Platform — Live View)
The data comes from OTel traces, but is displayed like logs, only better.
The problem with "standard" OTel data for this view is that spans aren't sent until they are finished, which means you can't see activity as it happens, and you can't properly contextualize child spans when you do receive them because you won't have their parent. By maintaining our own SDK, we've been able to enhance how OpenTelemetry works, so we can send data about spans when they begin — what we call a "pending span." This required substantial effort, but it results in a vastly improved developer experience for interactive workflows. Now, the live view truly feels live.
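To make the ordering problem concrete, here is a toy model in plain Python. It is not Logfire's or OpenTelemetry's implementation; `ToyExporter`, `ToyTracer`, and `arrival_order` are hypothetical names invented for this sketch. It records the order in which a backend would first hear about each span, with and without a record sent at span start.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from itertools import count
from typing import Iterator, List, Optional

_span_ids = count(1)


@dataclass
class ToyExporter:
    """Collects records in memory; a stand-in for a real backend."""

    records: List[dict] = field(default_factory=list)

    def send(self, record: dict) -> None:
        self.records.append(record)


@dataclass
class ToyTracer:
    exporter: ToyExporter
    send_pending: bool = True
    _stack: List[int] = field(default_factory=list, init=False)

    @contextmanager
    def span(self, name: str) -> Iterator[int]:
        span_id = next(_span_ids)
        parent: Optional[int] = self._stack[-1] if self._stack else None
        base = {"id": span_id, "parent": parent, "name": name}
        if self.send_pending:
            # Tell the backend about the span as soon as it starts.
            self.exporter.send({**base, "kind": "pending"})
        self._stack.append(span_id)
        try:
            yield span_id
        finally:
            self._stack.pop()
            # The "standard" behaviour: export only once the span has ended.
            self.exporter.send({**base, "kind": "finished"})


def arrival_order(send_pending: bool) -> List[str]:
    """Return span names in the order the backend first sees them."""
    exporter = ToyExporter()
    tracer = ToyTracer(exporter, send_pending=send_pending)
    with tracer.span("handle request"):
        with tracer.span("run query"):
            pass
    order: List[str] = []
    for record in exporter.records:
        if record["name"] not in order:
            order.append(record["name"])
    return order


print(arrival_order(send_pending=False))  # child is seen before its parent
print(arrival_order(send_pending=True))   # parent is seen first
```

Without the start record, the child span reaches the backend before the parent it refers to. With it, the parent is known first, so a live view can nest children as they arrive.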
## How we think about open source vs. commercial
Too many observability companies are abusing the open source label with their products. These products can be deliberately difficult to self-host to encourage use of the hosted alternative. In addition, the "open-source" versions are often missing critical functionality, forcing users onto closed source paid plans once they're locked in.
We're different. We have real, truly open source software with massive adoption: Pydantic.
With Logfire, we're transparent: the SDK is open source (MIT licensed), but the platform itself is closed source. While we offer a generous free tier, our goal is for you to find enough value in Logfire to eventually pay for it. It's not always the simplest business decision, but we believe this transparency is the right approach.
## Try Logfire today if you haven't already
Logfire is still evolving, and it's far from perfect. But I believe it's fundamentally different from the tools that came before it, and it has the potential to change how developers understand their applications. And I believe it's already the best tool on the market for its job.
Please give it a try, and [tell us](https://logfire.pydantic.dev/docs/help/) what works, and what sucks.
---
**P.S.:** We're hiring:
* [Platform (DevOps) engineer](https://pydantic.dev/jobs/platform)
* [Rust / Database developer](https://pydantic.dev/jobs/rust) to work on our database, based on Apache DataFusion
If you think what we're working on sounds interesting, please get in touch.
Today, I am excited to announce two significant milestones for our company:
**First, [Logfire — our developer-centric observability platform](/logfire) — is leaving beta** and is ready for use by all developers and organizations. **Secondly, we have raised $12.5 million in Series A funding**, led by [Sequoia](https://www.sequoiacap.com/) with participation from other existing investors [Partech](https://partechpartners.com/) and [Irregular Expression](https://www.irregex.vc/).
**Update:** TechCrunch have covered the announcement here.
_I've written a longer post about why we decided to build Logfire [here](/articles/why-logfire), and why we think we are the right team to do so._
Over the course of our beta, Logfire has seen significant growth and adoption. This momentum has not only validated our vision for a developer-centric observability platform, but also given our investors the confidence to deepen their commitment.
With this additional funding, we are committed to accelerating the development of Logfire, introducing new features, and ensuring that our platform can handle loads of any scale. If you've been waiting to adopt Logfire, you can now start to integrate it into your workflows with confidence.
We offer a generous free tier and a simple, scalable [pay-as-you-go pricing model](/pricing). _If you've been using Logfire in beta, thank you. Beta users who missed our offer of credit, please feel free to [get in touch](/contact) and we'll extend the offer to you._
While we are excited to leave beta, Logfire is still in rapid development. In particular, we still consider our alerts functionality "in beta", and we expect significant improvements in the near future.
We wouldn't be where we are today without the support of our investors, and I'm thrilled to continue our partnership with them going forward.
[![Pydantic](/assets/blog/logfire-ga/pydantic-team.jpg)](/about)
**P.S.:** We're hiring:
* [Platform (DevOps) engineer](https://pydantic.dev/jobs/platform)
* [Rust / Database developer](https://pydantic.dev/jobs/rust) to work on our database, based on Apache DataFusion
If you think what we're working on sounds interesting, please get in touch.
[Pydantic v2.9](https://github.com/pydantic/pydantic/releases/tag/v2.9.0) is now available!
You can install it now via [PyPI](https://pypi.org/project/pydantic/) or your favorite package manager:
```bash
pip install --upgrade pydantic
```
This release features the work of over 25 contributors! In this post, we'll cover the highlights of the release.
You can see the full changelog on [GitHub](https://github.com/pydantic/pydantic/compare/v2.8.2...v2.9.0/).
This release contains significant [performance improvements](#performance-improvements),
[union serialization improvements](#tagged-union-serialization), and a handful of [new features](#new-features).
## New Features
### `complex` number support
We've added support for stdlib `complex` numbers in Pydantic.
For validation, we support both `complex` instances and strings that
[can be parsed](https://docs.python.org/3/library/functions.html#complex) into `complex` numbers.
```py
from pydantic import TypeAdapter
ta = TypeAdapter(complex)
complex_number = ta.validate_python('1+2j')
assert complex_number == complex(1, 2)
assert ta.dump_json(complex_number) == b'"1+2j"'
```
Credit for this goes to [@changhc](https://github.com/changhc)! For implementation details, see [#9654](https://github.com/pydantic/pydantic/pull/9654).
### Explicit `ZoneInfo` support
Pydantic now supports the `ZoneInfo` type explicitly (in Python v3.9+).
Here's an example of validation and serialization with the new type:
```py
from pydantic import TypeAdapter
from zoneinfo import ZoneInfo
ta = TypeAdapter(ZoneInfo)
tz = ta.validate_python('America/Los_Angeles')
assert tz == ZoneInfo('America/Los_Angeles')
assert ta.dump_json(tz) == b'"America/Los_Angeles"'
```
Thanks for the contribution, [@Youssefares](https://github.com/Youssefares)! See [#9896](https://github.com/pydantic/pydantic/pull/9896)
for more details regarding the new implementation.
### `val_json_bytes` setting
The new [`val_json_bytes`](https://docs.pydantic.dev/2.9/api/config/#pydantic.config.ConfigDict.val_json_bytes) setting enables users to specify which encoding to use when decoding `bytes` data from JSON.
This setting, in combination with the existing [`ser_json_bytes`](https://docs.pydantic.dev/2.9/api/config/#pydantic.config.ConfigDict.ser_json_bytes), supports consistent JSON round-tripping for `bytes` data.
For example:
```py
from pydantic import TypeAdapter, ConfigDict
ta = TypeAdapter(bytes, config=ConfigDict(ser_json_bytes='base64', val_json_bytes='base64'))
some_bytes = b'hello'
validated_bytes = ta.validate_python(some_bytes)
encoded_bytes = b'"aGVsbG8="'
assert ta.dump_json(validated_bytes) == encoded_bytes
# verifying round trip
# before we added support for val_json_bytes, the default encoding was 'utf-8' for validation, so this would fail
assert ta.validate_json(encoded_bytes) == validated_bytes
```
Thanks for the addition, [@josh-newman](https://github.com/josh-newman)! You can see the full implementation details [here](https://github.com/pydantic/pydantic/pull/9770).
### Support for JSON schema with custom validators
Previously, when using custom validators like [`BeforeValidator`](https://docs.pydantic.dev/2.9/api/functional_validators/#pydantic.functional_validators.BeforeValidator)
or [`field_validator`](https://docs.pydantic.dev/2.9/api/functional_validators/#pydantic.functional_validators.field_validator), it wasn't possible to customize the `mode='validation'`
JSON schema associated with the field / type in question.
Now, you can use the `json_schema_input_type` specification to customize the JSON schema for fields with custom validators. For example:
```py
from typing import Any, Union

from pydantic_core import PydanticKnownError
from typing_extensions import Annotated

from pydantic import PlainValidator, TypeAdapter


def validate_maybe_int(v: Any) -> int:
    if isinstance(v, int):
        return v
    elif isinstance(v, str):
        try:
            return int(v)
        except ValueError:
            ...

    raise PydanticKnownError('int_parsing')


ta = TypeAdapter(Annotated[int, PlainValidator(validate_maybe_int, json_schema_input_type=Union[int, str])])
print(ta.json_schema(mode='validation'))
# > {'anyOf': [{'type': 'integer'}, {'type': 'string'}]}
```
!!! note
    You can't use this new feature with `mode='after'` validators, as customizing `mode='validation'` JSON schema doesn't make sense in this context.
For implementation details, see [#10094](https://github.com/pydantic/pydantic/pull/10094). You can find documentation for `json_schema_input_type`
in the API docs for all custom validators that support said specification.
## Performance Improvements
During our v2.9.0 development cycle, we placed a large emphasis on improving the performance of Pydantic.
Specifically, we've made significant improvements to the schema building process, which results in faster
import times and reduced memory allocation.
Consider this use case: you have a large number of Pydantic models in a file, say `models.py`. You
import a few of these models in another file, `main.py`. This is a relatively common pattern for Pydantic users.
For cases like the above, we've achieved up to a 10x improvement in import times, and a significant reduction in
temporary memory allocations, which can be a huge win for users with an abundance of models.
We'll discuss a few of the specific improvements that we've made to the schema building process:
1. Decrease pydantic import times by ~35%, see [#10009](https://github.com/pydantic/pydantic/pull/10009)
   This covers cases like `import pydantic` and `from pydantic import BaseModel`
2. Speed up schema building by ~5% via optimizing imports in hot loops, see [#10013](https://github.com/pydantic/pydantic/pull/10013)
3. Speed up schema building (and memory allocations) by up to 10x by skipping namespace caches, see [#10113](https://github.com/pydantic/pydantic/pull/10113)
4. Reduce temporary memory allocations by avoiding namespace copy operations, see [#10267](https://github.com/pydantic/pydantic/pull/10267)
We have plans to continue with schema building performance improvements in v2.10 and beyond.
You can find lots of additional detail discussed in the above PRs.
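If you want a rough idea of how much import time matters in your own project, one option is to time a cold import of the module that defines your models. The helper below is a sketch using only the standard library; `time_cold_import` is a hypothetical name, and the demo times the stdlib `colorsys` module purely so the snippet runs anywhere. Modules that the target imports (such as `pydantic` itself) stay cached, so for a `models.py` this mostly measures the cost of defining the models.

```python
import importlib
import sys
import time


def time_cold_import(module_name: str) -> float:
    """Return the seconds taken to import `module_name` with its cache entry cleared."""
    # Drop the module (and any submodules) from the import cache so the
    # import statement actually executes the module body again.
    for name in list(sys.modules):
        if name == module_name or name.startswith(module_name + "."):
            del sys.modules[name]

    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


# Demo on a small stdlib module; replace with your own module, e.g. "models".
elapsed = time_cold_import("colorsys")
print(f"colorsys imported in {elapsed * 1000:.3f} ms")
```

For a per-module breakdown, CPython's `-X importtime` flag (for example `python -X importtime main.py`) is usually more informative than a single number.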
## Notable Improvements / Fixes
### Tagged `Union` serialization
Pydantic is well known for its tagged union validation capabilities. In [pydantic/pydantic-core#1397](https://github.com/pydantic/pydantic-core/pull/1397),
we've added support for a tagged union serializer, which should make more intuitive serialization decisions when using tagged unions. We've
also made some tangential fixes such as improving serialization choices for `float | int`, or `Decimal | float` unions.
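For context, here is a small example of the kind of model a tagged union serializer applies to. The `Cat`, `Dog`, and `Owner` models are made up for illustration, and the snippet only shows a tagged union round-tripping; it doesn't try to demonstrate the behavioral difference between versions.

```python
from typing import Literal, Union

from pydantic import BaseModel, Field


class Cat(BaseModel):
    pet_type: Literal['cat'] = 'cat'
    name: str


class Dog(BaseModel):
    pet_type: Literal['dog'] = 'dog'
    name: str


class Owner(BaseModel):
    # `pet_type` is the tag that selects which union member applies.
    pet: Union[Cat, Dog] = Field(discriminator='pet_type')


owner = Owner.model_validate({'pet': {'pet_type': 'dog', 'name': 'Rex'}})
assert isinstance(owner.pet, Dog)

dumped = owner.model_dump()
print(dumped)
# > {'pet': {'pet_type': 'dog', 'name': 'Rex'}}
```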
### Moving annotation compatibility errors to validation phase
In general, during schema generation, Pydantic is generous in applying validator / constraint logic to types.
This can backfire in some cases, when at runtime it becomes evident that a given validator / constraint isn't compatible with some input data.
In this release, we've designed some more intuitive error messages for these cases, and moved them to the validation (runtime) phase,
rather than failing in some valid cases at schema build time. For implementation details, see [#9999](https://github.com/pydantic/pydantic/pull/9999)
## Changes
### Breaking change: Merge `dict` type `json_schema_extra` values, instead of overwriting
This change shouldn't affect anything except specialized usage of `json_schema_extra`. That being said,
if you'd like to replicate the old behavior, see [these docs](https://docs.pydantic.dev/dev/concepts/json_schema/#merging-json_schema_extra).
### Support sibling keys to `$ref` keys, thus removing `allOf` JSON schema workarounds
Any affected JSON syntax is now valid, and simpler! See [#10029](https://github.com/pydantic/pydantic/pull/10029) for details.
### Deprecate passing a `dict` to the `Examples` class
This is relatively self-explanatory. See [#10181](https://github.com/pydantic/pydantic/pull/10181) for more details.
This change encourages syntactically valid JSON schemas.
## Conclusion
We are excited to announce that Pydantic v2.9.0 is here, and it's the most feature-rich and fastest version of Pydantic yet.
If you have any questions or feedback, please open a [GitHub discussion](https://github.com/pydantic/pydantic/discussions/new/choose).
If you encounter any bugs, please open a [GitHub issue](https://github.com/pydantic/pydantic/issues/new/choose).
Thank you to all of our contributors for making this release possible!
We would especially like to acknowledge the following individuals for their significant contributions to this release:
- [@josh-newman](https://github.com/josh-newman)
- [@changhc](https://github.com/changhc)
- [@Youssefares](https://github.com/Youssefares)
- [@dpeachey](https://github.com/dpeachey)
## Pydantic Logfire
If you're enjoying Pydantic, you might **really** like [Pydantic Logfire](https://pydantic.dev/logfire), a new observability tool
built by the team behind Pydantic. You can now [try Logfire](https://logfire.pydantic.dev/login/) for free.
We'd love it if you'd join the [Pydantic Logfire Slack](https://logfire.pydantic.dev/docs/help/#:~:text=Pydantic%20Logfire%20Slack) and
let us know what you think!
8:T3bcf,
[Pydantic v2.8](https://github.com/pydantic/pydantic/releases/tag/v2.8.0) is now available!
You can install it now via [PyPI](https://pypi.org/project/pydantic/) or your favorite package manager:
```bash
pip install --upgrade pydantic
```
This release features the work of over 50 contributors! In this post, we'll cover the highlights of the release.
You can see the full changelog on [GitHub](https://github.com/pydantic/pydantic/compare/v2.7.4...v2.8.0/).
This release focused especially on typing related bug fixes and improvements, as well as some opt-in performance enhancing features.
## New Features
### Fail-Fast Validation
Pydantic v2.8 introduces a new feature called fail fast validation. This is currently available for a limited number of sequence types
including `list`, `tuple`, `set`, and `frozenset`. When you use `FailFast` validation, Pydantic will stop validation as soon as it encounters an error.
This feature is useful when you care more about the validity of your data than the thoroughness of the validation errors.
For many folks, this tradeoff in error specificity is well worth the performance gain.
You can either use `FailFast()` as a type annotation or specify the `fail_fast` parameter in the `Field` constructor.
For example:
```py
from typing import List
from typing_extensions import Annotated
from pydantic import BaseModel, FailFast, Field, ValidationError
class Model(BaseModel):
    x: Annotated[List[int], FailFast()]
    y: List[int] = Field(..., fail_fast=True)


# This will raise a single error for the first invalid value in each list
# At which point, validation for said field will stop
try:
    obj = Model(x=[1, 2, 'a', 4, 5, 'b', 7, 8, 9, 'c'], y=[1, 2, 'a', 4, 5, 'b', 7, 8, 9, 'c'])
except ValidationError as e:
    print(e)
"""
2 validation errors for Model
x.2
Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='a', input_type=str]
For further information visit https://errors.pydantic.dev/2.8/v/int_parsing
y.2
Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='a', input_type=str]
For further information visit https://errors.pydantic.dev/2.8/v/int_parsing
"""
```
We plan on extending this feature to other types in the future!
You can read more about `FailFast` in the [API reference](https://docs.pydantic.dev/2.8/api/types/#pydantic.types.FailFast).
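To see the tradeoff in isolation, here's a minimal pure-Python sketch of the idea. This is not how Pydantic implements `FailFast`; the `validate_ints` helper is a hypothetical stand-in that either collects every error or stops at the first one.

```python
from typing import Any, List, Tuple


def validate_ints(items: List[Any], fail_fast: bool = False) -> Tuple[List[int], List[str]]:
    """Validate each item as an int, returning the valid values and the error messages."""
    valid: List[int] = []
    errors: List[str] = []
    for index, item in enumerate(items):
        try:
            valid.append(int(item))
        except (TypeError, ValueError):
            errors.append(f'{index}: {item!r} is not a valid integer')
            if fail_fast:
                break  # stop at the first error instead of checking the rest
    return valid, errors


data = [1, 2, 'a', 4, 5, 'b', 7, 8, 9, 'c']
_, all_errors = validate_ints(data)
_, first_error = validate_ints(data, fail_fast=True)
print(len(all_errors))   # 3
print(len(first_error))  # 1
```

With `fail_fast=True`, the loop never looks at the seven items after the first bad one, which is where the time savings come from on long sequences.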
### Model/field deprecation in JSON schema
In v2.7, we introduced the ability to deprecate models and fields.
In v2.8, we've extended this feature to include deprecation information in the JSON schema.
```py
from typing_extensions import deprecated
from pydantic import BaseModel, Field
@deprecated('DeprecatedModel is... sadly deprecated')
class DeprecatedModel(BaseModel):
    deprecated_field: str = Field(..., deprecated=True)


# use the v2 API; the v1-style `.schema()` method is itself deprecated
json_schema = DeprecatedModel.model_json_schema()
assert json_schema['deprecated'] is True
assert json_schema['properties']['deprecated_field']['deprecated'] is True
```
### Programmatic Title Generation
This new feature allows you to generate titles for your models and fields with a callable. This is quite
helpful when you want to generate titles dynamically based on the model or field's attributes.
You can share callable title generators between models and fields, which helps keep your code
[DRY](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself).
```py
import json
from pydantic import BaseModel, ConfigDict, Field
class MyModel(BaseModel):
    foo: str = Field(..., field_title_generator=lambda name, field_info: f'title-{name}-from-field')
    bar: str

    model_config = ConfigDict(
        field_title_generator=lambda name, field_info: f'title-{name}-from-config',
        model_title_generator=lambda cls: f'title-{cls.__name__}-from-config',
    )


print(json.dumps(MyModel.model_json_schema(), indent=2))
"""
{
  "properties": {
    "foo": {
      "title": "title-foo-from-field",
      "type": "string"
    },
    "bar": {
      "title": "title-bar-from-config",
      "type": "string"
    }
  },
  "required": [
    "foo",
    "bar"
  ],
  "title": "title-MyModel-from-config",
  "type": "object"
}
"""
```
Want to learn more? Check out the [JSON schema customization docs](https://docs.pydantic.dev/2.8/concepts/json_schema/#using-field_title_generator).
### Serialization Context for `TypeAdapter`
In v2.7 we added support for passing context to serializers for `BaseModel`s.
In v2.8, we've extended that support to `TypeAdapter`'s serialization methods.
Here's a simple example, where we use a `unit` provided in the `context` to convert a `distance` field:
```py
from typing_extensions import Annotated
from pydantic import SerializationInfo, TypeAdapter, PlainSerializer
def serialize_distance(v: float, info: SerializationInfo) -> float:
    """We assume a distance is provided in meters, but we can convert it to other units if a context is provided."""
    context = info.context
    if context and 'unit' in context:
        if context['unit'] == 'km':
            v /= 1000  # convert to kilometers
        elif context['unit'] == 'cm':
            v *= 100  # convert to centimeters
    return v


distance_adapter = TypeAdapter(Annotated[float, PlainSerializer(serialize_distance)])
print(distance_adapter.dump_python(500)) # no context, dumps in meters
# > 500.0
print(distance_adapter.dump_python(500, context={'unit': 'km'})) # with context, dumps in kilometers
# > 0.5
print(distance_adapter.dump_python(500, context={'unit': 'cm'})) # with context, dumps in centimeters
# > 50000
```
### Experimental Features
In v2.8.0, we introduced a new pattern for introducing experimental features and settings.
We've added a section to our [version policy](https://docs.pydantic.dev/2.8/version-policy/#experimental-features)
explaining how we'll handle experimental features moving forward.
You can find documentation for our new experimental features in the
[experimental features](https://docs.pydantic.dev/2.8/concepts/experimental/) section.
Experimental features will either be:
- Located in the `experimental` module, so you can import them via `from pydantic.experimental import ...`
- Prefixed with `experimental`, so you can use them like `some_func(experimental_param=...)` or `some_model.experimental_method(...)`
When you import an experimental feature from the `experimental` module, you'll see a `PydanticExperimentalWarning`. You can filter this via:
```py
import warnings
from pydantic import PydanticExperimentalWarning
warnings.filterwarnings('ignore', category=PydanticExperimentalWarning)
```
In our version policy, we also touch on the [lifecycle](https://docs.pydantic.dev/2.8/version-policy/#lifecycle-of-experimental-features)
of experimental features. It's very possible that experimental features will experience non-backward-compatible changes or be removed entirely
in future versions, so please be aware of their volatility when opting to use them.
#### Pipeline API — Experimental
Pydantic v2.8.0 introduced an experimental "pipeline" API that allows composing of parsing (validation),
constraints and transformations in a more type-safe manner than existing APIs.
Generally, the pipeline API is used to define a sequence of steps to apply to incoming data during validation.
The below example illustrates how you can use the pipeline API to validate an int first as a string, strip
extra whitespace, then parse it as an int, and finally ensure it's greater than or equal to 0.
```py
import warnings
from typing_extensions import Annotated
from pydantic import BaseModel, PydanticExperimentalWarning, ValidationError
warnings.filterwarnings('ignore', category=PydanticExperimentalWarning)
from pydantic.experimental.pipeline import validate_as
class Model(BaseModel):
    # the field is annotated as `int` so that `validate_as(...)` resolves to `validate_as(int)`
    data: Annotated[int, validate_as(str).str_strip().validate_as(...).ge(0)]


print(repr(Model(data=' 123 ')))
#> Model(data=123)
try:
    Model(data=' -123 ')
except ValidationError as e:
    print(e)
"""
1 validation error for Model
data
Input should be greater than or equal to 0 [type=greater_than_equal, input_value=' -123 ', input_type=str]
For further information visit https://errors.pydantic.dev/2.8/v/greater_than_equal
"""
```
Note, `validate_as(...)` (with a literal ellipsis) is equivalent to calling `validate_as` with the type from the field's annotation.
So for the above example, `validate_as(...)` is equivalent to `validate_as(int)`.
We've added some additional examples to the [pipeline docs](https://docs.pydantic.dev/2.8/concepts/experimental/#pipeline-api),
if you'd like to learn more.
#### `defer_build` support for `TypeAdapter` — Experimental
Pydantic `BaseModel`s currently support a `defer_build` setting in their configuration, allowing
for deferred schema building (until the first validation call). This can help reduce startup time for applications that might have
an abundance of complex models that bear a heavy schema-building cost.
In v2.8, we've added experimental support for `defer_build` in `TypeAdapter`'s configuration.
Here's an example of how you can use this new experimental feature:
```py
from pydantic import ConfigDict, TypeAdapter
from typing import TypeAlias
# in practice, this would be some complex type for which schema building
# takes a while, hence the value of deferring the build
SuperDuperComplexType: TypeAlias = int
ta = TypeAdapter(
    SuperDuperComplexType,
    config=ConfigDict(
        defer_build=True,
        experimental_defer_build_mode=('model', 'type_adapter'),
    ),
)
assert ta._core_schema is None
print(ta.validate_python(0))
# after a call is made that requires the core schema, it's built and cached
assert ta.core_schema == {'type': 'int'}
```
## Performance Improvements
### `codeflash` Continuous Optimization
We've been working with the [`codeflash`](https://www.codeflash.ai/) team to make LLM driven optimizations to Pydantic's source code.
Thus far, we've made some minor optimizations in our internal logic, and we're looking forward to collaborating on more significant improvements in the future.
We also hope to integrate `codeflash` into our CI/CD pipeline to ensure that we're consistently checking for optimization opportunities.
## Notable Improvements / Fixes
### Support for Python 3.13
Pydantic V2 now supports Python 3.13!
A few of our test-oriented dependencies are not yet compatible with Python 3.13, but we plan to upgrade them soon.
This means that not all tests can be run against Python 3.13 (there are just a few that aren't yet compatible), but this
should only affect contributors.
For users, Pydantic should work as expected with Python 3.13. If you run into any issues,
please [let us know](https://github.com/pydantic/pydantic/issues/new/choose)!
### Smarter `smart` `Union` Validation
When validating data against a union of types, Pydantic offers a
[`smart`](https://docs.pydantic.dev/2.8/concepts/unions/#smart-mode) mode and a
[`left_to_right`](https://docs.pydantic.dev/2.7/concepts/unions/#left-to-right-mode) mode.
We've made some improvements to the `smart` mode to improve behavior when validating against a union of types
that contain many of the same fields. The following example contrasts the old behavior with the
new behavior:
```py
from pydantic import BaseModel, TypeAdapter
class ModelA(BaseModel):
    a: int


class ModelB(ModelA):
    b: str


ta = TypeAdapter(ModelA | ModelB)
print(repr(ta.validate_python({'a': 1, 'b': 'foo'})))
#> old behavior: ModelA(a=1)
#> new behavior: ModelB(a=1, b='foo')
```
There are many other more complex cases where the new behavior is more intuitive and correct. The general
idea is that we now incorporate the number of valid fields into the scoring algorithm, in addition to the
exactness (in terms of strict, lax, etc) of the match.
We have documented the match scoring [algorithm](https://docs.pydantic.dev/dev/concepts/unions/#smart-mode)
in order to provide transparency and predictability. However, we do reserve the right to modify the smart union algorithm in the future to further improve its behavior. In general, we expect such changes will only impact edge cases, and we'll only make such changes when they near-universally improve the handling of such edge cases.
Better understanding this algorithm can also help you choose which union mode is right for your use case.
If you're looking for more performant union validation than smart mode provides, we recommend that you use
[tagged unions](https://docs.pydantic.dev/2.8/concepts/unions/#discriminated-unions).
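To build some intuition for the scoring idea described above, here's a rough pure-Python sketch. It is not pydantic-core's implementation: the `CANDIDATES` table, the `score` and `pick_member` helpers, and the choice to rank by matched field count first and exactness second are all assumptions made for illustration. See the documented algorithm for the real ordering.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical stand-ins for ModelA and ModelB: field name -> expected type
CANDIDATES: Dict[str, Dict[str, type]] = {
    'ModelA': {'a': int},
    'ModelB': {'a': int, 'b': str},
}


def score(fields: Dict[str, type], data: Dict[str, Any]) -> Optional[Tuple[int, int]]:
    """Return (matched_fields, exact_matches), or None if the candidate fails to validate."""
    matched = exact = 0
    for name, expected in fields.items():
        if name not in data:
            return None  # a required field is missing
        value = data[name]
        if type(value) is expected:
            exact += 1  # "strict" match: no coercion needed
        else:
            try:
                expected(value)  # "lax" match: the value can be coerced
            except (TypeError, ValueError):
                return None
        matched += 1
    return matched, exact


def pick_member(data: Dict[str, Any]) -> Optional[str]:
    """Pick the candidate with the best (matched_fields, exact_matches) score."""
    best_name, best_score = None, None
    for name, fields in CANDIDATES.items():
        candidate_score = score(fields, data)
        if candidate_score is not None and (best_score is None or candidate_score > best_score):
            best_name, best_score = name, candidate_score
    return best_name


print(pick_member({'a': 1, 'b': 'foo'}))  # ModelB
print(pick_member({'a': 1}))              # ModelA
```

Because `ModelB` accounts for both keys in the input, it outranks `ModelA` even though `ModelA` comes first in the union.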
### Respect Regex Flags in Constrained String Validation
Python's `re` module supports flags that can be passed to regex patterns to change their behavior.
For example, the `re.IGNORECASE` flag makes a pattern case-insensitive. In previous versions of Pydantic,
even when using the `python-re` regex engine, these flags were ignored.
Now, we've improved Pydantic's constrained string validation by:
1. Not recompiling Python regex patterns (which is more performant)
2. Respecting the flags passed to a compiled pattern
!!! note
    If you use a compiled regex pattern, the python-re engine will be used regardless of the `regex_engine` setting.
    This is so that flags such as `re.IGNORECASE` are respected.
Here's an example of said flags in action:
```py
import re
from typing_extensions import Annotated
from pydantic import BaseModel, ConfigDict, StringConstraints
class Model(BaseModel):
    a: Annotated[str, StringConstraints(pattern=re.compile(r'[A-Z]+', re.IGNORECASE))]

    model_config = ConfigDict(regex_engine='python-re')


# thanks to the re.IGNORECASE flag, lowercase letters are allowed even though the pattern only lists uppercase ones
assert Model(a='abc').a == 'abc'
```
You can learn more about the `regex_engine` setting in our [model config docs](https://docs.pydantic.dev/2.8/api/config/#pydantic.config.ConfigDict.regex_engine).
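For a sense of how flags can get lost when a pattern is recompiled, here's a short standard-library sketch. It only uses Python's `re` module and is not Pydantic's internal code; it shows that rebuilding a pattern from its source string alone drops the flags that were attached to the compiled object.

```python
import re

compiled = re.compile(r'[A-Z]+', re.IGNORECASE)

# Rebuilding from the pattern string alone silently drops the flags ...
rebuilt_without_flags = re.compile(compiled.pattern)
# ... whereas passing the flags along (or reusing the compiled object) keeps them.
rebuilt_with_flags = re.compile(compiled.pattern, compiled.flags)

print(bool(compiled.fullmatch('abc')))               # True
print(bool(rebuilt_without_flags.fullmatch('abc')))  # False
print(bool(rebuilt_with_flags.fullmatch('abc')))     # True
```

Reusing the compiled pattern as-is avoids both the extra compilation work and the risk of losing flags.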
## Conclusion
With these new features and performance improvements, Pydantic v2.8.0 is the best and most feature-rich version of Pydantic yet.
If you have any questions or feedback, please open a [GitHub discussion](https://github.com/pydantic/pydantic/discussions/new/choose).
If you encounter any bugs, please open a [GitHub issue](https://github.com/pydantic/pydantic/issues/new/choose).
Thank you to all of our contributors for making this release possible!
We would especially like to acknowledge the following individuals for their significant contributions to this release:
- [@Viicos](https://github.com/Viicos)
- [@MarkusSintonen](https://github.com/MarkusSintonen)
- [@NeevCohen](https://github.com/NeevCohen)
- [@uriyyo](https://github.com/uriyyo)
- [@josh-newman](https://github.com/josh-newman)
- [@nix010](https://github.com/nix010)
## Pydantic Logfire
If you're enjoying Pydantic, you might **really** like [Pydantic Logfire](https://pydantic.dev/logfire), a new observability tool
built by the team behind Pydantic. You can now [try Logfire](https://logfire.pydantic.dev/login/) for free during our open beta period.
We'd love it if you'd join the [Pydantic Logfire Slack](https://logfire.pydantic.dev/docs/help/#:~:text=Pydantic%20Logfire%20Slack) and
let us know what you think!
9:T2d38,
Pydantic v2.7 is now available! This release is our biggest since v2.0, with a focus on performance improvements and highly requested new features.
This release also featured the work of over 30 new contributors! In this post, we'll cover the highlights of the release.
You can see the full changelog [here](https://github.com/pydantic/pydantic/releases/tag/v2.7.0).
## New Features
### Partial JSON parsing
Pydantic's [JSON parser](https://docs.rs/jiter/latest/jiter/) offers support for partial JSON parsing. This capability allows the parser to read input until it encounters invalid syntax, making a best-effort attempt to return a JSON object that accurately represents the valid portion of the input. Exposed via the [`from_json`](https://docs.pydantic.dev/2.7/api/pydantic_core/#pydantic_core.from_json) method, this feature is especially valuable for processing streaming outputs from Large Language Models (LLMs), which often generate partial JSON objects that traditional parsers cannot handle without errors.
This matters most when validating LLM outputs.
LLMs often return a partial JSON object that isn't syntactically valid JSON.
Previously, it wasn't possible to parse such a response without hitting a JSON parsing error.
Now, you can enable partial JSON parsing to parse the response, and then subsequently validate the parsed object against a Pydantic model
with [`model_validate`](https://docs.pydantic.dev/2.7/api/base_model/#pydantic.BaseModel.model_validate).
Here's a simple example:
```python
from pydantic_core import from_json
partial_json_data = '["aa", "bb", "c' # (1)!
try:
    result = from_json(partial_json_data, allow_partial=False)
except ValueError as e:
    print(e)  # (2)!
    #> EOF while parsing a string at line 1 column 15

result = from_json(partial_json_data, allow_partial=True)
print(result) # (3)!
#> ['aa', 'bb']
```
1. The JSON list is incomplete - it's missing a closing `"]`
2. When `allow_partial` is set to `False` (the default), a parsing error occurs.
3. When `allow_partial` is set to `True`, part of the input is deserialized successfully.
You can learn more about integrating Pydantic with your LLM work from some of our [blog posts](https://blog.pydantic.dev/articles/).
For more information, check out the [docs](https://docs.pydantic.dev/2.7/concepts/json/#partial-json-parsing) for this new feature!
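The two-step flow described above (parse partially, then validate) can be sketched as follows. The `Dog` model and the truncated payload are hypothetical examples; the sketch assumes the truncated portion only affects fields the model doesn't require.

```python
from typing import List

from pydantic import BaseModel
from pydantic_core import from_json


class Dog(BaseModel):  # hypothetical model for this example
    breed: str
    name: str
    friends: List[str]


# the stream was cut off in the middle of an extra "age" key
partial_dog_json = '{"breed": "lab", "name": "fluffy", "friends": ["buddy", "spot", "rufus"], "age'

# step 1: best-effort parse of the valid prefix; step 2: regular model validation
dog = Dog.model_validate(from_json(partial_dog_json, allow_partial=True))
print(repr(dog))
```

If the cut-off had landed inside a required field instead, `model_validate` would still raise a `ValidationError`, so partial parsing doesn't weaken the model's guarantees.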
### Generic `Secret` base type
Pydantic offers support for `SecretStr` and `SecretBytes` types, which are used to represent sensitive data.
We've extended this support to include a generic `Secret` base type, which can be used to create custom secret types.
For example, you could create a `SecretSalary` type that wraps an integer salary value
and customizes the display of the secret value like so:
```py
from datetime import date
from pydantic import BaseModel, Secret
class SecretSalary(Secret[int]):
    def _display(self) -> str:
        return '$******'


class Employee(BaseModel):
    name: str
    salary: SecretSalary


employee = Employee(name='John Doe', salary=100_000)
print(repr(employee))
#> Employee(name='John Doe', salary=SecretSalary('$******'))
print(employee.salary)
#> $******
print(employee.salary.get_secret_value())
#> 100000
```
If you're satisfied with a more generalized `repr` output, you can use this even more concise version,
where the `Secret` type is directly parametrized with no need for the subclass:
```py
from typing_extensions import Annotated
from pydantic import Secret, TypeAdapter
ta = TypeAdapter(Secret[int])
my_secret_int = ta.validate_python(123)
print(my_secret_int)
#> **********
print(my_secret_int.get_secret_value())
#> 123
```
This feature is incredibly extensible and can be used to create custom secret types for a wide variety of base types.
Explore the [usage docs](https://docs.pydantic.dev/2.7/examples/secrets/#create-your-own-secret-field) to learn more!
### `deprecated` fields
One of the most [highly requested](https://github.com/pydantic/pydantic/issues/2255) features in Pydantic (ever)
is the ability to mark fields as deprecated. Thanks to the hard work of [@Viicos](https://github.com/Viicos), this feature
has been realized!
Marking a field as `deprecated` will result in:
1. A runtime deprecation warning emitted when accessing the field
2. The `deprecated` keyword being set to `true` in the generated JSON schema
The `deprecated` argument to `Field` can be set to any of:
- A string, which will be used as the deprecation message.
- An instance of the `warnings.deprecated` decorator (or the `typing_extensions` backport).
- A boolean, which will be used to mark the field as deprecated with a default 'deprecated' deprecation message.
Here's a simple example:
```py
from pydantic import BaseModel, Field
class Model(BaseModel):
    deprecated_field: int = Field(deprecated=True)


print(Model.model_json_schema()['properties']['deprecated_field'])
#> {'deprecated': True, 'title': 'Deprecated Field', 'type': 'integer'}
```
The [docs](https://docs.pydantic.dev/2.7/concepts/fields/#deprecated-fields) for this feature delve into more details
about the various ways to mark and customize deprecated fields.
### `serialize_as_any` runtime setting
In v1, Pydantic used serialization with duck-typing by default. In an attempt to improve security,
Pydantic v2 switched away from this approach.
:::note{title="What is serialization with duck typing?"}
Duck-typing serialization is the behavior of serializing an object based on the fields present in the object itself, rather than the fields present in the schema of the object. This means that when an object is serialized, fields present in a subclass, but not in the original schema, will be included in the serialized output.
:::
In Pydantic v2.7, we've reintroduced serialization with duck typing as an opt-in feature via a new `serialize_as_any` runtime flag.
This opt-in feature was available in previous v2.X versions via the `SerializeAsAny` annotation, but that required
annotating each field individually. The new `serialize_as_any` flag allows you to enable duck-typing serialization
for all fields in a model with a single flag.
Here's an example showcasing the basic usage of the setting:
```py
from pydantic import BaseModel, TypeAdapter
class User(BaseModel):
    name: str


class UserLogin(User):
    password: str


ta = TypeAdapter(User)
user_login = UserLogin(name='John Doe', password='some secret')
print(ta.dump_python(user_login, serialize_as_any=False)) # (1)!
#> {'name': 'John Doe'}
print(ta.dump_python(user_login, serialize_as_any=True)) # (2)!
#> {'name': 'John Doe', 'password': 'some secret'}
```
1. This is the default behavior - fields not present in the schema are not serialized.
2. With `serialize_as_any` set to `True`, fields not present in the schema are serialized.
We've upgraded the documentation for [serialization with duck typing](https://docs.pydantic.dev/2.7/concepts/serialization/#serializing-with-duck-typing).
[This section](https://docs.pydantic.dev/dev/concepts/serialization/#serialize_as_any-runtime-setting), in particular,
covers the new `serialize_as_any` runtime flag.
### Pass context to serialization
Pydantic previously supported `context` in validation, but not in serialization. With the help of
[@ornariece](https://github.com/ornariece), we've added support for using a `context` object during serialization.
Here's a simple example, where we use a `unit` provided in the `context` to convert a `distance` field:
```py
from pydantic import BaseModel, SerializationInfo, field_serializer
class Measurement(BaseModel):
    distance: float  # in meters

    @field_serializer('distance')
    def convert_units(self, v: float, info: SerializationInfo):
        context = info.context
        if context and 'unit' in context:
            if context['unit'] == 'km':
                v /= 1000  # convert to kilometers
            elif context['unit'] == 'cm':
                v *= 100  # convert to centimeters
        return v


measurement = Measurement(distance=500)
print(measurement.model_dump()) # no context
#> {'distance': 500.0}
print(measurement.model_dump(context={'unit': 'km'})) # with context
#> {'distance': 0.5}
print(measurement.model_dump(context={'unit': 'cm'})) # with context
#> {'distance': 50000.0}
```
This feature is powerful as it further extends Pydantic's flexibility and customization capabilities
when it comes to serialization.
See the [documentation](https://docs.pydantic.dev/2.7/concepts/serialization/#serialization-context) for more information.
## Performance Improvements
### PyO3 0.21
Pydantic uses PyO3 to connect our core Rust code to Python. Upgrading to PyO3 0.21 under the hood brings a significant performance improvement to Pydantic,
as seen in these [benchmarks](https://github.com/pydantic/pydantic-core/pull/1222#issuecomment-1992161534).
For detailed information on the improvements and changes in PyO3 0.21, check out this [blog post](https://polar.sh/davidhewitt/posts/replacing-pyo3-api-pt1)
from [David Hewitt](https://github.com/davidhewitt), a Rust 🤝 Python expert!
### SIMD integer and string JSON parsing on `aarch64`
Pydantic now uses SIMD instructions for integer and string JSON parsing on `aarch64` (ARM) platforms.
:::note
SIMD on `x86` will be implemented in a future release!
:::
### Faster `enum` validation and serialization
`enum` validation and serialization logic was moved to `pydantic-core`, which is written in Rust.
This migration results in a ~4x speedup for `enum` validation and serialization.
### Fast path for ASCII Python string creation in JSON
`jiter`, Pydantic's JSON parser, now has a [fast path](https://github.com/pydantic/jiter/pull/72) for creating ASCII Python strings. This change results in a ~15% performance improvement for Python string parsing.
### Caching Python strings
Pydantic's [JSON parser](https://docs.rs/jiter/latest/jiter/) offers support for configuring how Python strings are
cached during JSON parsing and validation. Memory usage increases slightly when caching strings, but it can improve
performance significantly, especially in cases where certain strings are repeated frequently.
The `cache_strings` setting (in [model config](https://docs.pydantic.dev/2.7/api/config/)
or as an argument to [`from_json`](https://docs.pydantic.dev/2.7/api/pydantic_core/#pydantic_core.from_json))
can take any of the following values:
- `True` or `'all'` (the default): cache all strings
- `'keys'`: cache only dictionary keys
- `False` or `'none'`: no caching
:::note
The `'keys'` setting **only** applies when used with [`from_json`](https://docs.pydantic.dev/2.7/api/pydantic_core/#pydantic_core.from_json) or when parsing JSON using [`Json`](https://docs.pydantic.dev/2.7/api/types/#pydantic.types.Json). `True` or `'all'` is required to cache strings during general validation because validators don't know if they're processing a key or a value.
:::
Learn more about this feature [here](https://docs.pydantic.dev/2.7/concepts/json/#caching-strings).
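The tradeoff behind string caching can be illustrated with a small pure-Python sketch. This is only a conceptual model, not how `jiter` works internally: the hypothetical `decode_strings` helper keeps a dictionary from raw bytes to already-built `str` objects, so repeated values skip the decoding step at the cost of keeping the cache in memory.

```python
from typing import Dict, List, Tuple


def decode_strings(raw_items: List[bytes], cache: bool) -> Tuple[List[str], int]:
    """Decode each item, optionally reusing previously decoded strings.

    Returns the decoded strings and the number of decode operations performed.
    """
    seen: Dict[bytes, str] = {}
    decoded: List[str] = []
    decode_calls = 0
    for raw in raw_items:
        if cache and raw in seen:
            decoded.append(seen[raw])  # cache hit: reuse the existing str object
            continue
        text = raw.decode('utf-8')
        decode_calls += 1
        if cache:
            seen[raw] = text  # the cache holds a reference, so memory use grows
        decoded.append(text)
    return decoded, decode_calls


# Repeated dictionary keys are the typical case where caching pays off
raw_keys = [b'name', b'email', b'name', b'email', b'name', b'email']
plain, plain_calls = decode_strings(raw_keys, cache=False)
cached, cached_calls = decode_strings(raw_keys, cache=True)
print(plain_calls, cached_calls)  # 6 2
```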
## Conclusion
With these new features and performance improvements, Pydantic v2.7 is the fastest and most feature-rich version of Pydantic yet.
If you have any questions or feedback, please open a [GitHub discussion](https://github.com/pydantic/pydantic/discussions/new/choose).
If you encounter any bugs, please open a [GitHub issue](https://github.com/pydantic/pydantic/issues/new/choose).
a:T59f5,
[AWS Lambda](https://aws.amazon.com/pm/lambda/) is a popular serverless computing service that allows developers to run code without provisioning or managing servers. This service is so widely used because it supports automatic scaling and offers a cost-effective pay-per-call pricing model.
AWS Lambda functions can be triggered by various AWS services and other event sources, which pass `event` and `context` data to said function. Like any other application, it's critical to structure and validate this incoming data to ensure proper execution of the function and reliability of the results.
In this article, we'll explore how Pydantic, the leading data validation library for Python, can be leveraged to structure and validate `event` and `context` data in AWS Lambda functions. We'll discuss the importance of understanding the structure of `event` and `context` data, and how Pydantic can help enhance developer experience by improving readability and maintainability of Lambda functions.
:::note{title="Setting up an AWS Lambda Function"}
Throughout this article, we will refer to simple examples of Lambda functions written in Python. Our focus will be on showcasing the benefits of using Pydantic with Lambda, with simple local tests. A detailed walkthrough of setting up an AWS Lambda function is beyond the scope of this article.
:::
For comprehensive instructions on setting up an AWS Lambda function, refer to [the official guide](https://docs.aws.amazon.com/lambda/latest/dg/getting-started.html). This resource provides a step-by-step tutorial on how to create and test a function via the AWS Management Console. The guide also provides links to more advanced topics such as trigger configuration and monitoring / logging.
:::commend{title="Why Should I Care?"}
Without proper validation of incoming data, Lambda functions can be prone to errors, and even security vulnerabilities.
:::
By using Pydantic to structure and validate the `event` and `context` data, one can enhance the developer experience by improving type-hinting and autocompletion, generating automatic documentation, and enhancing debuggability with straightforward and comprehensive error messages.
Early validation with Pydantic also facilitates runtime improvements, such as faster failure for invalid inputs, reduced load and execution costs, and improved security against malicious incoming data.
## A Simple Example
First, let's take a closer look at AWS Lambda and the data that is passed into a Lambda function when it is invoked.
When a Lambda function is invoked, it receives two parameters: `event` and `context`.
The `event` parameter contains the data that is passed into the function, while the `context` parameter provides information about the invocation, function, and execution environment. The `event` and `context` parameters are both dictionaries. We will soon see that we can validate the contents of these dictionaries with Pydantic.
### Without Pydantic
Let's consider a simple example of a Lambda function that receives a user sign-up event. The `event` data contains:
- `name` (str): The first and last name of the user.
- `birthday` (date): The user's date of birth.
- `email` (str): The user's email address.
We'll work with a basic Lambda function that processes this event, calculates the user's age, and returns a success response.
Here's the Lambda function without Pydantic validation:
```py
from datetime import date, datetime
def lambda_handler(event: dict, context: dict) -> dict:
    name = event["name"]
    birthday = datetime.strptime(event["birthday"], "%Y-%m-%d").date()
    email = event["email"]
    age = (date.today() - birthday).days // 365

    # Send a welcome email, store user data in a database, etc.

    return {
        "result": "success",
        "user": {
            "name": name,
            "birthday": birthday.strftime("%Y-%m-%d"),
            "email": email,
            "age": age,
        },
        # `context` is treated as a plain dict here, so use key access
        "request_id": context["aws_request_id"],
    }
```
:::note
In practice, a Lambda function would probably not be used for such a simple task.
However, using such a simple example helps cleanly illustrate the benefits of using Pydantic for data validation.
:::
Lambda functions are typically invoked by sending a web request to a configured endpoint. The service calling the Lambda function passes the `event` and `context` data to the function. This is effectively equivalent to invoking the function directly with the `event` and `context` data as arguments, which, for simplicity, is what we'll do in the following examples. [Later in the article](#application-invoking-a-lambda-with-the-aws-cli), we show how to invoke a Lambda function using the AWS CLI.
More concretely, the following script is representative of what happens when the Lambda service invokes the function:
```py
import json # (1)!
event = {
"name": "Alice",
"birthday": "1990-01-01",
"email": "alice@gmail.com"
}
context = {"aws_request_id": "6bc28136-xmpl-4365-b021-0ce6b2e64ab0"}
print(json.dumps(lambda_handler(event, context), indent=2))
"""
{
"result": "success",
"user": {
"name": "Alice",
"birthday": "1990-01-01",
"email": "alice@gmail.com",
"age": 34
},
"request_id": "6bc28136-xmpl-4365-b021-0ce6b2e64ab0"
}
"""
```
1. For all future invocation examples, we will use the `json` module to pretty-print the output of the Lambda function for better readability. You can assume that this import is present in all future examples.
What could go wrong here? **Lots of things.** To name a few:
1. The `event` data might be missing required fields.
2. The `event` data might contain fields with incorrect types or formats (e.g., what happens if `birthday` is not a date?).
3. The `event` data might contain fields with invalid values (e.g., what happens if `birthday` is in the future?).
To address these issues, we can use Pydantic to define models that represent the structure of the `event` and `context` data,
and validate the incoming data before processing it in the Lambda function.
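To make these failure modes concrete, here is a small sketch that uses only the standard library. The `parse_signup` and `failure_kind` helpers are hypothetical stand-ins that mirror the parsing done in the handler above; they are not part of the original function.

```python
from datetime import date, datetime


def parse_signup(event: dict) -> dict:
    """Hypothetical stand-in mirroring the unvalidated handler's parsing."""
    birthday = datetime.strptime(event["birthday"], "%Y-%m-%d").date()
    return {
        "name": event["name"],
        "email": event["email"],
        "age": (date.today() - birthday).days // 365,
    }


def failure_kind(event: dict) -> str:
    """Return the name of the exception raised, or "ok" if parsing succeeds."""
    try:
        parse_signup(event)
    except Exception as exc:  # broad on purpose: we only want the exception's name
        return type(exc).__name__
    return "ok"


# 1. Missing field: a bare KeyError that names only the key.
missing = failure_kind({"name": "Alice", "birthday": "1990-01-01"})

# 2. Wrong format: strptime raises ValueError for an impossible date.
bad_date = failure_kind(
    {"name": "Alice", "birthday": "1990-02-31", "email": "alice@gmail.com"}
)

# 3. Invalid value: a future birthday is accepted silently.
future = parse_signup(
    {"name": "Alice", "birthday": "2090-01-01", "email": "alice@gmail.com"}
)

print(missing, bad_date, future["age"] < 0)
```

The first two cases surface as unhandled exceptions with little context for the caller, and the third is not caught at all.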
### With Pydantic
:::note
The AWS Lambda context object is a complex object with many attributes. For the purposes of this article, we will focus on just the `aws_request_id`, but the same principles can be applied to any of the other attributes of the `context` object. You can read more about the `context` object attributes [here](https://docs.aws.amazon.com/lambda/latest/dg/python-context.html).
:::
```py
from datetime import date

from pydantic import BaseModel, ValidationError, computed_field


class UserSignUpEvent(BaseModel):
    name: str
    birthday: date
    email: str

    @computed_field
    @property
    def age(self) -> int:  # (1)!
        return (date.today() - self.birthday).days // 365


class Context(BaseModel):
    aws_request_id: str  # (2)!


def lambda_handler(event: dict, context: dict) -> dict:
    try:
        user = UserSignUpEvent.model_validate(event)
        context_data = Context.model_validate(context)
    except ValidationError as e:
        return {"result": "error", "message": e.errors(include_url=False)}  # (3)!

    # Send a welcome email, store user data in a database, etc.

    return {
        "result": "success",
        "user": user.model_dump(mode="json"),  # (4)!
        "request_id": context_data.aws_request_id,
    }
```
1. Pydantic offers a `@computed_field` decorator that allows us to define a property that is computed based on other fields in the model. In this case, we use it to calculate the user's age based on their birthday.
2. Pydantic models have the `extra` setting set to `ignore` by default, which is why we can selectively define only the attributes we care about in the `Context` model.
3. We exclude the URL from the error messages to keep them concise and readable.
4. We use `model_dump(mode="json")` to serialize the validated `event` data to a JSON-compatible dictionary. The validation itself happens earlier, in the `model_validate` calls: if the `event` or `context` data is invalid, a `ValidationError` is raised, and the function fails early with a descriptive error response.
:::commend{title="Why do we use `model_dump` in the response?"}
We use `model_dump` to serialize the validated `event` data to a **JSON-compatible** dictionary, which we return in the response. This is a good way to give the caller feedback about the data that was processed and how it was transformed.
:::
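As a minimal sketch of what `mode="json"` changes, the example below dumps the same model in both modes. The `SignUp` model is a hypothetical, trimmed-down version of `UserSignUpEvent` used only for illustration.

```python
from datetime import date

from pydantic import BaseModel


class SignUp(BaseModel):  # hypothetical, trimmed-down model for illustration
    name: str
    birthday: date


user = SignUp(name="Alice", birthday="1990-01-01")

python_dump = user.model_dump()           # keeps Python objects such as `date`
json_dump = user.model_dump(mode="json")  # converts values to JSON-compatible types

print(type(python_dump["birthday"]).__name__)
print(json_dump["birthday"])
```

In the default mode, `birthday` stays a `date` object, which `json.dumps` cannot serialize on its own; in JSON mode it becomes an ISO-format string.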
Let's look at a sample invocation of the Lambda function with Pydantic validation:
```py
event = {
"name": "Alice",
"birthday": "1990-01-01", # (1)!
"email": "alice@gmail.com"
}
context = {"aws_request_id": "6bc28136-xmpl-4365-b021-0ce6b2e64ab0"}
print(json.dumps(lambda_handler(event, context), indent=2))
"""
{
"result": "success",
"user": {
"name": "Alice",
"birthday": "1990-01-01",
"email": "alice@gmail.com",
"age": 34
},
"request_id": "6bc28136-xmpl-4365-b021-0ce6b2e64ab0"
}
"""
```
1. In this invocation, we pass the `birthday` as a string. Pydantic will automatically parse the string into a `date` object, so the function will process the data successfully.
As we'd expect, the function processes the data successfully and returns a success response (the results are identical to those of the original function, without Pydantic validation).
However, where Pydantic shines is when the incoming data is invalid.
Consider the following invocation, with incomplete `event` data:
```py
event = {
"name": "Alice",
"birthday": "1990-01-01",
}
context = {"aws_request_id": "6bc28136-xmpl-4365-b021-0ce6b2e64ab0"}
print(json.dumps(lambda_handler(event, context), indent=2))
"""
{
"result": "error",
"message": [
{
"type": "missing",
"loc": [
"email"
],
"msg": "Field required",
"input": {
"name": "Alice",
"birthday": "1990-01-01"
}
}
]
}
"""
```
As you can see, Pydantic provides the caller with detailed information about the missing `email` field in the `event` data. This is a significant improvement over the original function, which would have raised an error, only accessible from deep within the Lambda's logs. No easy-to-understand error message would have been returned to the caller in the case of the original function. You can see what I mean [here](#putting-it-all-together).
:::commend
Pydantic offers runtime improvements by validating the incoming data before processing it in the Lambda function. This can lead to faster failure for invalid inputs, thus reducing load and lowering execution costs.
:::
Alternatively, consider the following invocation, where `birthday` is not a valid date (there's no February 31st):
```py
event = {
"name": "Alice",
"birthday": "1990-02-31",
"email": "alice@gmail.com"
}
context = {"aws_request_id": "6bc28136-xmpl-4365-b021-0ce6b2e64ab0"}
print(json.dumps(lambda_handler(event, context), indent=2))
"""
{
"result": "error",
"message": [
{
"type": "date_from_datetime_parsing",
"loc": [
"birthday"
],
"msg": "Input should be a valid date or datetime, day value is outside expected range",
"input": "1990-02-31",
"ctx": {
"error": "day value is outside expected range"
}
}
]
}
"""
```
:::commend
In this example, we use snake case for consistency with the Python code. In practice, it's common to use camelCase or PascalCase for the field names in a JSON response. One way to do this easily, while maintaining snake_case variable names in your models, is to use the `serialization_alias` field setting in Pydantic models. You can read more about that [here](https://docs.pydantic.dev/latest/concepts/alias/).
:::
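A minimal sketch of that approach is shown below. The camelCase name `awsRequestId` is an assumption chosen for illustration; the alias only takes effect when `by_alias=True` is passed at dump time.

```python
from pydantic import BaseModel, Field


class Context(BaseModel):
    # snake_case in Python, camelCase when serialized by alias
    aws_request_id: str = Field(serialization_alias="awsRequestId")


ctx = Context(aws_request_id="6bc28136-xmpl-4365-b021-0ce6b2e64ab0")

by_name = ctx.model_dump()
by_alias = ctx.model_dump(by_alias=True)

print(list(by_name))
print(list(by_alias))
```

Because `serialization_alias` affects output only, incoming data is still validated against the snake_case field name.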
This is just the beginning of what Pydantic can do for your Lambda functions.
#### Upgrade 1: Using the `validate_call` decorator
In the previous example, we used the `model_validate` method to validate the `event` and `context` data. Pydantic also provides a `validate_call` decorator that can be used to validate the arguments of a function. This decorator can be used to validate the `event` and `context` data directly in the function signature, like this:
```py hl_lines="4"
from pydantic import validate_call


@validate_call
def lambda_handler_inner(event: UserSignUpEvent, context: Context) -> dict:
    # Send a welcome email, store user data in a database, etc.

    return {
        "result": "success",
        "user": event.model_dump(mode="json"),
        "request_id": context.aws_request_id,
    }


def lambda_handler(event: dict, context: dict) -> dict:
    try:
        response = lambda_handler_inner(event, context)
        return response
    except ValidationError as e:
        return {"result": "error", "message": e.errors(include_url=False)}
```
This approach allows us to catch any validation errors associated with the `event` and `context` data together, and removes the need to explicitly validate the data in the function body.
Here's an example of what an error response might look like when using the `validate_call` decorator:
```py
event = { # (1)!
"name": "Alice",
"birthday": "1990-01-01",
}
context = {} # (2)!
print(json.dumps(lambda_handler(event, context), indent=2))
"""
{
"result": "error",
"message": [
{
"type": "missing",
"loc": [
0,
"email"
],
"msg": "Field required",
"input": {
"name": "Alice",
"birthday": "1990-01-01"
}
},
{
"type": "missing",
"loc": [
1,
"aws_request_id"
],
"msg": "Field required",
"input": {}
}
]
}
"""
```
1. In this invocation, the `email` field is missing from the `event` data.
2. The `aws_request_id` field is missing from the `context` data.
This result showcases the implicit validation of the `event` and `context` data in the function signature, and the detailed error messages that are returned when the data (for both) is invalid.
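The same mechanism can be seen in isolation with a smaller function. In this sketch, the `User` model and `greet` function are hypothetical; the point is that `validate_call` converts the raw arguments to the annotated types before the body runs, and reports errors per argument.

```python
from pydantic import BaseModel, ValidationError, validate_call


class User(BaseModel):  # hypothetical model for illustration
    name: str
    email: str


@validate_call
def greet(user: User, times: int) -> str:
    # By the time the body runs, `user` is a User and `times` is an int.
    return f"hello {user.name} " * times


# A dict and a numeric string are coerced to the annotated types.
ok = greet({"name": "Alice", "email": "alice@gmail.com"}, "2")

try:
    greet({"name": "Alice"}, 2)
    errors = []
except ValidationError as e:
    errors = e.errors(include_url=False)

print(ok)
print([err["loc"] for err in errors])
```

Each error's `loc` starts with the argument it belongs to, followed by the path inside that argument, which is how the response above distinguishes `event` errors from `context` errors.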
#### Upgrade 2: Enhancing `birthday` validation
In the previous examples, we used a `date` field to represent the `birthday` data in the `event` model. Pydantic provides specialized field types that can be used to enhance the validation of the data. For example, we can use the `PastDate` field type to represent the `birthday` data, and provide additional validation logic to ensure that the date is in the past (we can't have users signing up with future birthdays).
If we define the `UserSignUpEvent` model like this:
```py hl_lines="8"
from datetime import date

from pydantic import BaseModel, PastDate, computed_field


class UserSignUpEvent(BaseModel):
    name: str
    birthday: PastDate
    email: str

    @computed_field
    @property
    def age(self) -> int:
        return (date.today() - self.birthday).days // 365
```
We can now validate the `birthday` data to ensure that it is a valid date and that it is in the past. Here's an example of what an error response might look like when the `birthday` data is in the future:
```py hl_lines="19"
event = {
"name": "Alice",
"birthday": "2090-01-01",
"email": "alice@gmail.com"
}
context = {"aws_request_id": "6bc28136-xmpl-4365-b021-0ce6b2e64ab0"}
print(json.dumps(lambda_handler(event, context), indent=2))
"""
{
"result": "error",
"message": [
{
"type": "date_past",
"loc": [
"birthday"
],
"msg": "Date should be in the past",
"input": "2090-01-01"
}
]
}
"""
```
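One caveat about the `age` field used throughout these examples: `days // 365` ignores leap days, so it can report a birthday a few days early. The sketch below compares it with a calendar-based calculation; `age_on` is a hypothetical helper, not part of the original model.

```python
from datetime import date


def age_on(birthday: date, today: date) -> int:
    """Whole years elapsed between `birthday` and `today` (hypothetical helper)."""
    years = today.year - birthday.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (birthday.month, birthday.day):
        years -= 1
    return years


birthday = date(1990, 1, 1)
day_before_40th = date(2029, 12, 31)

approx = (day_before_40th - birthday).days // 365  # the article's approximation
exact = age_on(birthday, day_before_40th)

print(approx, exact)
```

For a sign-up flow the approximation is usually fine, but a calendar-based version may be preferable where age thresholds matter.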
#### Upgrade 3: Customizing `name` validation
You can also customize the validation logic for a field by defining a custom validator function. For example, we can define a validator that ensures the `name` field contains both a first and last name, and then title-cases the result:
```py hl_lines="13 14 15 16 17 18 19"
from pydantic import BaseModel, computed_field, field_validator


class UserSignUpEvent(BaseModel):
    name: str
    birthday: date
    email: str

    @computed_field
    @property
    def age(self) -> int:
        return (date.today() - self.birthday).days // 365

    @field_validator('name')
    @classmethod
    def name_has_first_and_last(cls, v: str) -> str:
        stripped_name = v.strip()
        if ' ' not in stripped_name:
            raise ValueError(f'`name` must contain first and last name, got {v}')
        return stripped_name.title()
```
:::note
In practice, you might want to use a more sophisticated validation function to ensure that a
[name is valid](https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/).
We use a simple example here to illustrate the value of functional field validators.
:::
For a valid `name` field, we can see that the name is title-cased:
```py
event = {
"name": "alice smith",
"birthday": "1990-01-01",
"email": "alice@gmail.com"
}
context = {"aws_request_id": "6bc28136-xmpl-4365-b021-0ce6b2e64ab0"}
print(json.dumps(lambda_handler(event, context), indent=2))
"""
{
"result": "success",
"user": {
"name": "Alice Smith",
"birthday": "1990-01-01",
"email": "alice@gmail.com",
"age": 34
},
"request_id": "6bc28136-xmpl-4365-b021-0ce6b2e64ab0"
}
"""
```
As you can imagine, if the `name` field is missing a last name, the function will raise a descriptive error.
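A minimal sketch of that failure case follows. `NameOnly` is a hypothetical model reduced to the `name` field so the example stays self-contained.

```python
from pydantic import BaseModel, ValidationError, field_validator


class NameOnly(BaseModel):  # hypothetical, reduced to the `name` field
    name: str

    @field_validator("name")
    @classmethod
    def name_has_first_and_last(cls, v: str) -> str:
        stripped_name = v.strip()
        if " " not in stripped_name:
            raise ValueError(f"`name` must contain first and last name, got {v}")
        return stripped_name.title()


try:
    NameOnly.model_validate({"name": "alice"})
    errors = []
except ValidationError as e:
    # `ctx` carries the original exception object, so we drop it here
    # to keep the error list JSON-friendly.
    errors = e.errors(include_url=False, include_context=False)

print(errors)
```

Note that for errors raised from custom validators, the `ctx` entry may hold the exception object itself, which `json.dumps` cannot serialize; passing `include_context=False` is one way to keep the error payload serializable.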
## Application: Invoking a Lambda with the AWS CLI
Thus far, we've been invoking the Lambda function directly in Python. In practice, Lambda functions are typically invoked by other services, such as API Gateway, S3, or SNS.
The method of invocation will depend on your specific use case and requirements. We'll demonstrate how to invoke the Lambda function using the AWS CLI, which is a common way to test a deployed Lambda function from your own machine.
:::assert
In the previous examples, for simplicity, we've omitted status codes and other response metadata that would typically be included in the response payload. In practice, you might want to include additional metadata in the response to provide more context to the caller about the outcome of the function execution. For example, if you're [exposing the Lambda function over HTTP](https://docs.aws.amazon.com/apigateway/latest/developerguide/handle-errors-in-lambda-integration.html), you might want to include a status code in the response to indicate the outcome of the request.
:::
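As a rough sketch of what that could look like, the helper below wraps the article's result dictionary in the `statusCode` / `headers` / `body` shape used by API Gateway proxy integrations. The `to_http_response` name and the choice of 400 for validation errors are assumptions made for illustration.

```python
import json


def to_http_response(result: dict) -> dict:
    """Wrap a handler result for an API Gateway proxy integration.

    Hypothetical helper: maps the article's "result" key to a status code.
    """
    status = 200 if result.get("result") == "success" else 400
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(result),  # the body is sent as a string
    }


error_result = {
    "result": "error",
    "message": [{"type": "missing", "loc": ["email"], "msg": "Field required"}],
}
response = to_http_response(error_result)

print(response["statusCode"])
```

A caller then sees the failure in the HTTP status itself, rather than having to inspect the payload.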
To invoke this Lambda function with the AWS CLI, you can use the `aws lambda invoke` command:
```sh
aws lambda invoke \
--function-name my-function \
--cli-binary-format raw-in-base64-out \
--payload '{"name": "Alice", "birthday": "1990-01-01", "email": "alice@gmail.com"}' \
output.json
```
:::assert
You could also append `&& cat output.json` to the end of the command to print the output to the console.
:::
This command assumes that you have the [AWS CLI installed](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) and [configured with the appropriate credentials](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html). It also assumes that you've configured your Lambda function with the name `my-function`. The `--payload` option is used to pass the `event` data to the Lambda function, and the output of the function will be written to the `output.json` file.
If we pass in the valid `event` data used above, we see the following in the `output.json` file:
```json
{
"result": "success",
"user": {
"name": "Alice",
"birthday": "1990-01-01",
"email": "alice@gmail.com",
"age": 34
},
"request_id": "6bc28136-xmpl-4365-b021-0ce6b2e64ab0"
}
```
Similarly, if we invoke our Lambda with an invalid payload, we can expect the `output.json` file to be populated with a detailed error response.
### Putting it All Together
Here, we can see the concrete benefits of invoking a Lambda function with Pydantic compared to invoking a Lambda function without Pydantic, using the AWS CLI.
Consider this invocation:
```sh
aws lambda invoke \
    --function-name my-function \
    --cli-binary-format raw-in-base64-out \
    --payload '{"name": "Alice", "birthday": "1990-01-01"}' \
    output.json && cat output.json
```
=== "Original Lambda, without Pydantic 🙁"
Console output:
```sh
{
"StatusCode": 200, # (1)!
"FunctionError": "Unhandled",
"ExecutedVersion": "$LATEST"
}
```
1. This 200 status code indicates that the function was invoked successfully. That said, the `FunctionError` field indicates that an unhandled error occurred during the function execution.
=== "New and Improved Lambda, with Pydantic 🚀"
Console output:
```sh
{
"StatusCode": 200, # (1)!
"ExecutedVersion": "$LATEST"
}
{
"result": "error",
"message": [
{
"type": "missing",
"loc": [
"email"
],
"msg": "Field required",
"input": {
"name": "Alice",
"birthday": "1990-01-01"
}
}
]
}
```
1. This 200 status code indicates that the function was invoked successfully.
The response payload contains a detailed error message that explains what went wrong with the input data.
The response from the original Lambda function is unhelpful and doesn't provide any information about what went wrong.
In order to debug the issue, you would need to dig into the logs in the AWS management console.
On the other hand, the response from the Lambda function with Pydantic validation is clear and concise. It provides detailed information about the missing `email` field in the `event` data, making it easy to identify and fix the issue.
## Concluding Thoughts
In this article we demonstrated that Pydantic is a powerful tool for structuring and validating `event` and `context` data in AWS Lambda functions. By utilizing Pydantic, developers can improve the developer experience and runtime performance of their Lambda functions.
We encourage developers to adopt Pydantic as a best practice when developing AWS Lambda functions. Integrating Pydantic into your Lambda functions can be a game-changer, enhancing your code's readability, maintainability, and efficiency.
### What's Next?
If you're interested in further exploring the integration capabilities between Pydantic and AWS Lambda, consider the following next steps:
1. Use [`pydantic-settings`](https://docs.pydantic.dev/latest/api/pydantic_settings/) to manage environment variables in your Lambda functions.
2. Take a deep dive into Pydantic's more advanced features, like custom [validation](https://docs.pydantic.dev/latest/concepts/validators/#annotated-validators) and [serialization](https://docs.pydantic.dev/latest/concepts/serialization/#custom-serializers) to transform your Lambda's data.
3. Explore creating a Pydantic [Lambda Layer](https://docs.aws.amazon.com/lambda/latest/dg/chapter-layers.html) to share the Pydantic library across multiple Lambda functions.
4. Take a look at more Pydantic custom types, like `NameEmail`, `SecretStr`, and many [others](https://docs.pydantic.dev/dev/api/types/#pydantic.types).
b:T27c4,
In previous blog posts, we showed that [Pydantic is well suited to steering language models](./llm-intro) and [validating their outputs](./llm-validation).
The application of Pydantic extends beyond merely managing outputs of these text-based models. In this post, we present a guide on how to develop a product search API that uses Pydantic as a link between GPT-4 Vision and FastAPI. Pydantic will be used to structure both the data extraction processes as well as FastAPI requests and responses.
The combination of Pydantic, FastAPI, and OpenAI's GPT models creates a powerful stack for the development of AI applications, characterized by:
- **Pydantic's Schema Validation:** This feature guarantees the uniformity and adherence to predefined schemas across the application, an essential factor for managing outputs from AI models.
- **FastAPI's Performance and Ease of Use:** FastAPI serves as the optimal framework for crafting responsive APIs that can fulfill the requirements of AI applications. This is further enhanced by its seamless integration with Pydantic, which aids in data validation and serialization.
- **OpenAI's GPT-4 Vision Capabilities:** The inclusion of GPT-4 Vision introduces a layer of advanced AI intelligence, empowering applications with the ability to accurately interpret and analyze visual data.
:::note{title="What is FastAPI"}
FastAPI is a high-performance web framework ideal for building APIs, known for its simplicity and ease of learning. It integrates seamlessly with Pydantic, allowing for the validation and consistency of data across an application. This integration also facilitates the automatic generation of API documentation, including schema and examples.
:::
## Example: Ecommerce Vision API
We will develop a straightforward e-commerce vision application. Users will upload an image for processing, and the results could be forwarded to a product search API to fetch supplementary results. This functionality could enhance accessibility, boost user engagement, and potentially increase conversion rates. For the moment, however, our primary focus will be on data extraction.
```python
from typing import List

from pydantic import BaseModel, ConfigDict, Field


class SearchQuery(BaseModel):  # (1)!
    product_name: str
    query: str = Field(
        ...,
        description="""A descriptive query to search for the product, include
        adjectives, and the product type. will be used to serve relevant
        products to the user.""",
    )


class MultiSearchQueryResponse(BaseModel):  # (2)!
    products: List[SearchQuery]

    model_config = ConfigDict(  # (3)!
        json_schema_extra={
            "examples": [
                {
                    "products": [
                        {
                            "product_name": "Nike Air Max",
                            "query": "black running shoes",
                        },
                        {
                            "product_name": "Apple iPhone 13",
                            "query": "smartphone with best camera",
                        },
                    ]
                }
            ]
        }
    )
```
1. The `SearchQuery` model is introduced to encapsulate a single product and its associated search query. Through the use of Pydantic's `Field`, a description is added to the `query` field to facilitate prompting the language model.
2. The `MultiSearchQueryResponse` model is created to encapsulate the API's response, comprising a list of `SearchQuery` objects. This model serves as the representation of the response from the language model.
3. We use `model_config` to attach examples to the JSON schema of the `MultiSearchQueryResponse` model. The schema is used to generate the API documentation and is also included in the OpenAI prompt.
This output format not only guides the language model and outlines our API's response schema but also facilitates the generation of API documentation. Utilizing `json_schema_extra` allows us to specify examples for both documentation and the OpenAI prompt.
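To see what the language model and the documentation generator actually receive, we can inspect the generated schema. The sketch below repeats the two models in shortened form (the description and example are abbreviated for illustration) and reads values back out of `model_json_schema()`.

```python
from typing import List

from pydantic import BaseModel, ConfigDict, Field


class SearchQuery(BaseModel):
    product_name: str
    query: str = Field(..., description="A descriptive query to search for the product.")


class MultiSearchQueryResponse(BaseModel):
    products: List[SearchQuery]

    model_config = ConfigDict(
        json_schema_extra={
            "examples": [
                {"products": [{"product_name": "Nike Air Max", "query": "black running shoes"}]}
            ]
        }
    )


schema = MultiSearchQueryResponse.model_json_schema()

# Field descriptions and the extra examples both end up in the schema.
print(schema["$defs"]["SearchQuery"]["properties"]["query"]["description"])
print(schema["examples"][0]["products"][0]["product_name"])
```

The nested `SearchQuery` model appears under `$defs`, while the entries from `json_schema_extra` are merged into the top level of the schema.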
## Crafting the FastAPI Application
After establishing our models, it's time to leverage them in crafting the request and response structure of our FastAPI application. To interact with the GPT-4 Vision API, we will use the async OpenAI Python client.
````python
from fastapi import FastAPI
from openai import AsyncOpenAI
from pydantic import BaseModel, ConfigDict

client = AsyncOpenAI()
app = FastAPI(
    title="Ecommerce Vision API",
    description="""A FastAPI application to extract products
    from images and describe them as an array of queries""",
    version="0.1.0",
)


class ImageRequest(BaseModel):  # (1)!
    url: str
    temperature: float = 0.0
    max_tokens: int = 1800

    model_config = ConfigDict(
        json_schema_extra={
            "examples": [
                {
                    "url": "https://mensfashionpostingcom.files.wordpress.com/2020/03/fbe79-img_5052.jpg?w=768",
                    "temperature": 0.0,
                    "max_tokens": 1800,
                }
            ]
        }
    )


@app.post("/api/extract_products", response_model=MultiSearchQueryResponse)  # (2)!
async def extract_products(image_request: ImageRequest) -> MultiSearchQueryResponse:  # (3)!
    completion = await client.chat.completions.create(
        model="gpt-4-vision-preview",  # (4)!
        max_tokens=image_request.max_tokens,
        temperature=image_request.temperature,
        stop=["```"],
        messages=[
            {
                "role": "system",
                "content": f"""
                You are an expert system designed to extract products from images for
                an ecommerce application. Please provide the product name and a
                descriptive query to search for the product. Accurately identify every
                product in an image and provide a descriptive query to search for the
                product. You just return a correctly formatted JSON object with the
                product name and query for each product in the image and follows the
                schema below:

                JSON Schema:
                {MultiSearchQueryResponse.model_json_schema()}""",  # (5)!
            },
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": """Extract the products from the image,
                        and describe them in a query in JSON format""",
                    },
                    {
                        "type": "image_url",
                        "image_url": {"url": image_request.url},
                    },
                ],
            },
            {
                "role": "assistant",
                "content": "```json",  # (6)!
            },
        ],
    )
    return MultiSearchQueryResponse.model_validate_json(completion.choices[0].message.content)
````
1. The `ImageRequest` model is crafted to encapsulate the request details for the `/api/extract_products` endpoint. It includes essential parameters such as the image URL for product extraction, alongside `temperature` and `max_tokens` settings to fine-tune the language model's operation.
2. The `/api/extract_products` endpoint is established to process requests encapsulated by the `ImageRequest` model and to return a `MultiSearchQueryResponse` response. The `response_model` attribute is utilized to enforce response validation and to facilitate the automatic generation of API documentation.
3. A dedicated function is implemented to manage requests to the `/api/extract_products` endpoint. This function accepts an `ImageRequest` as its input and produces a `MultiSearchQueryResponse` as its output, effectively bridging the request and response phases.
4. Interaction with the GPT-4 Vision API is facilitated through the OpenAI Python client, employing the `gpt-4-vision-preview` model for the purpose of extracting product details from the provided image.
5. The `MultiSearchQueryResponse` model's `model_json_schema` method is employed to construct the JSON schema that will be included in the prompt sent to the language model. This schema guides the language model in generating appropriately structured responses.
6. To enhance the likelihood of receiving well-structured responses, the assistant message is prefilled with an opening `` ```json `` fence, setting a clear expectation for the format of the output. The `stop` sequence then ends generation at the closing fence.
:::note{title="Why don't we use function calling?"}
In our first post on [structured outputs with Pydantic](./llm-intro#calling-tools), we discussed using function calling and tool calling to get structured data out. As of March 4th, 2024, the `gpt-4-vision-preview` model does not support function calling.
As a result, we must rely on carefully crafted prompts to produce structured output, and then parse and validate that output with `BaseModel.model_validate_json()`.
:::
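Because the output is only steered by the prompt, it may occasionally be incomplete or malformed. The sketch below shows one way to handle that case; `parse_completion` is a hypothetical helper, the models are shortened copies of the ones above, and the strings stand in for model output (no API call is made).

```python
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class SearchQuery(BaseModel):
    product_name: str
    query: str


class MultiSearchQueryResponse(BaseModel):
    products: List[SearchQuery]


def parse_completion(content: str) -> Optional[MultiSearchQueryResponse]:
    """Hypothetical helper: return the parsed model, or None if the text is unusable."""
    try:
        return MultiSearchQueryResponse.model_validate_json(content)
    except ValidationError:
        # Raised both for invalid JSON and for JSON that does not match the schema.
        return None


# Stand-ins for model output; no API call is made here.
good = '{"products": [{"product_name": "Nike Air Max", "query": "black running shoes"}]}'
missing_field = '{"products": [{"product_name": "Nike Air Max"}]}'
truncated = '{"products": [{"product_name": "Nike'

print(parse_completion(good))
print(parse_completion(missing_field), parse_completion(truncated))
```

In an endpoint, a `None` result could be turned into an error response or a retry, depending on the application.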
## Running the FastAPI application
To run the FastAPI application, we can use the `uvicorn` command-line tool. We can run the following command to start the application:
```
uvicorn app:app --reload
```
## Visiting the documentation
Once the application is running, we can visit the `/docs` endpoint at `localhost:8000/docs`. The documentation and examples are generated automatically, and our example appears under `Example Value`.
![](/assets/blog/llm-vision/vision_docs.png)
## Testing the API
Once you hit 'Try it out' and 'Execute', you'll see the response from the language model, formatted according to the `MultiSearchQueryResponse` model we defined earlier.
![](/assets/blog/llm-vision/vision_example.png)
## Future of AI Engineering
With the increasing availability of language models that offer JSON output, Pydantic is emerging as a crucial tool in the AI Engineering toolkit. It has demonstrated its utility in modeling data for extraction, handling requests, and managing responses, which are essential for deploying FastAPI applications. This underscores Pydantic's role as an invaluable asset for developing AI-powered web applications in Python.
c:T24f2,
In our previous post we introduced Pydantic as a tool to [steer language models](./llm-intro).
This post, however, shifts focus to how we can leverage Pydantic's [validation](https://docs.pydantic.dev/latest/concepts/validators/) mechanism to minimize hallucinations. We'll explain how validation works and explore how incorporating context into validators can enrich language model results.
By the end of this article, you'll have seen some examples of how we can use Pydantic to minimize hallucinations and gain more confidence in the model's output.
But before we do that, let's go over some validation basics.
:::note
For a deep dive into Pydantic's validation mechanics, visit the [official documentation](https://docs.pydantic.dev/latest/concepts/validators/).
:::
## Introduction to Validators
Validators are functions that take a value, check a property, and either raise an error if the check fails or return the (possibly modified) value. They can be used to enforce constraints on model inputs and outputs.
```python
def validation_function(value):
    if condition(value):
        raise ValueError("Value is not valid")
    return mutation(value)
```
For instance, consider validating a name field. Here’s how you can enforce a space in the name using `Annotated` and `AfterValidator`:
```python hl_lines="13"
from typing_extensions import Annotated
from pydantic import BaseModel, ValidationError, AfterValidator


def name_must_contain_space(v: str) -> str:
    if " " not in v:
        raise ValueError("Name must contain a space.")
    return v.lower()


class UserDetail(BaseModel):
    age: int
    name: Annotated[str, AfterValidator(name_must_contain_space)] #(1)!


person = UserDetail.model_validate({"age": 24, "name": "Jason"}) #(2)!
```
1. `AfterValidator` applies a custom validation via `Annotated`.
2. The absence of a space in 'Jason' triggers a validation error.
:::deter{title="Validation Errors"}
If we run the above code, we'll get the following error:
```
ValidationError: 1 validation error for UserDetail
name
Value error, Name must contain a space. [type=value_error, input_value='Jason', input_type=str]
More at https://errors.pydantic.dev/2.4/v/value_error
```
:::
### Context-Driven Validators
Validators can also be used to enforce context-specific constraints. For instance, consider a validator that checks if a name is in a list of names, and raises an error if it isn't. Enhancing validators with `ValidationInfo` adds nuanced control. For example, removing dynamic stopwords from a text requires us to pass in some context:
```python
from typing_extensions import Annotated
from pydantic import AfterValidator, BaseModel, ValidationInfo


def remove_stopwords(v: str, info: ValidationInfo):
    context = info.context
    if context:
        stopwords = context.get('stopwords', set())
        v = ' '.join(w for w in v.split() if w.lower() not in stopwords)
    return v


class Response(BaseModel):
    message: Annotated[str, AfterValidator(remove_stopwords)]
```
Passing dynamic context to the validator:
```python
data = {'message': 'This is an example document'}
print(Response.model_validate(data))  # Without context #(1)!
#> message='This is an example document'
print(Response.model_validate(
    data, context={
        'stopwords': ['this', 'is', 'an']  #(2)!
    }))
#> message='example document'
```
1. Without context, the validator does nothing.
2. Passing context removes stopwords from the text.
:::note{title="LLMs and Validators"}
Validator context becomes crucial in directing language models. It helps in excluding specific content (like competitor names) or focusing on relevant topics, which is particularly effective in question-answering scenarios.
:::
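As a small sketch of the "excluding specific content" case, the validator below rejects a message that mentions any name supplied through the context. The `banned_names` context key, the `Reply` model, and the company name are all hypothetical choices made for this example.

```python
from typing_extensions import Annotated

from pydantic import AfterValidator, BaseModel, ValidationError, ValidationInfo


def no_banned_names(v: str, info: ValidationInfo) -> str:
    # `banned_names` is a hypothetical context key chosen for this sketch.
    banned = (info.context or {}).get("banned_names", [])
    lowered = v.lower()
    for name in banned:
        if name.lower() in lowered:
            raise ValueError(f"Response must not mention `{name}`")
    return v


class Reply(BaseModel):
    message: Annotated[str, AfterValidator(no_banned_names)]


context = {"banned_names": ["Acme"]}
accepted = Reply.model_validate({"message": "Our product ships today."}, context=context)

try:
    Reply.model_validate({"message": "Acme ships faster."}, context=context)
    rejected = False
except ValidationError:
    rejected = True

print(accepted.message, rejected)
```

Because the list of names lives in the context rather than in the model, the same model can be reused with different lists per request.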
## Using LLMs with Pydantic
Now let's revisit the [instructor](https://jxnl.github.io/instructor/) package from our previous [article](./llm-intro), which employs Pydantic to control language model output. Three of its features are relevant here:
1. `response_model`: already seen in the previous article.
2. `validation_context`: similar to `ValidationInfo`, provides validator context that can be used to augment the validation process.
3. `llm_validation`: a validator that uses an LLM to validate the output.
### Validation using an LLM
Some rules are easier to express using natural language. For instance, consider the following rule: 'don't say objectionable things'. This rule is difficult to express using a validator function, but easy to express using natural language. We can use an LLM to generate a validator function from this rule.
Consider this example where we want some light moderation on a question answering model. We want to ensure that the answer does not contain objectionable content. We can use an LLM to generate a validator function that checks if the answer contains objectionable content.
:::deter{title="Network Requests in Validators"}
This will include a network request in your synchronous validators, which merits a call out about the performance consequences (e.g., if you were to use this model as an input to a **FastAPI** endpoint). If you have a blocking network request in model validation, the server will be totally unresponsive while waiting for the API response.
:::
```python
import instructor

from openai import OpenAI
from instructor import llm_validator
from pydantic import BaseModel, BeforeValidator
from typing_extensions import Annotated

client = instructor.patch(OpenAI())

NoEvil = Annotated[
    str,
    BeforeValidator(
        llm_validator("don't say objectionable things", openai_client=client)
    )]


class QuestionAnswer(BaseModel):
    question: str
    answer: NoEvil


QuestionAnswer.model_validate({
    "question": "What is the meaning of life?",
    "answer": "The meaning of life is to be evil and steal"
})
```
The above code will fail with the following error:
```
1 validation error for QuestionAnswer
answer
Assertion failed, The statement promotes objectionable behavior. [type=assertion_error, input_value='The meaning of life is to be evil and steal', input_type=str]
Details at https://errors.pydantic.dev/2.4/v/assertion_error
```
Notice how the error message is generated by the LLM.
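Regarding the blocking concern raised above: in async code, one possible mitigation is to run the validation in a worker thread so the event loop can keep serving other work. The sketch below is an assumption-laden illustration, not how instructor or FastAPI handle this; `SlowModel` is hypothetical and `time.sleep` stands in for the network request.

```python
import asyncio
import time

from pydantic import BaseModel, field_validator


class SlowModel(BaseModel):  # hypothetical model with a deliberately slow validator
    answer: str

    @field_validator("answer")
    @classmethod
    def slow_check(cls, v: str) -> str:
        time.sleep(0.2)  # stand-in for a blocking network request
        return v


async def heartbeat(ticks: list) -> None:
    # Makes progress during validation only if the event loop is not blocked.
    for _ in range(3):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.05)


async def main() -> tuple:
    ticks: list = []
    validated, _ = await asyncio.gather(
        # Run the blocking validation in a worker thread.
        asyncio.to_thread(SlowModel.model_validate, {"answer": "42"}),
        heartbeat(ticks),
    )
    return validated, ticks


validated, ticks = asyncio.run(main())
print(validated.answer, len(ticks))
```

Calling `SlowModel.model_validate` directly inside the coroutine would instead pause the heartbeat until the validator returned.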
### Grounding responses in context
Many organizations worry about hallucinations in their LLM responses. To address this, we can use validators to ensure that the model's responses are grounded in the context used to generate the prompt.
:::assert{title="What is a hallucination?"}
A hallucination is when a language model generates a response that is not grounded in the context used to generate the prompt. This could mean an answer is incorrect.
:::
For instance, consider a question-answering model that answers from a text chunk. To check that the response is based on that chunk, we can ask the model for a citation and use `ValidationInfo` in a simple validator to verify that the citation actually appears in the chunk.
```python
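from pydantic import AfterValidator, BaseModel, ValidationInfo
from typing_extensions import Annotated


def citation_exists(v: str, info: ValidationInfo):
    # `info.context` holds whatever was passed as `context=` during validation
    context = info.context
    if context:
        context = context.get("text_chunk")
        # the citation must appear verbatim in the text chunk
        if v not in context:
            raise ValueError(f"Citation `{v}` not found in text")
    return v


Citation = Annotated[str, AfterValidator(citation_exists)]


class AnswerWithCitation(BaseModel):
    answer: str
    citation: Citation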
```
Now let's consider an example where we answer a question using a text chunk. The validator checks that the citation comes from the provided text chunk.
```python
AnswerWithCitation.model_validate(
    {
        "answer": "The Capital of France is Paris",
        "citation": "Paris is the capital.",
    },
    context={"text_chunk": "please note that currently, paris now no longer is the capital of france."},
)
```
```
1 validation error for AnswerWithCitation
citation
Citation `Paris is the capital.` not found in text [type=value_error, input_value='Paris is the capital.', input_type=str]
Details at https://errors.pydantic.dev/2.4/v/value_error
```
Although the answer in this example was correct, the validator raises an error because the citation is not in the text chunk. This helps us identify and correct the model's 'hallucination', which we can now define as incorrectly cited information.
We can now use OpenAI to generate a response to a question from a text chunk, passing the same chunk as `validation_context` so the validator can check that the citation is grounded in it.
```python
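# assumes `client` is the instructor-patched OpenAI client from earlier,
# and that `q` (the question) and `text_chunk` are already defined
resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=AnswerWithCitation,
    messages=[
        {"role": "user", "content": f"Answer the question `{q}` using the text chunk\n`{text_chunk}`"},
    ],
    validation_context={"text_chunk": text_chunk},
)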
```
By asking the language model to cite the text chunk, and subsequently verifying that the citation is in the text chunk, we can ensure that the model's response is grounded in the provided text chunk, minimizing hallucinations and giving us more confidence in the model's output.
## Conclusion
The power of these techniques lies in the flexibility and precision with which we can use Pydantic to describe and control outputs.
Whether it's moderating content, avoiding specific topics or competitors, or even ensuring responses are grounded in provided context, Pydantic's `BaseModel` offers a very natural way to describe the data structure we want, while validation functions and `ValidationInfo` provide the flexibility to enforce these constraints.
d:T3087,
In the last year, there's been a big leap in how we use advanced AI programs, especially in how we communicate with them to get specific tasks done. People are not just making chatbots; they're also using these AIs to sort information, improve their apps, and create synthetic data to train smaller task-specific models.
:::assert{title="What is Prompt Engineering?"}
Prompt Engineering is a technique to direct large language models (LLMs) like ChatGPT. It doesn't change the AI itself but tweaks how we ask questions or give instructions. This method improves the AI's responses, making them more accurate and helpful. It's like finding the best way to ask something to get the answer you need. There's a detailed article about it [here](https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/).
:::
While some have resorted to [threatening human life](https://twitter.com/goodside/status/1657396491676164096?s=20) to generate structured data, we have found that Pydantic is even more effective.
In this post, we will discuss validating structured outputs from language models using Pydantic and OpenAI. We'll show you how to write reliable code. Additionally, we'll introduce a new library called [instructor](https://github.com/jxnl/instructor) that simplifies this process and offers extra features to leverage validation to improve the quality of your outputs.
## Pydantic
Unlike libraries like `dataclasses`, `Pydantic` goes a step further and defines a schema for your dataclass. This schema is used to validate data, but also to generate documentation and even to generate a JSON schema, which is perfect for our use case of generating structured data with language models!
:::note{title="Understanding Validation"}
A simple example of validation involves ensuring that a value has the correct type. For instance, let's consider a `Person` dataclass with a `name` field of type `str`. We can validate that the value is indeed a string.
```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int


Person(name="Sam", age="10")
>>> Person(name='Sam', age='10')
```
By using the `dataclass` decorator, we can pass in the values as strings without any complaints from the dataclass. This would mean that we could run into issues later on if we try to use the `age` field as an `int`.
```python
Person(name="Sam", age="10").age + 1
>>> TypeError: can only concatenate str (not "int") to str
```
However, if we use `Pydantic`, we will obtain the correct type!
```python
from pydantic import BaseModel


class Person(BaseModel):
    name: str
    age: int


Person(name="Sam", age="10")
>>> Person(name='Sam', age=10)
Person(name="Sam", age="10").age + 1
>>> 11
```
The `age` field has been updated to an `int` from a `str` in the demonstration. Pydantic validates and coerces the type, ensuring the correct type is obtained. If we provide data that cannot be converted to an `int`, an error will be returned.
```python
Person(name="Sam", age="13.4")
>>> ValidationError: 1 validation error for Person
>>> age
>>> Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='13.4', input_type=str]
>>> For further information visit https://errors.pydantic.dev/2.5/v/int_parsing
```
This behavior is great when we may not have trusted inputs, but is even more critical when inputs are coming from a language model!
To learn more about validation, check out the section [validation — a deliberate misnomer](https://docs.pydantic.dev/latest/concepts/models/#tldr).
:::
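To make the contrast in the note above concrete, here is a rough sketch of the coercion a plain dataclass would need to do by hand. It only handles simple annotations such as `str` and `int`, and is meant as an illustration of what Pydantic automates, not as a substitute for it.

```python
from dataclasses import dataclass, fields


@dataclass
class Person:
    name: str
    age: int

    def __post_init__(self) -> None:
        # Hand-written coercion: roughly what Pydantic does for us
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                # f.type is the annotated class here (str or int);
                # int("13.4") would raise ValueError, much like Pydantic rejects it
                setattr(self, f.name, f.type(value))


person = Person(name="Sam", age="10")
print(person.age + 1)
```

Even this small version needs care (nested types, optional fields, and error reporting are all missing), which is the gap a validation library fills.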
By providing the model with the following prompt, we can ask it to generate a JSON object that matches the `PythonPackage` model.
```python
from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()


class PythonPackage(BaseModel):
    name: str
    author: str


resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": "Return the `name`, and `author` of pydantic, in a json object."
        },
    ]
)

# validate the raw JSON string returned by the model against our schema
PythonPackage.model_validate_json(resp.choices[0].message.content)
```
If everything is fine, we might receive output that parses cleanly, such as `json.loads('{"name": "pydantic", "author": "Samuel Colvin"}')`. However, if there is an issue, `resp.choices[0].message.content` could wrap the JSON in prose or a markdown code block, which we need to handle appropriately.
**LLM responses with markdown code blocks**
````python
json.loads("""
```json
{
"name": "pydantic",
"author": "Samuel Colvin"
}
```
""")
>>> JSONDecodeError: Expecting value: line 1 column 1 (char 0)
````
**LLM responses with prose**
```python
json.loads("""
Ok heres the authors of pydantic: Samuel Colvin, and the name this library
{
"name": "pydantic",
"author": "Samuel Colvin"
}
""")
>>> JSONDecodeError: Expecting value: line 1 column 1 (char 0)
```
The content may contain valid JSON, but the string as a whole is not valid JSON, so we cannot parse it without knowing how the language model tends to wrap its output. It could still hold useful information that we would have to extract ourselves. Fortunately, `OpenAI` offers several options to address this situation.
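For comparison, a hand-rolled fallback for the wrapped responses above might look like the sketch below. `extract_first_json_object` is a hypothetical helper, not something OpenAI or Pydantic provides; it relies only on the standard library's `json.JSONDecoder.raw_decode`, which parses one JSON value starting at a given index and ignores whatever follows.

```python
import json
from typing import Any


def extract_first_json_object(text: str) -> Any:
    """Return the first JSON object embedded in `text`, ignoring surrounding prose."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not a valid object at this brace; try the next one
            start = text.find("{", start + 1)
    raise ValueError("no JSON object found in text")


reply = 'Ok here is the library: {"name": "pydantic", "author": "Samuel Colvin"}'
print(extract_first_json_object(reply))
```

This kind of scraping is brittle (it takes the first object it finds, whether or not it is the one you wanted), which is part of why a schema-driven approach is more attractive.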
## Calling Tools
While tool-calling was originally designed to make calls to external APIs using JSON schema, its real value lies in allowing us to specify the desired output format. Fortunately, `Pydantic` provides utilities for generating a JSON schema and supports nested structures, which would be difficult to describe in plain text.
In this example, instead of describing the desired output in plain text, we simply provide the JSON schema for the `Packages` class, which includes a list of `PythonPackage` objects.
As an exercise, try prompting the model to generate this output without using Pydantic!
:::note{title="Example without Pydantic"}
Here's the same example as above without using Pydantic's schema generation:
```python
import json

# assumes `client = OpenAI()` as in the earlier examples
resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": "Pydantic and FastAPI?",
        },
    ],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "Requirements",
                "description": "A list of packages and their first authors.",
                "parameters": {
                    "$defs": {
                        "Package": {
                            "properties": {
                                "name": {"title": "Name", "type": "string"},
                                "author": {"title": "Author", "type": "string"},
                            },
                            "required": ["name", "author"],
                            "title": "Package",
                            "type": "object",
                        }
                    },
                    "properties": {
                        "packages": {
                            "items": {"$ref": "#/$defs/Package"},
                            "title": "Packages",
                            "type": "array",
                        }
                    },
                    "required": ["packages"],
                    "title": "Packages",
                    "type": "object",
                },
            },
        }
    ],
    tool_choice={
        "type": "function",
        "function": {"name": "Requirements"},
    },
)

# the tool call arguments arrive as a JSON string
resp = json.loads(resp.choices[0].message.tool_calls[0].function.arguments)
```
:::
Now, notice in this example that the prompts we use contain purely the data we want, where the `tools` and `tool_choice` now capture the schemas we want to output. This separation of concerns makes it much easier to organize the 'data' and the 'description' of the data that we want back out.
```python
from typing import List

from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()


class PythonPackage(BaseModel):
    name: str
    author: str


class Packages(BaseModel):
    packages: List[PythonPackage]


resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": "Pydantic and FastAPI?",
        },
    ],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "Requirements",
                "description": "A list of packages and their first authors.",
                "parameters": Packages.model_json_schema(),
            },
        }
    ],
    tool_choice={
        "type": "function",
        "function": {"name": "Requirements"},
    },
)

Packages.model_validate_json(
    resp.choices[0].message.tool_calls[0].function.arguments
)
```
```json
{
  "packages": [
    {
      "name": "pydantic",
      "author": "Samuel Colvin"
    },
    {
      "name": "fastapi",
      "author": "Sebastián Ramírez"
    }
  ]
}
```
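If you make several calls like this, the `tools` and `tool_choice` boilerplate could be factored into a small helper. The sketch below is one way to do it; `tool_spec` is a hypothetical function name, and the only Pydantic call it relies on is `model_json_schema()`, which is used in the example above.

```python
from typing import Any, Dict, List, Type

from pydantic import BaseModel


class PythonPackage(BaseModel):
    name: str
    author: str


class Packages(BaseModel):
    packages: List[PythonPackage]


def tool_spec(model: Type[BaseModel], name: str, description: str) -> Dict[str, Any]:
    """Build the `tools` and `tool_choice` keyword arguments from a Pydantic model."""
    return {
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": name,
                    "description": description,
                    # Pydantic generates the JSON schema, including nested models
                    "parameters": model.model_json_schema(),
                },
            }
        ],
        "tool_choice": {"type": "function", "function": {"name": name}},
    }


kwargs = tool_spec(Packages, "Requirements", "A list of packages and their first authors.")
```

The resulting dictionary could then be unpacked into the call, for example `client.chat.completions.create(model=..., messages=..., **kwargs)`.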
## Using `pip install instructor`
The example we provided above is somewhat contrived, but it illustrates how Pydantic can be utilized to generate structured data from language models. Now, let's employ [Instructor](https://jxnl.github.io/instructor/) to streamline this process. Instructor is a compact library that enhances the OpenAI client by offering convenient features. In the upcoming blog post, we will delve into reasking and validation. However, for now, let's explore a practical example.
:::note{title="Not just OpenAI"}
While this post focuses on OpenAI, using Pydantic is not limited to OpenAI. You can use it for many language models that support the OpenAI API, including Mistral on [Anyscale](https://jxnl.github.io/instructor/blog/2023/12/15/patching/).
:::
```python
# pip install instructor
import instructor
from openai import OpenAI

# `Packages` and `PythonPackage` are the models defined in the previous example
client = instructor.patch(OpenAI())

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": "Pydantic and FastAPI?",
        },
    ],
    response_model=Packages,
)

assert isinstance(resp, Packages)
assert isinstance(resp.packages, list)
assert isinstance(resp.packages[0], PythonPackage)
```
:::commend{title="Tips on Prompting with Pydantic"}
If you're looking for tips on how to design these kinds of models check out [prompting tips](https://jxnl.github.io/instructor/concepts/prompting/) from the Instructor documentation.
:::
## Case Study: Search query segmentation
Let's consider a practical example. Imagine we have a search engine capable of comprehending intricate queries. For instance, if we make a request to find "recent advancements in AI", we could provide the following payload:
```json
{
  "rewritten_query": "novel developments advancements ai artificial intelligence machine learning",
  "published_daterange": {
    "start": "2021-06-17",
    "end": "2023-09-17"
  },
  "domains_allow_list": ["arxiv.org"]
}
```
If we peek under the hood, we can see that the query is actually a complex object, with a date range and a list of domains to search in. We can model this structured output in Pydantic and later use it with the instructor library:
```python
from typing import List
import datetime

from pydantic import BaseModel


class DateRange(BaseModel):
    start: datetime.date
    end: datetime.date


class SearchQuery(BaseModel):
    rewritten_query: str
    published_daterange: DateRange
    domains_allow_list: List[str]

    async def execute(self):
        # Return the search results of the rewritten query
        # (`api` stands in for your own search backend client; it is not defined here)
        return api.search(json=self.model_dump())
```
This pattern empowers us to restructure the user's query for improved performance, without requiring the user to understand the inner workings of the search backend.
```python
import datetime

import instructor
from openai import OpenAI

# Enables response_model in the openai client
client = instructor.patch(OpenAI())


def search(query: str) -> SearchQuery:
    return client.chat.completions.create(
        model="gpt-4",
        response_model=SearchQuery,
        messages=[
            {
                "role": "system",
                "content": f"You're a query understanding system for a search engine. Today's date is {datetime.date.today()}",
            },
            {
                "role": "user",
                "content": query,
            },
        ],
    )


search("recent advancements in AI")
```
**Example Output**
```json
{
  "rewritten_query": "novel developments advancements ai artificial intelligence machine learning",
  "published_daterange": {
    "start": "2023-12-15",
    "end": "2023-01-01"
  },
  "domains_allow_list": ["arxiv.org"]
}
```
By defining the API payload as a Pydantic model, we can leverage the `response_model` argument to instruct the model to generate the desired output. This is a powerful feature that allows us to generate structured data from a language model!
In our upcoming posts, we will provide more practical examples and explore how we can leverage `Pydantic`'s validation features to ensure that the data we receive is not only valid syntactically but also semantically.
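As a small preview of what semantic validation could look like, the sketch below adds a cross-field check to `DateRange` so that a range ending before it starts is rejected, even though both dates parse fine on their own. The validator name `start_not_after_end` and the helper `is_valid_range` are illustrative choices.

```python
import datetime

from pydantic import BaseModel, ValidationError, model_validator


class DateRange(BaseModel):
    start: datetime.date
    end: datetime.date

    @model_validator(mode="after")
    def start_not_after_end(self) -> "DateRange":
        # Each date is syntactically valid; this checks they make sense together
        if self.start > self.end:
            raise ValueError(f"start {self.start} is after end {self.end}")
        return self


def is_valid_range(start: str, end: str) -> bool:
    """Return True if the two ISO date strings form a forward-running range."""
    try:
        DateRange(start=start, end=end)
    except ValidationError:
        return False
    return True
```

With instructor, an error raised here could be surfaced back to the model in the same way as the field validators shown earlier.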
e:T502f,
Back in February I [announced](./company-announcement) Pydantic Inc., but I didn't explain what services we were building.
Today I want to provide a little more detail about what we're planning to build, and get your feedback on the components via a [short survey](#how-you-can-help-section).
In return for giving us your honest feedback, you have the option to be added to our early access list, to get invited to the closed beta of our platform once it's ready to use.
At the core of Pydantic's use is always data — Pydantic brings schema and sanity to your data.
The problem is that even with Pydantic in your corner, working with data when it leaves Python often still sucks.
We want to build a data platform to make working with data quick, easy, and enjoyable — where developer experience is our north star.
Before explaining what we're going to build, I should be explicit about what we're not building:
- We're not building a new database or querying engine
- We're not pretending that non-developers (or AI) can do the job of a developer — we believe in accelerating developers, not trying to replace them — we'll have CLIs before we have GUIs
- We're not doing 314 integrations with every conceivable technology
- Similarly, we're not going to have SDKs for every language — we'll build a few for the languages we know best, and provide a great API for the rest
## How you can help
There are five key components to the Pydantic Data Platform that we're thinking of building.
We want **your** feedback on these components — which you are most excited about, and which you wouldn't use.
We'll use your feedback to decide the order in which we build these features, and to help us build them in a way that works for you.
Here is a brief description of each of these components (each is explained in more detail below):
1. **Python Analytics/Observability** — a logging and metrics platform with tight Python and Pydantic integration, designed to make the data flowing through your application more readily usable for both engineering and business analytics. [More info...](#1-analyticsobservability-for-python-section)
2. **Data Gateway for object stores** — Add validation, transformation and cataloguing in front of object stores like S3, with a schema defined in Pydantic models then validated by our Rust service. [More info...](#2-data-gateway-for-object-stores-section)
3. **Data Gateway for data warehouses** — the same service as above, but integrated with your existing data warehouse. [More info...](#3-data-gateway-for-data-warehouses-section)
4. **Schema Catalog** — for many, Pydantic already holds the highest fidelity representation of their data schemas. Our Schema Catalog will take this to the next level, serving as an organization-wide single source of truth for those schemas, tracking their changes, and integrating with our other tools and your wider platform. [More info...](#4-schema-catalog-section)
5. **Dashboards and UI powered by Pydantic models** — a managed platform to deploy and control dashboards, auxiliary apps and internal tools where everything from UI components (like forms and tables) to database schema would be defined in Python using Pydantic models. [More info...](#5-dashboards-and-ui-powered-by-pydantic-models-section)
_Please complete a short survey to give us your feedback on these components, and to be added to our waiting list:_
## Component Deep Dive
Here is a little more detail on each of the features introduced above.
### 1. Analytics/Observability for Python
For many years observability/logging/analytics platforms have frustrated me for two reasons:
1. Logging (what exactly happened) and metrics (how often did it happen) are separate. I'm not satisfied with the existing tools for recording and viewing both together, in Python or otherwise.
2. Observability (DevOps/developer insights) and business analytics are completely disconnected, although they're frequently powered by the same data.
I want all four of these views in one place, collected with the same tool.
- Why can't I collect and view information about recent sign-ups as easily as information about recent exceptions?
- Why can't logs of transactions give me a view of daily or monthly revenue?
#### Our Solution
We would give developers 3 tools:
1. An SDK to collect data in Python applications, with tight Pydantic integration
2. A dashboard to view that data, either in aggregate or for individual events, including the ability to build reports for other parts of the business
3. A lightweight Python ORM to query the data, to do whatever you like with it
We see use cases for this tool across many domains — from web applications and APIs where FastAPI is already widely used, to machine learning preparation and LLM validation, where the Pydantic package is already used by OpenAI, LangChain, HuggingFace, Prefect and others.
Our goal is to make it easy enough to integrate (think: setting one environment variable) that you'd install it in a 50-line script, but powerful enough to create monitoring dashboards and business intelligence reports for your core applications.
Here's how this might look:
###### Analytics — Direct use
```py
from pydantic_analytics import log  # name TBD


async def transaction(payment: PaymentObject):
    ...
    log("transaction-complete amount={payment.amount}", payment)
```
`PaymentObject` could be a Pydantic model, dataclass or typed dict. `transaction-complete` would uniquely identify this event, `amount` would be shown in the event summary and `payment` would be visible in the event details.
This would allow you to both view details of the transaction, and aggregate by amount.
#### Pydantic Integration
The data you want to collect and view is often already passing through Pydantic models, so we can build a service that integrates tightly with Pydantic to extract as much context as possible from the data with minimal developer effort.
###### Analytics — Pydantic Integration
```py
from datetime import datetime

from pydantic import BaseModel, EmailStr, SecretStr


class Signup(BaseModel, analytics="record-all"):
    email: EmailStr
    name: str
    password: SecretStr
    signup_ts: datetime


@app.post("/signup/")
def signup(signup_form: Signup):
    # signups are recorded automatically upon validation
    ...
```
The idea is that you could instrument your application with no code changes, e.g. you could say "record all validations", or whitelist the models you want to record. In addition, fields of type `Secret*` could be obfuscated automatically.
The `analytics` config key on models might have the following choices:
- `False` — don't record validations using this model
- `'record-validation-errors'` — record validation errors, but not the data
- `'record-all'` — record both validation errors and the data from successful validations
- `'record-metrics'` — record only the timestamp and status of validations
- omitted — use whatever default is set
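To make the choices above more tangible, here is a speculative sketch of how a recorder might decide what to keep for a single validation. Everything in it is hypothetical, including the `build_record` name and the assumption that `'record-validation-errors'` skips successful validations entirely; the proposed service may well behave differently.

```python
from typing import Any, Dict, Optional


def build_record(mode: Any, data: Optional[dict], error: Optional[str]) -> Optional[Dict[str, Any]]:
    """Decide what one validation contributes to the analytics log (None = nothing)."""
    if mode is False:
        return None  # recording disabled for this model
    status = "error" if error is not None else "ok"
    if mode == "record-metrics":
        # status only; a real service would also attach a timestamp
        return {"status": status}
    if error is not None:
        # both remaining modes keep validation errors
        return {"status": status, "error": error}
    if mode == "record-all":
        # only 'record-all' keeps data from successful validations
        return {"status": status, "data": data}
    # assumption: 'record-validation-errors' ignores successful validations
    return None
```

The "omitted" case is left out here; it would simply look up a project-wide default before calling a function like this.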
#### Entities
We would allow you to define entities (they might be "Customers" or "Servers", "Background Tasks" or "Sales Prospects"), then update those entities as easily as you'd submit a new log message. As an example, you could imagine a `Customer` entity with a `last_login` field that is updated every time a `customer-login user_id=xxx` log message is received.
We'd also allow you to link between the entities using their existing unique IDs.
This would allow the Pydantic Data Platform to be used as an admin view of your application data as well as a logging or BI tool.
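The `Customer` example above could be sketched roughly as follows. This is an in-memory illustration only: `handle_log`, the `customers` dictionary, and the `key=value` message parsing are assumptions made for the sketch, not a description of how the platform would store or update entities.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional


@dataclass
class Customer:
    user_id: str
    last_login: Optional[datetime] = None


# entity store keyed by the application's existing unique ID
customers: Dict[str, Customer] = {}


def handle_log(message: str, ts: datetime) -> None:
    """Update the matching Customer entity when a login event is logged."""
    event, _, rest = message.partition(" ")
    if event != "customer-login":
        return
    # parse "key=value" pairs from the remainder of the message
    attrs = dict(pair.split("=", 1) for pair in rest.split() if "=" in pair)
    user_id = attrs.get("user_id")
    if user_id is None:
        return
    customer = customers.setdefault(user_id, Customer(user_id=user_id))
    customer.last_login = ts


handle_log("customer-login user_id=42", datetime(2023, 6, 1, 9, 30))
```

Other event types pass through untouched, so the same log stream can feed both the event view and the entity view.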
#### Logging data from other sources
While [opentelemetry](https://opentelemetry.io/) has its deficiencies, it should allow us to receive data from many other technologies without the need to build integrations for every one.
In addition, we will build a first class API (OpenAPI documentation, nice error messages, all the stuff that you've come to love from FastAPI) to make direct integrations and other language SDKs easy to develop.
---
### 2. Data Gateway for object stores
Add validation, transformation and cataloging in front of object stores like S3.
The idea is that we would bring the same declarative schema enforcement and cataloging that has made Pydantic and FastAPI so successful to other places. Putting a data validation and schema cataloging layer in front of data storage seems like a natural place for validation as a service.
Out of the box S3 is a key value store; you can't enforce that blobs under a certain path all have any specific schema, or even that they are all JSON files.
The S3 console organizes keys into a folder structure based on delimiters, but when you navigate to a "folder" you know nothing about the data without opening samples or reviewing the source code that produced them.
#### Our Solution
We will build a scalable, performant service that sits between clients and the data sink — a data validation reverse proxy or gateway.
Schemas would be defined via Pydantic models, but the service would provide a number of features you don't get from Pydantic alone:
- Many input formats would be supported (JSON, CSV, Excel, Parquet, Avro, msgpack, etc), with automatic conversion to the storage format
- Multiple storage formats would be supported — at least JSON and Parquet. Delta Lake tables and Iceberg tables might come later
- Multiple interfaces to upload and download data: HTTP API, S3 compliant API (so you can continue to use `aws s3 cp` etc.), Python SDK/ORM
- Arbitrary binary data formats (images, videos, documents) would be supported; validations would include checking formats, resolution of images and videos, etc.
- Over time we'll add features that take advantage of our knowledge of the schemas to improve costs/performance and the overall developer experience of S3.
- Optionally, successful and failed uploads could be logged to the logging/analytics service described [above](#1-analyticsobservability-for-python-section)
Because the validation and transformation would be implemented in Rust, and each process can perform validation for many different schemas, we will be able to provide this service at a significantly lower cost than running a Python service using Pydantic for validation.
#### Example
Let's say we want to upload data from user inquiries to a specific prefix and store it as JSON. This might look something like this:
###### Pydantic Gateway — JSON Dataset
```py
from datetime import datetime

from pydantic import BaseModel, EmailStr
from pydantic_gateway import JsonDataset  # name TBD


class CustomerInquiry(BaseModel):
    email: EmailStr
    name: str
    inquiry_ts: datetime


dataset = JsonDataset("my-bucket/inquiries/", CustomerInquiry)
dataset.register()

upload = dataset.create_upload_url(expiration=3_600)
print(upload.url)
#> https://gateway.pydantic.dev///?key=e3ca0f89etc

# validation directly from Python
# (note: validation would be run on the server, not locally)
dataset.upload({'email': 'jane.doe@example.com', ...})
```
Data about inquiries could then be added via another service, the equivalent of:
```sh
$ curl -X POST \
-d '{"email": "joe.bloggs@example.com"}' \
'https://gateway.pydantic.dev///?key=e3ca0f89etc'
```
Or using `awscli`, and this time uploading multiple inquiries from a CSV file:
```sh
$ aws s3 cp \
--endpoint-url https://.s3.pydantic.dev \
inquiries.csv s3://
```
The data could then be read from Python:
```py
print(dataset.read(limit=100))
#> [CustomerInquiry(email='jane.doe@example.com', ...), ...]
```
The power here is that if the service submitting customer inquiries makes an error in the schema, the data is rejected or stored in a quarantine space for later review.
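The accept-or-quarantine behaviour can be illustrated locally with plain Pydantic. This is only a sketch of the idea: `ingest_csv` is a hypothetical function, it runs in-process rather than in the proposed Rust service, and it uses `str` for the email field to avoid the extra dependency that `EmailStr` needs.

```python
import csv
import io
from datetime import datetime
from typing import List, Tuple

from pydantic import BaseModel, ValidationError


class CustomerInquiry(BaseModel):
    email: str  # the example above uses EmailStr; plain str keeps this sketch dependency-free
    name: str
    inquiry_ts: datetime


def ingest_csv(text: str) -> Tuple[List[CustomerInquiry], List[dict]]:
    """Validate each CSV row; return (accepted models, quarantined raw rows)."""
    accepted: List[CustomerInquiry] = []
    quarantined: List[dict] = []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            accepted.append(CustomerInquiry.model_validate(row))
        except ValidationError:
            quarantined.append(row)  # kept aside for later review
    return accepted, quarantined


sample = (
    "email,name,inquiry_ts\n"
    "jane.doe@example.com,Jane,2023-06-01T09:30:00\n"
    "joe.bloggs@example.com,Joe,not-a-date\n"
)
accepted, quarantined = ingest_csv(sample)
```

Keeping the raw row for rejected records means nothing is lost when a producer gets the schema wrong; the data can be fixed and replayed later.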
One of the most powerful tools that S3 provides is pre-signed URLs.
We'll provide support for pre-signed URLs and even expand support to creating upload endpoints for entire datasets.
That means you'll be able to generate a pre-authorized URL that still retains all the data validation.
```py
from pydantic_gateway import FileDataset, image_schema

profile_picture = image_schema(
    output_width=100,
    output_height=100,
    output_max_size=1_000,
    output_formats="png",
)
dataset = FileDataset(
    "my-bucket/profile-pics/",
    profile_picture
)
file_upload = dataset.create_file_upload_url(
    "john-doe.jpg",
    expiration=60
)
print(file_upload.url)  # return the pre-signed URL to the client
#> https://gateway.pydantic.dev///upload?path=/users/1.jpg&key=e3ca0f89etc

# wait for a client to upload the data before updating the user's current picture
await file_upload.completed()
file_contents = await file_upload.download()
...
```
---
### 3. Data Gateway for data warehouses
The components described [above](#2-data-gateway-for-object-stores-section) are useful well beyond object stores like S3.
We will also provide a similar service for data warehouses like Snowflake and BigQuery.
Basically, you give us a Pydantic model and a table identifier, and we give you back an endpoint that you can POST to. After validation, the data will be inserted as rows in the table.
```py
from pydantic_gateway import BigQueryTable

dataset = BigQueryTable(
    "my-project.my-dataset.inquiries",
    CustomerInquiry
)
upload = dataset.create_upload_url(
    expiration=3_600,
)
print(upload.url)
#> https://gateway.pydantic.dev///upload?key=e3ca0f89etc
```
While there's value in providing validation on the front of a data warehouse, we know from talking to lots of teams about how they configure and operate their data warehouses and pipelines that this is only one of the challenges.
Longer term, we see significant value in providing a declarative way to define the transformations that should be applied to data as it moves _within_ the data warehouse.
---
### 4. Schema Catalog
One of the things we often hear when talking to engineers about how their organisation uses Pydantic, is that their highest-fidelity schemas across all their tools are often Pydantic models.
The problem is that, in contrast with the centralized nature of a relational database schema, these models are often scattered across multiple repositories. There's no easy way to find and use them, let alone keep track of changes in these schemas and how these changes might impact different parts of your organization.
Pydantic Schema Catalog would give you a single place to view schemas, including:
- The Pydantic model code
- Swagger/redoc documentation based on the JSON Schema — this provides a non-technical view of a schema to aid those who aren't (yet) comfortable reading Python
- Data sources which use this schema
- Links between this schema and other schemas, e.g. via foreign keys
- Changes to the schema over time, including whether the change is backwards compatible and what migration logic is required
- Together with the above components (Observability and Data Gateway), you could go straight from a schema to data stored with that schema, or vice versa
We will provide a web interface to view and manage schemas as well as a CLI to interact with the Schema Catalog.
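One of the items above, detecting whether a schema change is backwards compatible, can be approximated with a simple comparison of two JSON Schemas. The sketch below is a rough heuristic under stated assumptions: `breaking_changes` is a hypothetical name, it only looks at top-level `properties` and `required`, and it treats "backwards compatible" as "data valid under the old schema still validates under the new one".

```python
from typing import Any, Dict, List


def breaking_changes(old: Dict[str, Any], new: Dict[str, Any]) -> List[str]:
    """List changes that would stop data valid under `old` from validating under `new`."""
    problems: List[str] = []
    old_props = old.get("properties", {})
    new_props = new.get("properties", {})
    # a field that becomes required breaks older data that omitted it
    for name in new.get("required", []):
        if name not in old.get("required", []):
            problems.append(f"field '{name}' is now required")
    # a field whose type changes breaks older data holding the old type
    for name, spec in old_props.items():
        if name in new_props and new_props[name].get("type") != spec.get("type"):
            problems.append(f"field '{name}' changed type")
    return problems


v1 = {
    "properties": {"email": {"type": "string"}, "name": {"type": "string"}},
    "required": ["email"],
}
v2 = {
    "properties": {
        "email": {"type": "string"},
        "name": {"type": "string"},
        "source": {"type": "string"},
    },
    "required": ["email", "source"],
}
print(breaking_changes(v1, v2))
```

A real catalog would also need to handle nested models, constraints, and renamed fields, which is where migration logic comes in.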
###### Schema Catalog — Example
```py
from datetime import datetime

import pydantic_schema_catalogue  # name TBD
from pydantic import BaseModel, EmailStr


class CustomerInquiry(BaseModel):
    email: EmailStr
    name: str
    inquiry_ts: datetime


pydantic_schema_catalogue.register(CustomerInquiry)
```
Later in another project...
```sh
$ pydantic-schema-catalogue list
...
# download the schema `User` as a Pydantic model
$ pydantic-schema-catalogue get --format pydantic User > user_model.py
# download the schema `User` as Postgres SQL to create a table
$ pydantic-schema-catalogue get --format postgres User >> tables.sql
```
The Schema Catalog would integrate closely with other components described above:
- schemas of models logged could be automatically registered in the Schema Catalog
- a schema in the Schema Catalog would be used to create a validation endpoint with one or two clicks or a CLI command
#### Schema Inference
All too often, you have data without a schema, and reverse engineering a comprehensive schema is a painful, manual process.
Pydantic Schema Catalog would provide a way to infer a schema from a dataset, allowing you to initialize a new schema from a sample of data.
```sh
$ pydantic-schema-catalogue infer --name 'Customer Inquiry' inquiries.csv
```
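As a toy illustration of what inference from a sample could involve, the sketch below guesses one type per CSV column using only the standard library. The function names and the "fall back to `str` when rows disagree" rule are assumptions made for this example; a production tool would need to handle nulls, mixed numeric types, and far larger samples.

```python
import csv
import io
from datetime import datetime
from typing import Dict


def infer_type(value: str) -> str:
    """Guess the narrowest type name that can represent `value`."""
    for name, parse in (("int", int), ("float", float), ("datetime", datetime.fromisoformat)):
        try:
            parse(value)
            return name
        except (TypeError, ValueError):
            continue
    return "str"


def infer_schema(text: str) -> Dict[str, str]:
    """Infer one type per column; fall back to 'str' when rows disagree."""
    schema: Dict[str, str] = {}
    for row in csv.DictReader(io.StringIO(text)):
        for column, value in row.items():
            guess = infer_type(value)
            if schema.setdefault(column, guess) != guess:
                schema[column] = "str"
    return schema


sample = (
    "email,name,inquiry_ts,visits\n"
    "jane.doe@example.com,Jane,2023-06-01T09:30:00,3\n"
    "joe.bloggs@example.com,Joe,2023-06-02T14:00:00,12\n"
)
print(infer_schema(sample))
```

The inferred mapping could then serve as a starting point for a hand-edited Pydantic model rather than as the final schema.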
---
### 5. Dashboards and UI powered by Pydantic models
One of the major goals of data collection is to derive insights from and make decisions based on the collected data.
But often the person responsible for those insights or decisions is outside your engineering team, sometimes outside your organization altogether.
So a flawlessly executing data pipeline populating your data warehouse isn't enough. You need a way to help the rest of your organization visualize and interact with the data.
But since this data visualization is often not your core business, you don't want to spend a week or month(!) building a dashboard, or maintain and extend it going forward.
Pydantic Dashboards will allow you to build UIs powered by Pydantic models and Python code in minutes. We would take care of the hosting, scaling, and maintenance, as well as enforcing authentication.
Pydantic Dashboards would provide all the common UI components (tables, pagination, forms, charts) and use the ORM we build on top of the above components to provide a simple, but powerful, way to interact with your data.
Below is an example of how this might look, taking the "Customer Inquiries" example from above.
###### Pydantic Dashboard — Customer Inquiries
```py
from datetime import datetime
from typing import Literal

from pydantic_dash import app, components
from pydantic_dash.auth import GoogleAuth
from pydantic_dash.responses import Redirect
from pydantic_db import DatabaseModel, DayAgg
from pydantic import EmailStr

app.auth(GoogleAuth(domain='my-company.com'))


class CustomerInquiry(DatabaseModel):
    email: EmailStr
    name: str
    inquiry_ts: datetime
    source: Literal['website', 'email', 'phone']


@app.view('/', buttons={
    '/new/': 'Create New',
    '/list/': 'View All'
})
def inquiries_analytics():
    # query the database for recent inquiries
    recent_inquiries = (
        CustomerInquiry.db_all()
        .order_by('inquiry_ts', desc=True)
        .limit(1_000)
    )
    # return two components: a pie chart and a bar chart
    return components.Row(
        # charts are designed alongside the ORM,
        # to use sensible defaults, e.g. here the charts
        # infer the `.count()` implicit in the `groupby`
        components.PieChart(
            recent_inquiries.groupby('source'),
            title='Inquiry Sources',
        ),
        components.BarChart(
            recent_inquiries.groupby(DayAgg('inquiry_ts')),
            x_label='Date',
            y_label='Inquiry Count',
        ),
    )


# using a list_view here means the query returned is automatically
# rendered as a table and paginated
@app.list_view('/list/', buttons={'/new/': 'Create New'})
def inquiries_list():
    return CustomerInquiry.db_all().order_by('inquiry_ts', desc=True)


# form_view provides both GET and POST form endpoints
# the GET view renders an HTML form based on the `CustomerInquiry` model
# the POST view validates the request with the `CustomerInquiry` model,
# then calls the function
@app.form_view('/new/', CustomerInquiry)
def new_inquiry(inquiry: CustomerInquiry):
    inq_id = inquiry.db_save()
    return Redirect(f'/{inq_id}/')
```
This tool is not intended to replace the UI in your organization's main products, but there are many places in companies big and small where a managed tool like this with batteries included could cut the time required to build a dashboard or simple app from days to minutes.
---
**Thanks for reading, we would really appreciate your feedback on these ideas! Please complete [the survey](https://forms.gle/cMbLXNUxdgG2DeSv6) or email [hello@pydantic.dev](mailto:hello@pydantic.dev) with your thoughts.**
f:T2d91,
We're excited to announce the first alpha release of Pydantic V2!
This first Pydantic V2 alpha is no April Fool's joke — for a start we missed our April 1st target date :cry:.
After a year's work, we invite you to explore the improvements we've made and give us your feedback.
We look forward to hearing your thoughts and working together to improve the library.
For many of you, Pydantic is already a key part of your Python toolkit and needs no introduction —
we hope you'll find the improvements and additions in Pydantic V2 useful.
If you're new to Pydantic: Pydantic is an open-source Python library that provides powerful data parsing and validation —
including type coercion and useful error messages when typing issues arise — and settings management capabilities.
See [the docs](https://docs.pydantic.dev/latest/) for examples of Pydantic at work.
## Getting started with the Pydantic V2 alpha
Your feedback will be a critical part of ensuring that we have made the right tradeoffs with the API changes in V2.
To get started with the Pydantic V2 alpha, install it from PyPI.
We recommend using a virtual environment to isolate your testing environment:
```bash
pip install --pre -U "pydantic>=2.0a1"
```
Note that there are still some rough edges and incomplete features, and while trying out the Pydantic V2 alpha releases you may experience errors.
We encourage you to try out the alpha releases in a test environment and not in production.
Some features are still in development, and we will continue to make changes to the API.
If you do encounter any issues, please [create an issue in GitHub](https://github.com/pydantic/pydantic/issues) using the `bug V2` label.
This will help us to actively monitor and track errors, and to continue to improve the library’s performance.
This will be the first of several upcoming alpha releases. As you evaluate our changes and enhancements,
we encourage you to share your feedback with us.
Please let us know:
- If you don't like the changes, so we can make sure Pydantic remains a library you enjoy using.
- If this breaks your usage of Pydantic so we can fix it, or at least describe a migration path.
Thank you for your support, and we look forward to your feedback.
---
## Headlines
Here are some of the most interesting new features in the current Pydantic V2 alpha release.
For background on plans behind these features, see the earlier [Pydantic V2 Plan](./pydantic-v2) blog post.
The biggest change to Pydantic V2 is [`pydantic-core`](https://github.com/pydantic/pydantic-core) —
all validation logic has been rewritten in Rust and moved to a separate package, `pydantic-core`.
This has a number of big advantages:
- **Performance** - Pydantic V2 is 5-50x faster than Pydantic V1.
- **Safety & maintainability** - We've made changes to the architecture that we think will help us maintain Pydantic V2 with far fewer bugs in the long term.
With the use of `pydantic-core`, the majority of the logic in the Pydantic library is dedicated to generating
"pydantic core schema", the schema used to define the behaviour of the new, high-performance `pydantic-core` validators and serializers.
### Ready for experimentation
- **BaseModel** - the core of validation in Pydantic V1 remains, albeit with new method names.
- **Dataclasses** - Pydantic dataclasses are improved and ready to test.
- **Serialization** - dumping/serialization/marshalling is significantly more flexible, and ready to test.
- **Strict mode** - one of the biggest additions in Pydantic V2 is strict mode, which is ready to test.
- **JSON Schema** - generation of JSON Schema is much improved and ready to test.
- **Generic Models** - are much improved and ready to test.
- **Recursive Models** - and validation of recursive data structures is much improved and ready to test.
- **Custom Types** - custom types have a new interface and are ready to test.
- **Custom Field Modifiers** - used via `Annotated[]` are working and in use in Pydantic itself.
- **Validation without a BaseModel** - the new `TypeAdapter` class allows validation without the need for a `BaseModel` class, and it's ready to test.
- **TypedDict** - we now have full support for `TypedDict` via `TypeAdapter`, it's ready to test.
### Still under construction
- **Documentation** - we're working hard on full documentation for V2, but it's not ready yet.
- **Conversion Table** - a big addition to the documentation will be a conversion table showing how types are coerced, this is a WIP.
- **BaseSettings** - `BaseSettings` will move to a separate `pydantic-settings` package, it's not yet ready to test.
**Notice:** since `pydantic-settings` is not yet ready to release, there's no support for `BaseSettings` in the first alpha release.
- **validate_arguments** - the `validate_arguments` decorator remains and is working, but hasn't been updated yet.
- **Hypothesis Plugin** - the Hypothesis plugin is yet to be updated.
- **computed fields** - we know a lot of people are waiting for this, we will include it in Pydantic V2.
- **Error messages** - could use some love, and links to docs in error messages are still to be added.
- **Migration Guide** - we have some pointers below, but this needs completing.
## Migration Guide
**Please note:** this is just the beginning of a migration guide. We'll work hard up to the final release to prepare
a full migration guide, but for now the following pointers should be some help while experimenting with V2.
### Changes to BaseModel
- Various method names have been changed; `BaseModel` methods all start with `model_` now.
Where possible, we have retained the old method names to help ease migration, but calling them will result in `DeprecationWarning`s.
- Some of the built-in data loading functionality has been slated for removal.
In particular, `parse_raw` and `parse_file` are now deprecated. You should load the data and then pass it to `model_validate`.
- The `from_orm` method has been removed; you can now just use `model_validate` (equivalent to `parse_obj` from Pydantic V1) to achieve something similar,
as long as you've set `from_attributes=True` in the model config.
- The `__eq__` method has changed for models; models are no longer considered equal to dicts.
- Custom `__init__` overrides won't be called. This should be replaced with a `@root_validator`.
- Due to inconsistency with the rest of the library, we have removed the special behavior of models
using the `__root__` field, and have disallowed the use of an attribute with this name to prevent confusion.
However, you can achieve equivalent behavior with a "standard" field name through the use of `@root_validator`,
`@model_serializer`, and `__pydantic_modify_json_schema__`. You can see an example of this
[here](https://github.com/pydantic/pydantic/blob/2b9459f20d094a46fa3093b43c34444240f03646/tests/test_parse.py#L95-L113).
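As a rough illustration of the renames above, the sketch below shows a few V1 calls next to their V2 counterparts. It is written against the released Pydantic V2 API (for example `ConfigDict` and `model_dump`), so some names may differ in the alpha; `UserRow` is a made-up stand-in for an ORM object.

```python
import json

from pydantic import BaseModel, ConfigDict


class User(BaseModel):
    # `from_attributes=True` replaces V1's `orm_mode = True`
    model_config = ConfigDict(from_attributes=True)

    id: int
    name: str


class UserRow:
    """Stand-in for an ORM object: plain attributes, not a dict."""

    def __init__(self, id: int, name: str) -> None:
        self.id = id
        self.name = name


# V1: User.parse_obj({...})  ->  V2: User.model_validate({...})
user = User.model_validate({'id': '1', 'name': 'Alice'})

# V1: User.parse_raw(raw)  ->  V2: load the data yourself, then validate it
raw = '{"id": 2, "name": "Bob"}'
user_from_raw = User.model_validate(json.loads(raw))

# V1: User.from_orm(row)  ->  V2: User.model_validate(row)
user_from_row = User.model_validate(UserRow(id=1, name='Alice'))

# V1: user.dict()  ->  V2: user.model_dump()
dumped = user.model_dump()

# models no longer compare equal to dicts
equal_to_dict = user == dumped
```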
### Changes to Pydantic Dataclasses
- The `__post_init__` in Pydantic dataclasses will now be called after validation, rather than before.
- We no longer support `extra='allow'` for Pydantic dataclasses, where extra attributes passed to the initializer would be
stored as extra fields on the dataclass. `extra='ignore'` is still supported for the purposes of allowing extra fields while parsing data; they just aren't stored.
- `__post_init_post_parse__` has been removed.
- Nested dataclasses no longer accept tuples as input, only dict.
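The change in `__post_init__` ordering is easiest to see with a small example. This sketch assumes the released Pydantic V2 behaviour; `Point` and `observed_types` are names invented for the illustration.

```python
from pydantic.dataclasses import dataclass

# records what `__post_init__` saw, so the ordering can be inspected afterwards
observed_types = []


@dataclass
class Point:
    x: int
    y: int

    def __post_init__(self) -> None:
        # in V2 this runs after validation, so the fields are already `int`
        observed_types.append((type(self.x), type(self.y)))


point = Point(x='1', y='2')
```

If `__post_init__` ran before validation, as in V1, it would have seen the raw strings instead.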
### Changes to Config
- To specify config on a model, it is now deprecated to create a class called `Config` in the namespace of the parent `BaseModel` subclass.
Instead, you just need to set a class attribute called `model_config` to be a dict with the key/value pairs you want to be used as the config.
The following config settings have been removed:
- `allow_mutation` — this has been removed. You should be able to use [frozen](https://docs.pydantic.dev/latest/#pydantic.config.ConfigDict) equivalently (inverse of current use).
- `error_msg_templates`.
- `fields` — this was the source of various bugs, so has been removed. You should be able to use `Annotated` on fields to modify them as desired.
- `getter_dict` — `orm_mode` has been removed, and this implementation detail is no longer necessary.
- `schema_extra` — you should now use the `json_schema_extra` keyword argument to `pydantic.Field`.
- `smart_union`.
- `underscore_attrs_are_private` — the Pydantic V2 behavior is now the same as if this was always set to `True` in Pydantic V1.
The following config settings have been renamed:
- `allow_population_by_field_name` → `populate_by_name`
- `anystr_lower` → `str_to_lower`
- `anystr_strip_whitespace` → `str_strip_whitespace`
- `anystr_upper` → `str_to_upper`
- `keep_untouched` → `ignored_types`
- `max_anystr_length` → `str_max_length`
- `min_anystr_length` → `str_min_length`
- `orm_mode` → `from_attributes`
- `validate_all` → `validate_default`
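To show how the new config style and a few of the renamed settings fit together, here is a small sketch using the released Pydantic V2 API. `Account` is a made-up model, and the commented-out `Config` class shows the V1 equivalent for comparison. A plain `dict` works for `model_config` too; `ConfigDict` just adds type hints.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class Account(BaseModel):
    # V1 equivalent:
    #   class Config:
    #       anystr_strip_whitespace = True
    #       anystr_lower = True
    #       allow_mutation = False
    model_config = ConfigDict(
        str_strip_whitespace=True,
        str_to_lower=True,
        frozen=True,  # inverse of V1's `allow_mutation`
    )

    email: str


account = Account(email='  Alice@Example.COM ')

# assigning to a field of a frozen model is rejected
try:
    account.email = 'bob@example.com'
    mutation_blocked = False
except ValidationError:
    mutation_blocked = True
```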
### Changes to Validators
- Raising a `TypeError` inside a validator no longer produces a `ValidationError`, but just raises the `TypeError` directly.
This was necessary to prevent certain common bugs (such as calling functions with invalid signatures) from
being unintentionally converted into `ValidationError` and displayed to users.
If you really want `TypeError` to be converted to a `ValidationError` you should use a `try: except:` block that will catch it and do the conversion.
- `each_item` validators are deprecated and should be replaced with a type annotation using `Annotated` to apply a validator
or with a validator that operates on all items at the top level.
- Changes to `@validator`-decorated function signatures.
- The `stricturl` type has been removed.
- Root validators can no longer be run with `skip_on_failure=False`.
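The `TypeError` change is sketched below, with and without the explicit conversion. The example uses `field_validator`, the decorator name in the released Pydantic V2 (the alpha may still use `@validator`); `RawPost` and `Post` are hypothetical models.

```python
from pydantic import BaseModel, ValidationError, field_validator


class RawPost(BaseModel):
    tags: str

    @field_validator('tags', mode='before')
    @classmethod
    def join_tags(cls, v):
        # a TypeError raised here escapes unchanged
        return ','.join(v)


class Post(BaseModel):
    tags: str

    @field_validator('tags', mode='before')
    @classmethod
    def join_tags(cls, v):
        try:
            return ','.join(v)
        except TypeError as exc:
            # convert explicitly so the failure is reported as a ValidationError
            raise ValueError('tags must be an iterable of strings') from exc


def error_type(model, **data):
    """Return the exception class raised when building `model`, or None."""
    try:
        model(**data)
    except (TypeError, ValidationError) as exc:
        return type(exc)
    return None


raw_error = error_type(RawPost, tags=5)
converted_error = error_type(Post, tags=5)
```

Here `raw_error` is `TypeError`, while `converted_error` is `ValidationError`.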
### Changes to Validation of specific types
- Integers outside the valid range of 64 bit integers will cause `ValidationError`s during parsing.
To work around this, use an `IsInstance` validator (more details to come).
- Subclasses of built-ins won't validate into their subclass types; you'll need to use an `IsInstance` validator to validate these types.
### Changes to Generic models
- While it does not raise an error at runtime yet, subclass checks for parametrized generics should no longer be used.
These will result in `TypeError`s and we can't promise they will work forever. However, it will be okay to do subclass checks against _non-parametrized_ generic models.
### Other changes
- `GetterDict` has been removed, as it was just an implementation detail for `orm_mode`, which has been removed.
### TypeAdapter
Pydantic V1 didn't have good support for validating or serializing non-`BaseModel` types.
To work with them you had to create a "root" model or use the utility functions in `pydantic.tools` (`parse_obj_as` and `schema_of`).
In Pydantic V2 this is _a lot_ easier: the `TypeAdapter` class lets you build an object that behaves almost like a `BaseModel` class which you can use for a lot of the use cases of root models and as a complete replacement for `parse_obj_as` and `schema_of`.
```python
from typing import List
from pydantic import TypeAdapter
validator = TypeAdapter(List[int])
assert validator.validate_python(['1', '2', '3']) == [1, 2, 3]
print(validator.json_schema())
#> {'items': {'type': 'integer'}, 'type': 'array'}
```
Note that this API is provisional and may change before the final release of Pydantic V2.
10:T295c,
---
> **Update**: If you find this article interesting, you might also like the
> [TechCrunch article](https://techcrunch.com/2023/02/16/sequoia-backs-open-source-data-validation-framework-pydantic-to-commercialize-with-cloud-services/)
> and [Sequoia blog post](https://www.sequoiacap.com/article/partnering-with-pydantic-no-more-steel-seats-for-developers/) about Pydantic.
---
I've decided to start a company based on the principles that I believe have led to Pydantic's success.
I have closed a seed investment round led by [Sequoia](https://www.sequoiacap.com), with participation from
[Partech](https://partechpartners.com/), [Irregular Expressions](https://irregex.vc) and some amazing angel investors
including [Bryan Helmig](https://www.linkedin.com/in/bryanhelmig/) (co-founder and CTO of Zapier),
[Tristan Handy](https://www.linkedin.com/in/tristanhandy/) (founder and CEO of Dbt Labs) and
[David Cramer](https://www.linkedin.com/in/dmcramer/) (co-founder and CTO of Sentry).
## Why?
I've watched with fascination as Pydantic has grown to become the most widely used Python data validation library, with over 40m downloads a month.
![Pydantic Downloads from PyPI vs. Django](/assets/blog/company-announcement/pydantic-vs-django-downloads.png)
By my rough estimate, Pydantic is used by 12% of professional web developers! [†](#3-quot12-of-professional-web-developersquot-claim)
But Pydantic wasn't the first (or last) such library. Why has it been so successful?
I believe it comes down to two things:
1. We've always made developer experience the first priority.
2. We've leveraged technologies which developers already understand — most notably, Python type annotations.
In short, we've made Pydantic easy to get started with, and easy to do powerful things with.
I believe the time is right to apply those principles to other, bigger challenges.
"The Cloud" is still relatively new (think about what cars or tractors looked like 15 years after their conception), and
while it has already transformed our lives, it has massive shortcomings. I think we're uniquely positioned to address
some of these shortcomings.
We'll start by transforming the way those 12% of web developers who already know and trust Pydantic use cloud services
to build and deploy web applications. Then, we'll move on to help the other 88%!
### Cloud services suck (at least for us developers)
Picture the driving position of a 1950s tractor — steel seat, no cab, knobs sticking out of the engine compartment near
the component they control, hot surfaces just waiting for you to lean on them. Conceptually, this isn't surprising — the
tractor was a tool to speed up farming; its driver was an afterthought, and as long as they could manage to operate it
there was no value in making the experience pleasant or comfortable.
![1950s tractor](/assets/blog/company-announcement/tractor.svg)
Today's cloud services have the look and feel of that tractor. They're conceived by infrastructure people who care about
efficient computation, fast networking, and cheap storage. The comfort and convenience of the developers who need to
drive these services to build end-user facing applications has been an afterthought.
Both the tractor and the cloud service of the past made sense: The majority of people who made the purchasing decisions
didn't operate them, and those who did had little influence. Why bother making them nice to operate?
> _"At least it's not a ~~cart horse~~ Windows box in the corner — quit complaining!"_
Just as the experience of driving tractors transformed as their drivers' pay and influence increased, so cloud services
are going through a transformation as their operators' pay and influence increase significantly.
There are many examples now of services and tools that are winning against incumbents because of great developer
experience:
- [Stripe](https://stripe.com) is winning in payments despite a massive ecosystem of incumbents
- [Sentry](https://sentry.io/welcome/) is winning in application monitoring, even though you can send, store, and view the same data in CloudWatch et al. more cheaply
- [Vercel](https://vercel.com/) is winning in application hosting by focusing on one framework — Next.js
- [Python](https://www.tiobe.com/tiobe-index/) is winning against other programming languages, even though it's not backed by a massive corporation
- [Pydantic](https://docs.pydantic.dev/) is winning in data validation for Python, even though it's far from the first such library
In each case the developer experience is markedly better than what came before, and developers have driven adoption.
There is a massive opportunity to create cloud services with great developer experience at their heart.
I think we're well positioned to be part of it.
### Developers are still drowning under the weight of duplication
The story of the cloud has been about reducing duplication, abstracting away infrastructure and boilerplate:
co-location facilities with servers, cages and wires gave way to VMs. VMs gave way to PaaS offerings where you
just provide your application code. Serverless is challenging PaaS by offering to remove scaling worries.
At each step, cloud providers took work off engineers which was common to many customers.
But this hasn't gone far enough. Think about the last web application you worked on —
how many of the views or components were unique to your app?
Sure, you fitted them together in a unique way, but many (20%, 50%, maybe even 80%?) will exist hundreds or thousands of
times in other code bases. Couldn't many of those components, views, and utilities be shared with other apps without
affecting the value of your application? Again, reducing duplication, and reducing the time and cost of building an
application.
At the same time, serverless, despite being the trendiest way to deploy applications for the last few years, has made
much of this worse — complete web frameworks have often been switched out for bare-bones entry points which lack
even the most basic functionality of a web framework like routing, error handling, or database integration.
What if we could build a platform with the best of all worlds? Taking the final step in reducing the boilerplate
and busy-work of building web applications — allowing developers to write nothing more than the core logic which
makes their application unique and valuable?
And all with a developer experience to die for.
## What, specifically, are we building?
I'm not sharing details yet :).
The immediate plan is to hire the brightest developers I can find and work with them to refine our vision and exactly
what we're building while we finish and release Pydantic V2.
While I have some blueprints in my head of the libraries and services we want to build, we have a lot of options
for exactly where to go; we won't constrain what we can design by making any commitments now.
**If you're interested in what we're doing, hit subscribe on
[this GitHub issue](https://github.com/pydantic/pydantic/issues/5063).
We'll comment there when we have
more concrete information.**
## The plan
The plan, in short, is this:
1. Hire the best developers from around the world (see "We're Hiring" below)
2. Finish and release Pydantic V2, and continue to maintain, support and develop Pydantic over the years
3. Build cloud services and developer tools that developers love
Pydantic, the open source project, will be a cornerstone of the company. It'll be a key technical component
in what we're building and an important asset to help convince developers that the commercial tools and services
we build will be worth adopting. It will remain open source and MIT licenced, and support and development will
accelerate.
I'm currently working full time on Pydantic V2 (learn more from the [previous blog post](https://docs.pydantic.dev/blog/pydantic-v2/)).
It should be released later this year, hopefully in Q1. V2 is a massive advance for Pydantic — the core has been
re-written in Rust, making Pydantic V2 around 17x faster than V1. But there are lots of other goodies: strict mode,
composable validators, validation context, and more. I can't wait to get Pydantic V2 released and see how the community
uses it.
We'll keep working closely with [other open source libraries](https://github.com/topics/pydantic) that use and depend
on Pydantic as we have up to this point, making sure the whole Pydantic ecosystem continues to thrive.
_On a side note: Now that I'm paid to work on Pydantic, I'll be sharing all future open source sponsorship among
other open source projects we rely on._
## We ~~are~~ were hiring
We've had an extraordinary response to this announcement, and have hired an extremely talented team of developers.
We therefore aren't actively hiring at this time. Please follow us on Twitter/LinkedIn/Mastodon to hear about future
opportunities.
~~If you're a senior Python or full stack developer and think the ideas above are exciting, we'd love to hear from you.
Please email [careers@pydantic.dev](mailto:careers@pydantic.dev) with a brief summary of your skills and experience,
including links to your GitHub profile and your CV.~~
## Appendix
### "12% of professional web developers" claim
At first glance this seems like a fairly incredible number. Where does it come from?
According to the
[StackOverflow developer survey 2022](https://survey.stackoverflow.co/2022/#section-most-popular-technologies-web-frameworks-and-technologies),
FastAPI is used by 6.01% of professional developers.
According to my survey of Pydantic users, Pydantic's usage is split roughly into:
- 25% FastAPI
- 25% other web development
- 50% everything else
That matches the numbers from PyPI downloads, which show that (as of 2023-01-31)
[Pydantic has 46m](https://pepy.tech/project/pydantic) downloads in the last 30 days, while
[FastAPI has 10.7m](https://pepy.tech/project/fastapi) — roughly 25%.
Based on these numbers, I estimate Pydantic is used for web development by about twice the number who use it through
FastAPI — roughly 12%.
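For anyone who wants to check the arithmetic, the estimate can be reproduced in a few lines of Python using only the figures quoted above. The variable names are mine, and the factor of two is the rough "web development is about twice FastAPI" assumption from the survey split.

```python
fastapi_share_of_devs = 6.01  # % of professional developers using FastAPI (StackOverflow 2022)
pydantic_downloads_m = 46.0   # million downloads in the last 30 days
fastapi_downloads_m = 10.7    # million downloads in the last 30 days

# FastAPI's share of Pydantic downloads, to compare with the 25% from the survey
fastapi_fraction = fastapi_downloads_m / pydantic_downloads_m

# web development usage is about twice FastAPI usage (25% FastAPI + 25% other web)
web_multiplier = 2
pydantic_web_share = fastapi_share_of_devs * web_multiplier

print(f'{fastapi_fraction:.0%}')
#> 23%
print(f'{pydantic_web_share:.1f}%')
#> 12.0%
```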
11:Tcb76,
Updated late 10 Jul 2022, see [pydantic#4226](https://github.com/pydantic/pydantic/pull/4226).
---
I've spoken to quite a few people about pydantic V2, and mentioned it in passing to even more.
I owe people a proper explanation of the plan for V2:
- What we will add
- What we will remove
- What we will change
- How I'm intending to go about completing it and getting it released
- Some idea of timeframe :fearful:
Here goes...
---
Enormous thanks to
[Eric Jolibois](https://github.com/PrettyWood), [Laurence Watson](https://github.com/Rabscuttler),
[Sebastián Ramírez](https://github.com/tiangolo), [Adrian Garcia Badaracco](https://github.com/adriangb),
[Tom Hamilton Stubber](https://github.com/tomhamiltonstubber), [Zac Hatfield-Dodds](https://github.com/Zac-HD),
[Tom](https://github.com/czotomo) & [Hasan Ramezani](https://github.com/hramezani)
for reviewing this blog post, putting up with (and correcting) my horrible typos and making great suggestions
that have made this post and Pydantic V2 materially better.
---
## Plan & Timeframe
I'm currently taking a kind of sabbatical after leaving my last job to get pydantic V2 released.
Why? I ask myself that question quite often.
I'm very proud of how much pydantic is used, but I'm less proud of its internals.
Since it's something people seem to care about and use quite a lot
(26m downloads a month, used by 72k public repos, 10k stars),
I want it to be as good as possible.
While I'm on the subject of why, how and my odd sabbatical: if you work for a large company who use pydantic a lot,
you might encourage the company to **sponsor me a meaningful amount**,
like [Salesforce did](https://twitter.com/samuel_colvin/status/1501288247670063104)
(if your organisation is not open to donations, I can also offer consulting services).
This is not charity, recruitment or marketing - the argument should be about how much the company will save if
pydantic is 10x faster, more stable and more powerful - it would be worth paying me 10% of that to make it happen.
Before pydantic V2 can be released, we need to release pydantic V1.10 - there are lots of changes in the main
branch of pydantic contributed by the community, and it's only fair to provide a release including those changes.
Many of them will remain unchanged for V2; the rest will act as a requirement to make sure pydantic V2 includes
the capabilities they implemented.
The basic road map for me is as follows:
1. Implement a few more features in pydantic-core, and release a first version, see [below](#motivation--pydantic-core)
2. Work on getting pydantic V1.10 out - basically merge all open PRs that are finished
3. Release pydantic V1.10
4. Delete all stale PRs which didn't make it into V1.10, apologise profusely to their authors who put their valuable
time into pydantic only to have their PRs closed :pray:
(and explain when and how they can rebase and recreate the PR)
5. Rename `master` to `main`, seems like a good time to do this
6. Change the main branch of pydantic to target V2
7. Start tearing pydantic code apart and see how many existing tests can be made to pass
8. Rinse, repeat
9. Release pydantic V2 :tada:
Plan is to have all this done by the end of October, definitely by the end of the year.
### Breaking Changes & Compatibility :pray:
While we'll do our best to avoid breaking changes, some things will break.
As per the [greatest pun in modern TV history](https://youtu.be/ezAlySFluEk):
> You can't make a Tomelette without breaking some Greggs.
Where possible, if breaking changes are unavoidable, we'll try to provide warnings or errors to make sure those
changes are obvious to developers.
## Motivation & `pydantic-core`
Since pydantic's initial release, with the help of wonderful contributors
[Eric Jolibois](https://github.com/PrettyWood),
[Sebastián Ramírez](https://github.com/tiangolo),
[David Montague](https://github.com/dmontagu) and many others, the package and its usage have grown enormously.
The core logic however has remained mostly unchanged since the initial experiment.
It's old, it smells, it needs to be rebuilt.
The release of version 2 is an opportunity to rebuild pydantic and correct many things that don't make sense -
**to make pydantic amazing :rocket:**.
The core validation logic of pydantic V2 will be performed by a separate package
[pydantic-core](https://github.com/pydantic/pydantic-core) which I've been building over the last few months.
_pydantic-core_ is written in Rust using the excellent [pyo3](https://pyo3.rs) library which provides Rust bindings
for Python.
The motivation for building pydantic-core in Rust is as follows:
1. **Performance**, see [below](#performance--section)
2. **Recursion and code separation** - with no stack and little-to-no overhead for extra function calls,
Rust allows pydantic-core to be implemented as a tree of small validators which call each other,
making code easier to understand and extend without harming performance
3. **Safety and complexity** - pydantic-core is a fairly complex piece of code which has to draw distinctions
between many different errors, Rust is great in situations like this,
it should minimise bugs (:fingers_crossed:) and allow the codebase to be extended for a long time to come
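The "tree of small validators which call each other" idea can be sketched in plain Python. This is only an illustration of the structure, not pydantic-core's actual implementation (which is in Rust); all class names here are made up.

```python
from dataclasses import dataclass
from typing import Any, Dict, Protocol


class Validator(Protocol):
    def validate(self, value: Any) -> Any: ...


@dataclass
class IntValidator:
    def validate(self, value: Any) -> int:
        if isinstance(value, bool) or not isinstance(value, (int, str)):
            raise ValueError(f'not a valid integer: {value!r}')
        return int(value)


@dataclass
class ListValidator:
    item: Validator  # child node, validates each element

    def validate(self, value: Any) -> list:
        if not isinstance(value, list):
            raise ValueError(f'not a list: {value!r}')
        return [self.item.validate(v) for v in value]


@dataclass
class DictValidator:
    fields: Dict[str, Validator]  # one child node per field

    def validate(self, value: Any) -> dict:
        if not isinstance(value, dict):
            raise ValueError(f'not a dict: {value!r}')
        return {name: child.validate(value[name]) for name, child in self.fields.items()}


# roughly the tree for a model with `id: int` and `scores: list[int]`
tree = DictValidator({'id': IntValidator(), 'scores': ListValidator(IntValidator())})
result = tree.validate({'id': '1', 'scores': [1, '2', 3]})
```

Each node only knows about its own type and delegates to its children. In Python every one of those calls has a cost, which is part of the argument for doing this in Rust.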
:::note
The python interface to pydantic shouldn't change as a result of using pydantic-core, instead
pydantic will use type annotations to build a schema for pydantic-core to use.
:::
pydantic-core is usable now, albeit with an unintuitive API, if you're interested, please give it a try.
pydantic-core provides validators for common data types,
[see a list here](https://github.com/pydantic/pydantic-core/blob/main/pydantic_core/schema_types.py#L314).
Other, less commonly used data types will be supported via validator functions implemented in pydantic, in Python.
See [pydantic-core#153](https://github.com/pydantic/pydantic-core/issues/153)
for a summary of what needs to be completed before its first release.
## Headlines
Here are some of the biggest changes expected in V2.
### Performance :+1:
As a result of the move to Rust for the validation logic
(and significant improvements in how validation objects are structured) pydantic V2 will be significantly faster
than pydantic V1.
Looking at the pydantic-core [benchmarks](https://github.com/pydantic/pydantic-core/tree/main/tests/benchmarks)
today, pydantic V2 is between 4x and 50x faster than pydantic V1.9.1.
In general, pydantic V2 is about 17x faster than V1 when validating a model containing a range of common fields.
### Strict Mode :+1:
People have long complained about pydantic for coercing data instead of throwing an error.
E.g. input to an `int` field could be `123` or the string `"123"` which would be converted to `123`.
While this is very useful in many scenarios (think: URL parameters, environment variables, user input),
there are some situations where it's not desirable.
pydantic-core comes with "strict mode" built in. With this, only the exact data type is allowed, e.g. passing
`"123"` to an `int` field would result in a validation error.
This will allow pydantic V2 to offer a `strict` switch which can be set on either a model or a field.
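As a sketch of what the `strict` switch looks like in practice, the example below uses the API of the released Pydantic V2 (`ConfigDict(strict=True)` on a model, `Field(strict=True)` on a field), which was not yet settled when this plan was written. The model names and the `is_rejected` helper are invented for the illustration.

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError


class LaxModel(BaseModel):
    count: int


class StrictModel(BaseModel):
    model_config = ConfigDict(strict=True)  # strict for the whole model

    count: int


class MixedModel(BaseModel):
    count: int = Field(strict=True)  # strict for this field only
    label: str


def is_rejected(model, **data) -> bool:
    """Return True if building `model` from `data` fails validation."""
    try:
        model(**data)
    except ValidationError:
        return True
    return False


lax = LaxModel(count='123')  # lax mode: the string is converted to 123
strict_rejects_str = is_rejected(StrictModel, count='123')
field_rejects_str = is_rejected(MixedModel, count='123', label='a')
field_accepts_int = not is_rejected(MixedModel, count=123, label='a')
```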
### Formalised Conversion Table :+1:
As well as complaints about coercion, another legitimate complaint was inconsistency around data conversion.
In pydantic V2, the following principle will govern when data should be converted in "lax mode" (`strict=False`):
> If the input data has a SINGLE and INTUITIVE representation, in the field's type, AND no data is lost
> during the conversion, then the data will be converted; otherwise a validation error is raised.
> There is one exception to this rule: string fields -
> virtually all data has an intuitive representation as a string (e.g. `repr()` and `str()`), therefore
> a custom rule is required: only `str`, `bytes` and `bytearray` are valid as inputs to string fields.
Some examples of what that means in practice:
| Field Type | Input | Single & Intuitive R. | All Data Preserved | Result |
| ---------- | ----------------------- | --------------------- | ------------------ | ------- |
| `int` | `"123"` | :material-check: | :material-check: | Convert |
| `int` | `123.0` | :material-check: | :material-check: | Convert |
| `int` | `123.1` | :material-check: | :material-close: | Error |
| `date` | `"2020-01-01"` | :material-check: | :material-check: | Convert |
| `date` | `"2020-01-01T00:00:00"` | :material-check: | :material-check: | Convert |
| `date` | `"2020-01-01T12:00:00"` | :material-check: | :material-close: | Error |
| `int` | `b"1"` | :material-close: | :material-check: | Error |
(For the last case converting `bytes` to an `int` could reasonably mean `int(bytes_data.decode())` or
`int.from_bytes(b'1', 'big/little')`, hence an error)
In addition to the general rule, we'll provide a conversion table which defines exactly what data will be allowed
to which field types. See [the table below](#formalised-conversion-table--section) for a start on this.
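As a rough illustration of the "single, intuitive and lossless" rule, here is a minimal sketch using only the standard library. The function names are hypothetical and the logic is an assumption about how the rule could be applied to two rows of the examples table; it is not how pydantic-core implements it.

```python
import math
from datetime import date, datetime, time


def lax_int_from_float(value: float) -> int:
    """Convert a float to an int only if no data is lost (hypothetical helper)."""
    if not math.isfinite(value) or value % 1 != 0:
        raise ValueError(f'{value!r} has no lossless int representation')
    return int(value)


def lax_date_from_str(value: str) -> date:
    """Convert an ISO formatted string to a date only if the time part is exactly midnight."""
    parsed = datetime.fromisoformat(value)
    if parsed.time() != time(0, 0):
        raise ValueError(f'{value!r} has a non-zero time, converting would lose data')
    return parsed.date()
```

With these helpers `123.0` and `"2020-01-01T00:00:00"` convert, while `123.1` and `"2020-01-01T12:00:00"` raise `ValueError`, matching the table above.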
### Built in JSON support :+1:
pydantic-core can parse JSON directly into a model or output type. This both improves performance and avoids
issues with strictness - e.g. if you have a strict model with a `datetime` field, the input must be a
`datetime` object, but clearly that makes no sense when parsing JSON, which has no `datetime` type.
The same applies to `bytes` and many other types.
Pydantic V2 will therefore allow some conversion when validating JSON directly, even in strict mode
(e.g. `ISO8601 string -> datetime`, `str -> bytes`) even though this would not be allowed when validating
a python object.
In future direct validation of JSON will also allow:
- parsing in a separate thread while starting validation in the main thread
- line numbers from JSON to be included in the validation errors
(These features will not be included in V2, but instead will hopefully be added later.)
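To make the "strict, but JSON-aware" idea concrete, here is a small sketch. The `validate_datetime` and `validate_datetime_json` functions are hypothetical stand-ins, and `datetime.fromisoformat` is used as a simplification of real ISO 8601 parsing.

```python
import json
from datetime import datetime


def validate_datetime(value, *, strict, from_json=False):
    """Hypothetical validator: strict mode relaxes only for data that came from JSON."""
    if isinstance(value, datetime):
        return value
    # JSON has no datetime type, so a string is accepted from JSON even in strict mode.
    if isinstance(value, str) and (from_json or not strict):
        return datetime.fromisoformat(value)
    raise ValueError(f'expected datetime, got {type(value).__name__}')


def validate_datetime_json(raw, *, strict):
    """Parse JSON, then validate with the JSON-specific allowance switched on."""
    return validate_datetime(json.loads(raw), strict=strict, from_json=True)


from_json_value = validate_datetime_json('"2032-01-01T12:00:00"', strict=True)
```

The same string passed as a python object with `strict=True` raises `ValueError` in this sketch.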
:::note
Pydantic has always had special support for JSON, that is not going to change.
:::
While in theory other formats could be specifically supported, the overheads and development time are significant and I don't think there's another format that's used widely enough to be worth specific logic. Other formats can be parsed to python then validated; similarly, when serializing, data can be exported to a python object, then serialized, see [below](#improvements-to-dumpingserializationexport---section).
### Validation without a Model :+1:
In pydantic V1 the core of all validation was a pydantic model; this led to a significant performance penalty
and extra complexity when the output data type was not a model.
pydantic-core operates on a tree of validators with no "model" type required at the base of that tree.
It can therefore validate a single `string` or `datetime` value, a `TypedDict` or a `Model` equally easily.
This feature will provide significant additional performance improvements in scenarios like:
- Adding validation to `dataclasses`
- Validating URL arguments, query strings, headers, etc. in FastAPI
- Adding validation to `TypedDict`
- Function argument validation
- Adding validation to your custom classes, decorators...
In effect - anywhere where you don't care about a traditional model class instance.
We'll need to add standalone methods for generating JSON Schema and dumping these objects to JSON, etc.
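The "tree of validators" idea can be sketched with plain closures. Everything below is a simplified illustration under my own naming (`int_validator`, `list_validator`, `typed_dict_validator` are hypothetical); it is not the pydantic-core implementation, which is written in Rust.

```python
from typing import Any, Callable, Dict

Validator = Callable[[Any], Any]


def int_validator(value):
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f'expected int, got {value!r}')
    return value


def str_validator(value):
    if not isinstance(value, str):
        raise ValueError(f'expected str, got {value!r}')
    return value


def list_validator(item_validator: Validator) -> Validator:
    """A node in the tree: validates a list by delegating each item to a child validator."""
    def validate(value):
        if not isinstance(value, list):
            raise ValueError(f'expected list, got {value!r}')
        return [item_validator(item) for item in value]
    return validate


def typed_dict_validator(fields: Dict[str, Validator]) -> Validator:
    """Another node: validates a dict key by key, no model class required."""
    def validate(value):
        if not isinstance(value, dict):
            raise ValueError(f'expected dict, got {value!r}')
        missing = [name for name in fields if name not in value]
        if missing:
            raise ValueError(f'missing keys: {missing}')
        return {name: validator(value[name]) for name, validator in fields.items()}
    return validate


validate_user = typed_dict_validator({'name': str_validator, 'scores': list_validator(int_validator)})
user = validate_user({'name': 'Ann', 'scores': [1, 2]})
```

Note that a bare `int_validator` or `list_validator(int_validator)` is just as usable at the root as the dict validator, which is the point of the section above.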
### Required vs. Nullable Cleanup :+1:
Pydantic previously had a somewhat confused idea about "required" vs. "nullable". This mostly resulted from
my misgivings about marking a field as `Optional[int]` but requiring a value to be provided but allowing it to be
`None` - I didn't like using the word "optional" in relation to a field which was not optional.
In pydantic V2, pydantic will move to match dataclasses, thus:
```py title="Required vs. Nullable" test="skip" lint="skip" upgrade="skip"
from pydantic import BaseModel


class Foo(BaseModel):
    f1: str  # required, cannot be None
    f2: str | None  # required, can be None - same as Optional[str] / Union[str, None]
    f3: str | None = None  # not required, can be None
    f4: str = 'Foobar'  # not required, but cannot be None
```
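For comparison, this is how the standard library's `dataclasses` already treat the same four declarations. Dataclasses do not validate types, so the sketch only demonstrates which arguments are required, not the `None` checks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Foo:
    f1: str                    # required
    f2: Optional[str]          # required, even though None is an allowed value
    f3: Optional[str] = None   # not required
    f4: str = 'Foobar'         # not required


foo = Foo(f1='a', f2=None)  # f2 must be passed explicitly, f3 and f4 fall back to defaults
```

Omitting `f2`, e.g. `Foo(f1='a')`, raises `TypeError` because the annotation alone does not make the argument optional.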
### Validator Function Improvements :+1: :+1: :+1:
This is one of the changes in pydantic V2 that I'm most excited about. I've been talking about something
like this for a long time, see [pydantic#1984](https://github.com/pydantic/pydantic/issues/1984), but couldn't
find a way to do this until now.
Fields which use a function for validation can be any of the following types:
- **function before mode** - where the function is called before the inner validator is called
- **function after mode** - where the function is called after the inner validator is called
- **plain mode** - where there's no inner validator
- **wrap mode** - where the function takes a reference to a function which calls the inner validator,
and can therefore modify the input before inner validation, modify the output after inner validation, conditionally
not call the inner validator or catch errors from the inner validator and return a default value, or change the error
An example of how a wrap validator might look:
```py title="Wrap mode validator function" test="skip" lint="skip" upgrade="skip"
from datetime import datetime

from pydantic import BaseModel, ValidationError, validator


class MyModel(BaseModel):
    timestamp: datetime

    @validator('timestamp', mode='wrap')
    def validate_timestamp(cls, v, handler):
        if v == 'now':
            # we don't want to bother with further validation,
            # just return the new value
            return datetime.now()
        try:
            return handler(v)
        except ValidationError:
            # validation failed, in this case we want to
            # return a default value
            return datetime(2000, 1, 1)
```
As well as being powerful, this provides a great "escape hatch" when pydantic validation doesn't do what you need.
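The four modes listed above differ only in how the user's function is composed with the inner validator. The sketch below expresses that composition with plain functions; the names `before`, `after`, `plain`, `wrap` and `inner_validator` are illustrative assumptions, not pydantic or pydantic-core APIs.

```python
from datetime import datetime


def inner_validator(value):
    """Stand-in for the inner validator: accept a datetime or an ISO formatted string."""
    if isinstance(value, datetime):
        return value
    if isinstance(value, str):
        return datetime.fromisoformat(value)
    raise ValueError(f'cannot parse {value!r} as a datetime')


def before(func, inner):
    # function is called first, its result is passed to the inner validator
    return lambda value: inner(func(value))


def after(func, inner):
    # inner validator runs first, the function receives the validated value
    return lambda value: func(inner(value))


def plain(func):
    # no inner validator at all, the function does everything
    return func


def wrap(func, inner):
    # the function receives the inner validator and decides if and how to call it
    return lambda value: func(value, inner)


def with_default(value, handler):
    try:
        return handler(value)
    except ValueError:
        return datetime(2000, 1, 1)


strip_then_parse = before(str.strip, inner_validator)
parse_then_truncate = after(lambda dt: dt.replace(microsecond=0), inner_validator)
parse_or_default = wrap(with_default, inner_validator)
```

Here `parse_or_default('not a date')` falls back to the default value, while `strip_then_parse(' 2020-01-01 ')` cleans the input before the inner validator sees it.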
### More powerful alias(es) :+1:
pydantic-core can support alias "paths" as well as simple string aliases to flatten data as it's validated.
Best demonstrated with an example:
```py title="Alias paths" test="skip" lint="skip" upgrade="skip"
from pydantic import BaseModel, Field


class Foo(BaseModel):
    bar: str = Field(aliases=[['baz', 2, 'qux']])


data = {
    'baz': [
        {'qux': 'a'},
        {'qux': 'b'},
        {'qux': 'c'},
        {'qux': 'd'},
    ]
}

foo = Foo(**data)
assert foo.bar == 'c'
```
`aliases` is a list of lists because multiple paths can be provided; if so, they're tried in turn until a value is found.
Tagged unions will use the same logic as `aliases` meaning nested attributes can be used to select a schema
to validate against.
### Improvements to Dumping/Serialization/Export :+1: :confused:
(I haven't worked on this yet, so these ideas are only provisional)
There has long been a debate about how to handle converting data when extracting it from a model.
One of the features people have long requested is the ability to convert data to JSON compliant types while
converting a model to a dict.
My plan is to move data export into pydantic-core; with that, one implementation can support all export modes without
compromising (and hopefully significantly improving) performance.
I see four different export/serialization scenarios:
1. Extracting the field values of a model with no conversion, effectively `model.__dict__` but with the current filtering
logic provided by `.dict()`
2. Extracting the field values of a model recursively (effectively what `.dict()` does now) - sub-models are converted to
dicts, but other fields remain unchanged.
3. Extracting data and converting at the same time (e.g. to JSON compliant types)
4. Serializing data straight to JSON
I think all 4 modes can be supported in a single implementation, with a kind of "3.5" mode where a python function
is used to convert the data as the user wishes.
The current `include` and `exclude` logic is extremely complicated, but hopefully it won't be too hard to
translate it to Rust.
We should also add support for `validate_alias` and `dump_alias` as well as the standard `alias`
to allow for customising field keys.
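Since these ideas are provisional, the following is only a sketch of how one function could cover the first three scenarios, with the fourth layered on top via `json.dumps`. It uses standard-library dataclasses as stand-ins for models; the `dump` helper is an assumption of mine, and the mode names are borrowed from the proposed `model_dump` signature shown later in this post.

```python
import json
from dataclasses import dataclass, fields, is_dataclass
from datetime import datetime


def _convert(value, mode):
    if is_dataclass(value) and not isinstance(value, type):
        return dump(value, mode)
    if isinstance(value, list):
        return [_convert(item, mode) for item in value]
    if mode == 'json-compliant' and isinstance(value, datetime):
        return value.isoformat()
    return value


def dump(obj, mode='unchanged'):
    """Hypothetical export helper: 'unchanged', 'dicts' or 'json-compliant'."""
    if mode == 'unchanged':
        # scenario 1: field values exactly as stored, sub-models left alone
        return {f.name: getattr(obj, f.name) for f in fields(obj)}
    # scenarios 2 and 3: recurse into sub-models, optionally converting values
    return {f.name: _convert(getattr(obj, f.name), mode) for f in fields(obj)}


@dataclass
class Pet:
    name: str
    born: datetime


@dataclass
class Owner:
    name: str
    pet: Pet


owner = Owner('Ann', Pet('Rex', datetime(2020, 1, 1)))
as_json = json.dumps(dump(owner, 'json-compliant'))  # scenario 4, built on scenario 3
```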
### Validation Context :+1:
Pydantic V2 will add a new optional `context` argument to `model_validate` and `model_validate_json`
which will allow you to pass information not available when creating a model to validators.
See [pydantic#1549](https://github.com/pydantic/pydantic/issues/1549) for motivation.
Here's an example of how `context` might be used:
```py title="Context during Validation" test="skip" lint="skip" upgrade="skip"
from pydantic import BaseModel, EmailStr, validator


class User(BaseModel):
    email: EmailStr
    home_country: str

    @validator('home_country')
    def check_home_country(cls, v, context):
        if v not in context['countries']:
            raise ValueError('invalid country choice')
        return v


async def add_user(post_data: bytes):
    countries = set(await db_connection.fetch_all('select code from country'))
    user = User.model_validate_json(post_data, context={'countries': countries})
    ...
```
:::note
We (actually mostly Sebastián :wink:) will have to make some changes to FastAPI to fully leverage `context`
as we'd need some kind of dependency injection to build context before validation so models can still be passed as arguments to views. I'm sure he'll be game.
:::
:::deter
Although this will make it slightly easier to run synchronous IO (HTTP requests, DB queries, etc.)
from within validators, I strongly advise you keep IO separate from validation - do it before and use context,
or do it afterwards; avoid making queries inside validation where possible.
:::
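The "do IO before and use context" advice can be shown without pydantic at all. In this sketch `validate_user` is a hypothetical stand-in for `User.model_validate(data, context=...)` and the country set is a literal rather than a real database query.

```python
def check_home_country(value, context):
    """Validator that relies on data supplied by the caller via `context`."""
    if value not in context['countries']:
        raise ValueError('invalid country choice')
    return value


def validate_user(data, *, context):
    """Hypothetical stand-in for `User.model_validate(data, context=...)`."""
    return {
        'email': data['email'],
        'home_country': check_home_country(data['home_country'], context),
    }


# IO happens first (faked here with a literal set), validation itself stays free of IO
countries = {'GB', 'US', 'FR'}
user = validate_user(
    {'email': 'a@example.com', 'home_country': 'GB'},
    context={'countries': countries},
)
```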
### Model Namespace Cleanup :+1:
For years I've wanted to clean up the model namespace,
see [pydantic#1001](https://github.com/pydantic/pydantic/issues/1001). This would avoid confusing gotchas when field
names clash with methods on a model; it would also make it safer to add more methods to a model without risking
new clashes.
After much deliberation (and even giving a lightning talk at the python language summit about alternatives, see
[this discussion](https://discuss.python.org/t/better-fields-access-and-allowing-a-new-character-at-the-start-of-identifiers/14529)),
I've decided to go with the simplest and clearest approach, at the expense of a bit more typing:
All methods on models will start with `model_`, fields' names will not be allowed to start with `"model"`
(aliases can be used if required).
This will mean `BaseModel` will have roughly the following signature.
```py title="BaseModel signature" test="skip" lint="skip" upgrade="skip"
class BaseModel:
    model_fields: List[FieldInfo]
    """previously `__fields__`, although the format will change a lot"""

    @classmethod
    def model_validate(cls, data: Any, *, context=None) -> Self:  # (1)
        """
        previously `parse_obj()`, validate data
        """

    @classmethod
    def model_validate_json(
        cls,
        data: str | bytes | bytearray,
        *,
        context=None
    ) -> Self:
        """
        previously `parse_raw(..., content_type='application/json')`
        validate data from JSON
        """

    @classmethod
    def model_is_instance(cls, data: Any, *, context=None) -> bool:  # (2)
        """
        new, check if data is valid for the model
        """

    @classmethod
    def model_is_instance_json(
        cls,
        data: str | bytes | bytearray,
        *,
        context=None
    ) -> bool:
        """
        Same as `model_is_instance`, but from JSON
        """

    def model_dump(
        self,
        include: ... = None,
        exclude: ... = None,
        by_alias: bool = False,
        exclude_unset: bool = False,
        exclude_defaults: bool = False,
        exclude_none: bool = False,
        mode: Literal['unchanged', 'dicts', 'json-compliant'] = 'unchanged',
        converter: Callable[[Any], Any] | None = None
    ) -> Any:
        """
        previously `dict()`, as before
        with new `mode` argument
        """

    def model_dump_json(self, ...) -> str:
        """
        previously `json()`, arguments as above
        effectively equivalent to `json.dumps(self.model_dump(..., mode='json-compliant'))`,
        but more performant
        """

    def model_json_schema(self, ...) -> dict[str, Any]:
        """
        previously `schema()`, arguments roughly as before
        JSON schema as a dict
        """

    def model_update_forward_refs(self) -> None:
        """
        previously `update_forward_refs()`, update forward references
        """

    @classmethod
    def model_construct(
        cls,
        _fields_set: set[str] | None = None,
        **values: Any
    ) -> Self:
        """
        previously `construct()`, arguments roughly as before
        construct a model with no validation
        """

    @classmethod
    def model_customize_schema(cls, schema: dict[str, Any]) -> dict[str, Any]:
        """
        new, way to customize validation,
        e.g. if you wanted to alter how the model validates certain types,
        or add validation for a specific type without custom types or
        decorated validators
        """

    class ModelConfig:
        """
        previously `Config`, configuration class for models
        """
```
1. see [Validation Context](#validation-context--section) for more information on `context`
2. see [`is_instance` checks](#is_instance-like-checks--section)
The following methods will be removed:
- `.parse_file()` - was a mistake, should never have been in pydantic
- `.parse_raw()` - partially replaced by `.model_validate_json()`, the other functionality was a mistake
- `.from_orm()` - the functionality has been moved to config, see [other improvements](#other-improvements--section) below
- `.schema_json()` - mostly since it causes confusion between pydantic validation schema and JSON schema,
and can be replaced with just `json.dumps(m.model_json_schema())`
- `.copy()` - instead we'll implement `__copy__` and let people use the `copy` module
(this removes some functionality from `copy()`, but there are bugs and ambiguities with the functionality anyway)
### Strict API & API documentation :+1:
When preparing for pydantic V2, we'll make a strict distinction between the public API and private functions & classes.
Private objects will be clearly identified as private via a `_internal` sub package to discourage use.
The public API will have API documentation. I've recently been working with the wonderful
[mkdocstrings](https://github.com/mkdocstrings/mkdocstrings) package for both
[dirty-equals](https://dirty-equals.helpmanual.io/) and
[watchfiles](https://watchfiles.helpmanual.io/) documentation. I intend to use `mkdocstrings` to generate complete
API documentation for V2.
This wouldn't replace the current example-based somewhat informal documentation style but instead will augment it.
### Error descriptions :+1:
The way line errors (the individual errors within a `ValidationError`) are built has become much more sophisticated
in pydantic-core.
There's a well-defined
[set of error codes and messages](https://github.com/pydantic/pydantic-core/blob/main/src/errors/kinds.rs).
More will be added when other types are validated via pure python validators in pydantic.
I would like to add a dedicated section to the documentation with extra information for each type of error.
This would be another key in a line error: `documentation`, which would link to the appropriate section in the
docs.
Thus, errors might look like:
```py title="Line Errors Example" test="skip" lint="skip" upgrade="skip"
[
{
'kind': 'greater_than_equal',
'loc': ['age'],
'message': 'Value must be greater than or equal to 18',
'input_value': 11,
'context': {'ge': 18},
'documentation': 'https://pydantic.dev/errors/#greater_than_equal',
},
{
'kind': 'bool_parsing',
'loc': ['is_developer'],
'message': 'Value must be a valid boolean, unable to interpret input',
'input_value': 'foobar',
'documentation': 'https://pydantic.dev/errors/#bool_parsing',
},
]
```
I own the `pydantic.dev` domain and will use it for at least these errors so that even if the docs URL
changes, the error will still link to the correct documentation. If developers don't want to show these errors to users,
they can always process the errors list and filter out items from each error they don't need or want.
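Filtering the errors before showing them to users could be as simple as the sketch below. The `public_errors` helper and its default list of hidden keys are assumptions for illustration; the error dicts reuse the proposed shape from the example above.

```python
def public_errors(errors, hidden=('input_value', 'documentation')):
    """Return a copy of each line error without the keys listed in `hidden`."""
    return [
        {key: value for key, value in error.items() if key not in hidden}
        for error in errors
    ]


errors = [
    {
        'kind': 'greater_than_equal',
        'loc': ['age'],
        'message': 'Value must be greater than or equal to 18',
        'input_value': 11,
        'context': {'ge': 18},
        'documentation': 'https://pydantic.dev/errors/#greater_than_equal',
    },
]
safe_errors = public_errors(errors)
```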
### No pure python implementation :frowning:
Since pydantic-core is written in Rust, and I have absolutely no intention of rewriting it in python,
pydantic V2 will only work where a binary package can be installed.
pydantic-core will provide binaries in PyPI for (at least):
- **Linux**: `x86_64`, `aarch64`, `i686`, `armv7l`, `musl-x86_64` & `musl-aarch64`
- **MacOS**: `x86_64` & `arm64` (except python 3.7)
- **Windows**: `amd64` & `win32`
- **Web Assembly**: `wasm32`
(pydantic-core is [already](https://github.com/pydantic/pydantic-core/runs/7214195252?check_suite_focus=true)
compiled for wasm32 using emscripten and unit tests pass, except where cpython itself has
[problems](https://github.com/pyodide/pyodide/issues/2841))
Binaries for pypy are a work in progress and will be added if possible,
see [pydantic-core#154](https://github.com/pydantic/pydantic-core/issues/154).
Other binaries can be added provided they can be (cross-)compiled on github actions.
If no binary is available from PyPI, pydantic-core can be compiled from source if Rust stable is available.
The only place where I know this will cause problems is Raspberry Pi, which is a
[mess](https://github.com/piwheels/packages/issues/254) when it comes to packages written in Rust for Python.
Effectively, until that's fixed you'll likely have to install pydantic with
`pip install -i https://pypi.org/simple/ pydantic`.
### Pydantic becomes a pure python package :+1:
Pydantic V1.X is a pure python code base but is compiled with cython to provide some performance improvements.
Since the "hot" code is moved to pydantic-core, pydantic itself can go back to being a pure python package.
This should significantly reduce the size of the pydantic package and make unit tests of pydantic much faster.
In addition:
- some constraints on pydantic code can be removed once it no longer has to be compilable with cython
- debugging will be easier as you'll be able to drop straight into the pydantic codebase as you can with other,
pure python packages
Some pieces of edge logic could get a little slower as they're no longer compiled.
### `is_instance` like checks :+1:
Strict mode also means it makes sense to provide an `is_instance` method on models which effectively runs
validation then throws away the result while avoiding the (admittedly small) overhead of creating and raising
an error or returning the validation result.
To be clear, this isn't a real `isinstance` call, rather it is equivalent to
```py title="is_instance" test="skip" lint="skip" upgrade="skip"
class BaseModel:
    ...

    @classmethod
    def model_is_instance(cls, data: Any) -> bool:
        try:
            cls(**data)
        except ValidationError:
            return False
        else:
            return True
```
### I'm dropping the word "parse" and just using "validate" :neutral_face:
Partly due to the issues with the lack of strict mode,
I've gone back and forth between using the terms "parse" and "validate" for what pydantic does.
While pydantic is not simply a validation library (and I'm sure some would argue validation is not strictly what it does),
most people use the word **"validation"**.
It's time to stop fighting that, and use consistent names.
The word "parse" will no longer be used except when talking about JSON parsing, see
[model methods](#model-namespace-cleanup--section) above.
### Changes to custom field types :neutral_face:
Since the core structure of validators has changed from "a list of validators to call one after another" to
"a tree of validators which call each other", the
[`__get_validators__`](https://docs.pydantic.dev/usage/types/#classes-with-__get_validators__)
way of defining custom field types no longer makes sense.
Instead, we'll look for the attribute `__pydantic_validation_schema__` which must be a
pydantic-core compliant schema for validating data to this field type (the `function`
item can be a string, if so a function of that name will be taken from the class, see `'validate'` below).
Here's an example of how a custom field type could be defined:
```py title="New custom field types" test="skip" lint="skip" upgrade="skip"
from pydantic import ValidationSchema


class Foobar:
    def __init__(self, value: str):
        self.value = value

    __pydantic_validation_schema__: ValidationSchema = {
        'type': 'function',
        'mode': 'after',
        'function': 'validate',
        'schema': {'type': 'str'},
    }

    @classmethod
    def validate(cls, value):
        if 'foobar' in value:
            return Foobar(value)
        else:
            raise ValueError('expected foobar')
```
What's going on here: `__pydantic_validation_schema__` defines a schema which effectively says:
> Validate input data as a string, then call the `validate` function with that string, use the returned value
> as the final result of validation.
`ValidationSchema` is just an alias to
[`pydantic_core.Schema`](https://github.com/pydantic/pydantic-core/blob/main/pydantic_core/_types.py#L291)
which is a type defining the schema for validation schemas.
:::note
The pydantic-core schema has full type definitions, although since the type is recursive, mypy can't provide static type analysis; pyright, however, can.
We can probably provide one or more helper functions to make `__pydantic_validation_schema__` easier to generate.
:::
## Other Improvements :+1:
Some other things which will also change, IMHO for the better:
1. Recursive models with cyclic references - although recursive models were supported by pydantic V1,
data with cyclic references caused recursion errors, in pydantic-core cyclic references are correctly detected
and a validation error is raised
2. The reason I've been so keen to get pydantic-core to compile and run with wasm is that I want all examples
in the docs of pydantic V2 to be editable and runnable in the browser
3. Full support for `TypedDict`, including `total=False` - e.g. omitted keys,
providing validation schema to a `TypedDict` field/item will use `Annotated`, e.g. `Annotated[str, Field(strict=True)]`
4. `from_orm` has become `from_attributes` and is now defined at schema generation time
(either via model config or field config)
5. `input_value` has been added to each line error in a `ValidationError`, making errors easier to understand
and allowing more comprehensive details of errors to be provided to end users,
[pydantic#784](https://github.com/pydantic/pydantic/issues/784)
6. `on_error` logic in a schema which allows either a default value to be used in the event of an error,
or that value to be omitted (in the case of a `total=False` `TypedDict`),
[pydantic-core#151](https://github.com/pydantic/pydantic-core/issues/151)
7. `datetime`, `date`, `time` & `timedelta` validation is improved, see the
[speedate] Rust library I built specifically for this purpose for more details
8. Powerful "priority" system for optionally merging or overriding config in sub-models for nested schemas
9. Pydantic will support [annotated-types](https://github.com/annotated-types/annotated-types),
so you can do stuff like `Annotated[set[int], Len(0, 10)]` or `Name = Annotated[str, Len(1, 1024)]`
10. A single decorator for general usage - we should add a `validate` decorator which can be used:
    - on functions (replacing `validate_arguments`)
    - on dataclasses, `pydantic.dataclasses.dataclass` will become an alias of this
    - on `TypedDict`s
    - on any supported type, e.g. `Union[...]`, `Dict[str, Thing]`
    - on custom field types - e.g. anything with a `__pydantic_validation_schema__` attribute
11. Easier validation error creation, I've often found myself wanting to raise `ValidationError`s outside
models, particularly in FastAPI
([here](https://github.com/samuelcolvin/foxglove/blob/a4aaacf372178f345e5ff1d569ee8fd9d10746a4/foxglove/exceptions.py#L137-L149)
is one method I've used), we should provide utilities to generate these errors
12. Improve the performance of `__eq__` on models
13. Computed fields - these have been an idea for a long time in pydantic, we should get them right
14. Model validation that avoids instances of subclasses leaking data (particularly important for FastAPI),
see [pydantic-core#155](https://github.com/pydantic/pydantic-core/issues/155)
15. We'll now follow [semver](https://semver.org/) properly and avoid breaking changes between minor versions;
as a result, major versions will become more common
16. Improve generics to use `M(BaseModel, Generic[T])` instead of `M(GenericModel, Generic[T])` - i.e. `GenericModel`
can be removed; this results from no longer needing to compile pydantic code with cython
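Item 1 in the list above says cyclic references are detected rather than causing a recursion error. How pydantic-core does this is not described here, so the following is only one plausible approach: track the `id()` of every object on the current validation path. The `validate_tree` function and the `{'name': ..., 'children': [...]}` shape are made up for the example.

```python
def validate_tree(node, _seen=None):
    """Validate nested dicts like {'name': str, 'children': [...]}, rejecting cycles."""
    seen = set() if _seen is None else _seen
    if id(node) in seen:
        # the same object is already being validated further up the stack
        raise ValueError('cyclic reference detected')
    seen.add(id(node))
    try:
        if not isinstance(node, dict) or not isinstance(node.get('name'), str):
            raise ValueError(f'invalid node: {type(node).__name__}')
        children = [validate_tree(child, seen) for child in node.get('children', [])]
    finally:
        # only objects on the current path count, shared (non-cyclic) nodes are fine
        seen.discard(id(node))
    return {'name': node['name'], 'children': children}


cyclic = {'name': 'a', 'children': []}
cyclic['children'].append(cyclic)  # the node now contains itself
```

Calling `validate_tree(cyclic)` raises `ValueError` instead of recursing until the interpreter gives up.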
## Removed Features & Limitations :frowning:
The emoji here is just for variation, I'm not frowning about any of this, these changes are either good IMHO
(will make pydantic cleaner, easier to learn and easier to maintain) or irrelevant to 99.9+% of users.
1. `__root__` custom root models are no longer necessary since validation on any supported data type is allowed
without a model
2. `.parse_file()` and `.parse_raw()`, partially replaced with `.model_validate_json()`,
see [model methods](#model-namespace-cleanup--section)
3. `.schema_json()` & `.copy()`, see [model methods](#model-namespace-cleanup--section)
4. `TypeError`s are no longer considered as validation errors, but rather as internal errors, this is to better
catch errors in argument names in function validators.
5. Subclasses of builtin types like `str`, `bytes` and `int` are coerced to their parent builtin type,
this is a limitation of how pydantic-core converts these types to Rust types during validation, if you have a
specific need to keep the type, you can use wrap validators or custom type validation as described above
6. Integers are represented in Rust code as `i64`, meaning if you want to use ints where `abs(v) > 2^63 − 1`
(9,223,372,036,854,775,807), you'll need to use a [wrap validator](#validator-function-improvements----section) and your own logic
7. [Settings Management](https://docs.pydantic.dev/usage/settings/) ??? - I definitely don't want to
remove the functionality, but it's something of a historical curiosity that it lives within pydantic,
perhaps it should move to a separate package, perhaps installable alongside pydantic with
`pip install pydantic[settings]`?
8. The following `Config` properties will be removed or deprecated:
- `fields` - it's very old (it pre-dates `Field`), can be removed
- `allow_mutation` will be removed, instead `frozen` will be used
- `error_msg_templates`, it's not properly documented anyway, error messages can be customized with external logic if required
- `getter_dict` - pydantic-core has hardcoded `from_attributes` logic
- `json_loads` - again this is hard coded in pydantic-core
- `json_dumps` - possibly
- `json_encoders` - see the export "mode" discussion [above](#improvements-to-dumpingserializationexport---section)
- `underscore_attrs_are_private` we should just choose a sensible default
- `smart_union` - all unions are now "smart"
9. `dict(model)` functionality should be removed, there's a much clearer distinction now than in 2017 when I
implemented this between a model and a dict
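For the `i64` limitation in item 6, the "wrap validator and your own logic" workaround might look roughly like this. Both `inner_int` (a stand-in for the `i64`-backed inner validator) and `big_int` (the wrap-style function) are hypothetical names, and the sketch is plain Python rather than the pydantic API.

```python
I64_LIMIT = 2**63 - 1  # 9,223,372,036,854,775,807


def inner_int(value):
    """Stand-in for an inner int validator that is limited to the i64 range."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f'expected int, got {value!r}')
    if abs(value) > I64_LIMIT:
        raise ValueError('integer does not fit in i64')
    return value


def big_int(value, handler):
    """Wrap-style function: handle very large ints ourselves, defer everything else."""
    if isinstance(value, int) and not isinstance(value, bool) and abs(value) > I64_LIMIT:
        return value
    return handler(value)


huge = big_int(2**70, inner_int)  # bypasses the inner validator
small = big_int(5, inner_int)     # validated by the inner validator as usual
```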
## Features Remaining :neutral_face:
The following features will remain (mostly) unchanged:
- JSONSchema, internally this will need to change a lot, but hopefully the external interface will remain unchanged
- `dataclass` support, again internals might change, but not the external interface
- `validate_arguments`, might be renamed, but otherwise remain
- hypothesis plugin, might be able to improve this as part of the general cleanup
## Questions :question:
I hope the explanation above is useful. I'm sure people will have questions and feedback; I'm aware
I've skipped over some features with limited detail (this post is already fairly long :sleeping:).
To allow feedback without being overwhelmed, I've created a "Pydantic V2" category for
[discussions on github](https://github.com/pydantic/pydantic/discussions/categories/pydantic-v2) - please
feel free to create a discussion if you have any questions or suggestions.
We will endeavour to read and respond to everyone.
---
## Implementation Details :nerd:
(This is yet to be built, so these are nascent ideas which might change)
At the center of pydantic V2 will be a `PydanticValidator` class which looks roughly like this
(note: this is just pseudo-code, it's not even valid python and is only supposed to be used to demonstrate the idea):
```py title="PydanticValidator" test="skip" lint="skip" upgrade="skip"
# type identifying data which has been validated,
# as per pydantic-core, this can include "fields_set" data
ValidData = ...

# any type we can perform validation for
AnyOutputType = ...


class PydanticValidator:
    def __init__(self, output_type: AnyOutputType, config: Config):
        ...

    def validate(self, input_data: Any) -> ValidData:
        ...

    def validate_json(self, input_data: str | bytes | bytearray) -> ValidData:
        ...

    def is_instance(self, input_data: Any) -> bool:
        ...

    def is_instance_json(self, input_data: str | bytes | bytearray) -> bool:
        ...

    def json_schema(self) -> dict:
        ...

    def dump(
        self,
        data: ValidData,
        include: ... = None,
        exclude: ... = None,
        by_alias: bool = False,
        exclude_unset: bool = False,
        exclude_defaults: bool = False,
        exclude_none: bool = False,
        mode: Literal['unchanged', 'dicts', 'json-compliant'] = 'unchanged',
        converter: Callable[[Any], Any] | None = None
    ) -> Any:
        ...

    def dump_json(self, ...) -> str:
        ...
```
This could be used directly, but more commonly will be used by the following:
- `BaseModel`
- the `validate` decorator described above
- `pydantic.dataclasses.dataclass` (which might be an alias of `validate`)
- generics
The aim will be to get pydantic V2 to a place where the vast majority of tests continue to pass unchanged,
thereby guaranteeing (as much as possible) that the external interface to pydantic and its behaviour are unchanged.
## Conversion Table :material-table:
The table below provisionally defines what input value types are allowed to which field types.
**An updated and complete version of this table is available in [V2 conversion table](https://docs.pydantic.dev/latest/concepts/conversion_table/)**.
:::note
Some type conversions shown here are a significant departure from existing behavior; we may have to provide a config flag for backwards compatibility for a few of them. However, pydantic V2 cannot be entirely backward compatible, see [pydantic-core#152](https://github.com/pydantic/pydantic-core/issues/152).
:::
| Field Type | Input | Mode | Input Source | Conditions |
| ------------- | ----------- | ------ | ------------ | --------------------------------------------------------------------------- |
| `str` | `str` | both | python, JSON | - |
| `str` | `bytes` | lax | python | assumes UTF-8, error on unicode decoding error |
| `str` | `bytearray` | lax | python | assumes UTF-8, error on unicode decoding error |
| `bytes` | `bytes` | both | python | - |
| `bytes` | `str` | both | JSON | - |
| `bytes` | `str` | lax | python | - |
| `bytes` | `bytearray` | lax | python | - |
| `int` | `int` | strict | python, JSON | max abs value 2^63 - `i64` is used internally, `bool` explicitly forbidden |
| `int` | `int` | lax | python, JSON | `i64` |
| `int` | `float` | lax | python, JSON | `i64`, must be exact int, e.g. `f % 1 == 0`, `nan`, `inf` raise errors |
| `int` | `Decimal` | lax | python, JSON | `i64`, must be exact int, e.g. `f % 1 == 0` |
| `int` | `bool` | lax | python, JSON | - |
| `int` | `str` | lax | python, JSON | `i64`, must be numeric only, e.g. `[0-9]+` |
| `float` | `float` | strict | python, JSON | `bool` explicitly forbidden |
| `float` | `float` | lax | python, JSON | - |
| `float` | `int` | lax | python, JSON | - |
| `float` | `str` | lax | python, JSON | must match `[0-9]+(\.[0-9]+)?` |
| `float` | `Decimal` | lax | python | - |
| `float` | `bool` | lax | python, JSON | - |
| `bool` | `bool` | both | python, JSON | - |
| `bool` | `int` | lax | python, JSON | allowed: `0, 1` |
| `bool` | `float` | lax | python, JSON | allowed: `0, 1` |
| `bool` | `Decimal` | lax | python, JSON | allowed: `0, 1` |
| `bool` | `str` | lax | python, JSON | allowed: `'f', 'n', 'no', 'off', 'false', 't', 'y', 'on', 'yes', 'true'` |
| `None` | `None` | both | python, JSON | - |
| `date` | `date` | both | python | - |
| `date` | `datetime` | lax | python | must be exact date, e.g. no H, M, S, f |
| `date` | `str` | both | JSON | format `YYYY-MM-DD` |
| `date` | `str` | lax | python | format `YYYY-MM-DD` |
| `date` | `bytes` | lax | python | format `YYYY-MM-DD` (UTF-8) |
| `date` | `int` | lax | python, JSON | interpreted as seconds or ms from epoch, see [speedate], must be exact date |
| `date` | `float` | lax | python, JSON | interpreted as seconds or ms from epoch, see [speedate], must be exact date |
| `datetime` | `datetime` | both | python | - |
| `datetime` | `date` | lax | python | - |
| `datetime` | `str` | both | JSON | format `YYYY-MM-DDTHH:MM:SS.f` etc. see [speedate] |
| `datetime` | `str` | lax | python | format `YYYY-MM-DDTHH:MM:SS.f` etc. see [speedate] |
| `datetime` | `bytes` | lax | python | format `YYYY-MM-DDTHH:MM:SS.f` etc. see [speedate], (UTF-8) |
| `datetime` | `int` | lax | python, JSON | interpreted as seconds or ms from epoch, see [speedate] |
| `datetime` | `float` | lax | python, JSON | interpreted as seconds or ms from epoch, see [speedate] |
| `time` | `time` | both | python | - |
| `time` | `str` | both | JSON | format `HH:MM:SS.FFFFFF` etc. see [speedate] |
| `time` | `str` | lax | python | format `HH:MM:SS.FFFFFF` etc. see [speedate] |
| `time` | `bytes` | lax | python | format `HH:MM:SS.FFFFFF` etc. see [speedate], (UTF-8) |
| `time` | `int` | lax | python, JSON | interpreted as seconds, range 0 - 86399 |
| `time` | `float` | lax | python, JSON | interpreted as seconds, range 0 - 86399.9\* |
| `time` | `Decimal` | lax | python, JSON | interpreted as seconds, range 0 - 86399.9\* |
| `timedelta` | `timedelta` | both | python | - |
| `timedelta` | `str` | both | JSON | format ISO8601 etc. see [speedate] |
| `timedelta` | `str` | lax | python | format ISO8601 etc. see [speedate] |
| `timedelta` | `bytes` | lax | python | format ISO8601 etc. see [speedate], (UTF-8) |
| `timedelta` | `int` | lax | python, JSON | interpreted as seconds |
| `timedelta` | `float` | lax | python, JSON | interpreted as seconds |
| `timedelta` | `Decimal` | lax | python, JSON | interpreted as seconds |
| `dict` | `dict` | both | python | - |
| `dict` | `Object` | both | JSON | - |
| `dict` | `mapping` | lax | python | must implement the mapping interface and have an `items()` method |
| `TypedDict` | `dict` | both | python | - |
| `TypedDict` | `Object` | both | JSON | - |
| `TypedDict` | `Any` | both | python | builtins not allowed, uses `getattr`, requires `from_attributes=True` |
| `TypedDict` | `mapping` | lax | python | must implement the mapping interface and have an `items()` method |
| `list` | `list` | both | python | - |
| `list` | `Array` | both | JSON | - |
| `list` | `tuple` | lax | python | - |
| `list` | `set` | lax | python | - |
| `list` | `frozenset` | lax | python | - |
| `list` | `dict_keys` | lax | python | - |
| `tuple` | `tuple` | both | python | - |
| `tuple` | `Array` | both | JSON | - |
| `tuple` | `list` | lax | python | - |
| `tuple` | `set` | lax | python | - |
| `tuple` | `frozenset` | lax | python | - |
| `tuple` | `dict_keys` | lax | python | - |
| `set` | `set` | both | python | - |
| `set` | `Array` | both | JSON | - |
| `set` | `list` | lax | python | - |
| `set` | `tuple` | lax | python | - |
| `set` | `frozenset` | lax | python | - |
| `set` | `dict_keys` | lax | python | - |
| `frozenset` | `frozenset` | both | python | - |
| `frozenset` | `Array` | both | JSON | - |
| `frozenset` | `list` | lax | python | - |
| `frozenset` | `tuple` | lax | python | - |
| `frozenset` | `set` | lax | python | - |
| `frozenset` | `dict_keys` | lax | python | - |
| `is_instance` | `Any` | both | python | `isinstance()` check returns `True` |
| `is_instance` | - | both | JSON | never valid |
| `callable` | `Any` | both | python | `callable()` check returns `True` |
| `callable` | - | both | JSON | never valid |

The `ModelClass` validator (used to create instances of a class) uses the `TypedDict` validator, then creates an instance
with `__dict__` and `__fields_set__` set, so the same rules apply as for `TypedDict`.

[speedate]: https://docs.rs/speedate/latest/speedate/
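
To make the `int` rows of the table concrete, here is a minimal pure-Python sketch of how strict and lax modes might differ. The `validate_int` helper is hypothetical and is not part of pydantic or pydantic-core; it only mirrors the rules listed in the table and assumes the signed 64-bit (`i64`) range.

```python
import math
import re
from decimal import Decimal

# Assumption: the signed 64-bit range implied by the table's `i64` note.
I64_MIN, I64_MAX = -(2**63), 2**63 - 1


def validate_int(value, *, strict=False):
    """Hypothetical helper mirroring the `int` rows of the table above."""
    # bool is a subclass of int in Python, so it has to be checked first.
    if isinstance(value, bool):
        if strict:
            raise TypeError("bool is explicitly forbidden in strict mode")
        result = int(value)
    elif isinstance(value, int):
        result = value
    elif strict:
        raise TypeError(f"strict mode only accepts int, got {type(value).__name__}")
    elif isinstance(value, float):
        # nan and inf are rejected; other floats must be exact integers.
        if math.isnan(value) or math.isinf(value) or value % 1 != 0:
            raise ValueError("float must be an exact int")
        result = int(value)
    elif isinstance(value, Decimal):
        if not value.is_finite() or value % 1 != 0:
            raise ValueError("Decimal must be an exact int")
        result = int(value)
    elif isinstance(value, str):
        # Numeric only, following the table's `[0-9]+` pattern.
        if not re.fullmatch(r"[0-9]+", value):
            raise ValueError("str must be numeric only")
        result = int(value)
    else:
        raise TypeError(f"cannot convert {type(value).__name__} to int")

    if not I64_MIN <= result <= I64_MAX:
        raise ValueError("value does not fit in i64")
    return result
```

In lax mode, `validate_int("42")`, `validate_int(3.0)` and `validate_int(Decimal("7"))` all return plain integers, while `validate_int(3.5)` raises `ValueError`. With `strict=True`, anything other than a real `int` (including `True`) raises `TypeError`.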
0:["k51-bk4GkAtOVPsc3sP-n",[[["",{"children":["articles",{"children":["__PAGE__",{}]}]},"$undefined","$undefined",true],["",{"children":["articles",{"children":["__PAGE__",{},[["$L1",[["$","section",null,{"className":"bg-purple-light","children":["$","div",null,{"className":"container relative","children":[["$","div",null,{"className":"grid-l"}],["$","div",null,{"className":"grid-r"}],["$","div",null,{"className":"w-full px-[10px] py-[40px] text-center md:py-[90px]","children":["$","h1",null,{"className":"text-58 text-petroleum","children":"Pydantic Blog"}]}]]}]}],["$","$L2",null,{"posts":[{"date":"2024-10-10T00:00:00.000Z","slug":"why-hyperlint-chose-logfire-for-observability","title":"Why Hyperlint Chose Pydantic Logfire as Our Observability Provider","description":"","ogImage":"","authors":[{"name":"Bill Chambers","picture":"https://avatars.githubusercontent.com/u/1642503"}],"categories":["Observability","Pydantic Logfire"],"content":"$3"},{"date":"2024-10-07T00:00:00.000Z","title":"The Pydantic Open Source Fund","description":"Pydantic is investing in the open source projects that power what we do.","ogImage":"","readtime":"4 mins","authors":[{"name":"Samuel Colvin","picture":"https://avatars.githubusercontent.com/u/4039449"}],"categories":["Open Source","Company"],"slug":"pydantic-oss-fund-2024","content":"$4"},{"date":"2024-10-01T01:00:00.000Z","slug":"why-logfire","title":"Why is Pydantic building an Observability Platform?","description":"Why are the team behind Pydantic developing an observability platform?","ogImage":"","readtime":"8 mins","authors":[{"name":"Samuel Colvin","picture":"https://avatars.githubusercontent.com/u/4039449"}],"categories":["Logfire","Company"],"content":"$5"},{"date":"2024-10-01T00:00:00.000Z","slug":"logfire-announcement","title":"Logfire launch and Series A Announcement","description":"Logfire is leaving beta and we've raised Series A funding","ogImage":"","readtime":"2 mins","authors":[{"name":"Samuel 
Colvin","picture":"https://avatars.githubusercontent.com/u/4039449"}],"categories":["Logfire","Company"],"content":"$6"},{"date":"2024-09-05T00:00:00.000Z","slug":"pydantic-v2-9-release","title":"Pydantic v2.9","description":"","ogImage":"","readtime":"10 mins","categories":["Release","Performance Improvements","New Features"],"authors":[{"name":"Sydney Runkle","picture":"https://avatars.githubusercontent.com/u/54324534"}],"content":"$7"},{"date":"2024-07-01T00:00:00.000Z","slug":"pydantic-v2-8-release","title":"Pydantic v2.8","description":"","ogImage":"","readtime":"10 mins","categories":["Release","Performance Improvements","New Features"],"authors":[{"name":"Sydney Runkle","picture":"https://avatars.githubusercontent.com/u/54324534"}],"content":"$8"},{"date":"2024-04-11T00:00:00.000Z","slug":"pydantic-v2-7-release","title":"New Features and Performance Improvements in Pydantic v2.7","description":"","ogImage":"","readtime":"10 mins","categories":["Release","Performance Improvements","New Features"],"authors":[{"name":"Sydney Runkle","picture":"https://avatars.githubusercontent.com/u/54324534"}],"content":"$9"},{"date":"2024-04-04T00:00:00.000Z","slug":"lambda-intro","title":"AWS Lambda Data Validation with Pydantic","description":"","ogImage":"","authors":[{"name":"Sydney Runkle","picture":"https://avatars.githubusercontent.com/u/54324534"}],"categories":["AWS Lambda","Serverless"],"content":"$a"},{"date":"2024-02-29T00:00:00.000Z","title":"Building a product search API with GPT-4 Vision, Pydantic, and FastAPI","description":"","ogImage":"","authors":[{"name":"Jason Liu","picture":"https://avatars.githubusercontent.com/u/4852235"}],"categories":["LLMs"],"slug":"llm-vision","content":"$b"},{"date":"2024-01-18T00:00:00.000Z","slug":"llm-validation","title":"Minimize LLM Hallucinations with Pydantic Validators","description":"","ogImage":"","authors":[{"name":"Jason 
Liu","picture":"https://avatars.githubusercontent.com/u/4852235"}],"categories":["LLMs"],"content":"$c"},{"date":"2024-01-04T00:00:00.000Z","title":"Steering Large Language Models with Pydantic","description":"","ogImage":"","authors":[{"name":"Jason Liu","picture":"https://avatars.githubusercontent.com/u/4852235"}],"categories":["LLMs"],"slug":"llm-intro","content":"$d"},{"date":"2023-06-30T00:00:00.000Z","title":"Pydantic V2 Is Here!","description":"","ogImage":"","readtime":"2 mins","authors":[{"name":"Samuel Colvin","picture":"https://avatars.githubusercontent.com/u/4039449"},{"name":"Terrence Dorsey","picture":"https://avatars.githubusercontent.com/u/370316"}],"categories":["V2"],"slug":"pydantic-v2-final","content":"\nThe last few months have involved a whirlwind of work, and we're finally ready to announce the official release of\nPydantic V2!\n\n## Getting started with Pydantic V2\n\nTo get started with Pydantic V2, install it from PyPI:\n\n```bash\npip install -U pydantic\n```\n\nPydantic V2 is compatible with Python 3.8 and above.\n\nSee [the docs](https://docs.pydantic.dev/latest/) for examples of Pydantic at work.\n\n\n\n## Migration guide\n\nIf you are upgrading an existing project, you can use our extensive [migration guide](https://docs.pydantic.dev/latest/migration/) to understand\nwhat has changed.\n\nIf you do encounter any issues, please [create an issue in GitHub](https://github.com/pydantic/pydantic/issues/new?assignees=&labels=bug+V2%2Cunconfirmed&projects=&template=bug-v2.yml)\nusing the `bug V2` label.\nThis will help us to actively monitor and track errors, and to continue to improve the library’s performance.\n\nThank you for your support, and we look forward to your feedback.\n"},{"date":"2023-06-13T00:00:00.000Z","title":"Help Us Build Our Roadmap | Pydantic","description":"","ogImage":"","readtime":"15 mins","authors":[{"name":"Samuel 
Colvin","picture":"https://avatars.githubusercontent.com/u/4039449"}],"categories":["Company","Logfire"],"slug":"roadmap","content":"$e"},{"date":"2023-04-03T00:00:00.000Z","title":"Pydantic V2 Pre Release","description":"","ogImage":"","readtime":"8 mins","authors":[{"name":"Samuel Colvin","picture":"https://avatars.githubusercontent.com/u/4039449"},{"name":"Terrence Dorsey","picture":"https://avatars.githubusercontent.com/u/370316"}],"categories":["V2"],"slug":"pydantic-v2-alpha","content":"$f"},{"date":"2023-02-16T00:00:00.000Z","title":"Company Announcement | Pydantic","description":"","ogImage":"","readtime":"8 mins","authors":[{"name":"Samuel Colvin","picture":"https://avatars.githubusercontent.com/u/4039449"}],"categories":["Company"],"slug":"company-announcement","content":"$10"},{"date":"2022-07-10T00:00:00.000Z","title":"Pydantic V2 Plan","description":"","ogImage":"","readtime":"25 mins","authors":[{"name":"Samuel Colvin","picture":"https://avatars.githubusercontent.com/u/4039449"}],"categories":["V2"],"slug":"pydantic-v2","content":"$11"}],"categories":["Observability","Pydantic Logfire","Open Source","Company","Logfire","Release","Performance Improvements","New Features","AWS Lambda","Serverless","LLMs","V2"]}],["$","section",null,{"className":"bg-paraffin","children":["$","div",null,{"className":"container","children":["$","div",null,{"className":"border-x border-sugar/20 pt-10 md:py-[60px]","children":["$","div",null,{"className":"grid md:grid-cols-3","children":[["$","div",null,{"className":"relative flex aspect-[864/540] items-center justify-center text-center md:col-span-2","children":[["$","svg",null,{"xmlns":"http://www.w3.org/2000/svg","fill":"none","viewBox":"0 0 864 541","className":"absolute inset-0 z-0 h-full w-full","children":[["$","rect",null,{"width":"864","height":"540","transform":"translate(0 0.805664)","fill":"#300321"}],["$","path",null,{"d":"M1242.48 270.687L438.5 -533.293L-365.48 270.687L438.5 1074.67L1242.48 
270.687Z","fill":"#C00C84","fillOpacity":"0.1"}],["$","path",null,{"d":"M1133.45 270.363L432 -431.087L-269.45 270.363L432 971.813L1133.45 270.363Z","fill":"#FCFFEC"}],["$","path",null,{"d":"M937.577 270.229L432 -235.349L-73.5772 270.229L432 775.806L937.577 270.229Z","fill":"#E520E9"}]]}],["$","div",null,{"className":"relative z-10 flex flex-col md:pt-2.5","children":[["$","h3",null,{"className":"text-70 max-w-[760px]","children":"Explore Logfire"}],["$","div",null,{"className":"mt-[34px] flex items-center justify-center gap-2","children":["$","a",null,{"href":"https://logfire.pydantic.dev/docs/","target":"_blank","className":"group relative inline-block overflow-hidden min-w-[9.75rem]","onMouseEnter":"$undefined","onMouseLeave":"$undefined","children":[["$","div",null,{"className":"btn h-full w-full duration-[400ms] group-hover:-translate-y-[101%] bg-paraffin text-white","children":["$","span",null,{"className":"block origin-bottom-left transition-transform duration-[400ms] group-hover:-rotate-3","children":"Get started"}]}],["$","div",null,{"aria-hidden":"true","className":"btn h-full w-full duration-[400ms] group-hover:-translate-y-[101%] bg-white text-paraffin absolute left-0 top-full","children":["$","span",null,{"className":"block origin-bottom-left transition-transform duration-[400ms] rotate-3 group-hover:rotate-0","children":"Get started"}]}]]}]}]]}]]}],["$","$L12",null,{"href":"https://github.com/pydantic","target":"_blank","className":"group relative flex flex-col justify-between gap-6 overflow-hidden bg-[#3E042B] px-2.5 pt-12 text-white max-md:max-h-[230px] max-md:text-center md:gap-2 md:pr-0 md:pt-12 lg:pl-[65px] lg:pt-[65px]","children":[["$","div",null,{"className":"max-md:hidden","children":["$","div",null,{"className":"absolute right-0 top-0 z-10 flex size-16 items-center justify-center bg-lithium/20 text-lithium transition-colors group-hover:bg-lithium 
group-hover:text-white","children":["$","svg",null,{"xmlns":"http://www.w3.org/2000/svg","fill":"none","viewBox":"0 0 29 28","className":"size-7","children":["$","path",null,{"stroke":"currentColor","strokeWidth":"1.925","d":"M8.308 6.514h13.611v13.612M21.92 6.514 6.265 22.168"}]}]}]}],["$","p",null,{"className":"text-18-mono max-w-[200px] !leading-normal !tracking-[0.09px] max-md:mx-auto md:max-w-[250px]","children":["Explore our"," ",["$","span",null,{"className":"text-lithium underline transition-colors group-hover:text-lithium/80","children":"open source"}]," ","packages"]}],["$","$L13",null,{"src":"/assets/footer/cta.svg","width":367,"height":335,"alt":"Product shot","className":"mx-auto transition-transform duration-500 group-hover:translate-y-2 md:mr-0 md:w-[80%] lg:w-full"}]]}]]}]}]}]}]]],null],null]},["$","$L14",null,{"parallelRouterKey":"children","segmentPath":["children","articles","children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L15",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":"$undefined","notFoundStyles":"$undefined","styles":null}],null]},[["$","html",null,{"lang":"en","className":"__variable_9a5b5d __variable_6acd15 __variable_2a2340 __variable_a148ac","children":[["$","head",null,{"children":[["$","link",null,{"rel":"apple-touch-icon","sizes":"180x180","href":"/favicon/apple-touch-icon.png"}],["$","link",null,{"rel":"icon","type":"image/png","sizes":"32x32","href":"/favicon/favicon-32x32.png"}],["$","link",null,{"rel":"icon","type":"image/png","sizes":"16x16","href":"/favicon/favicon-16x16.png"}],["$","link",null,{"rel":"manifest","href":"/favicon/site.webmanifest"}],["$","link",null,{"rel":"mask-icon","href":"/favicon/safari-pinned-tab.svg","color":"#000000"}],["$","link",null,{"rel":"shortcut 
icon","href":"/favicon/favicon.ico"}],["$","meta",null,{"name":"msapplication-TileColor","content":"#000000"}],["$","meta",null,{"name":"msapplication-config","content":"/favicon/browserconfig.xml"}],["$","meta",null,{"name":"theme-color","content":"#000"}],["$","link",null,{"rel":"alternate","type":"application/rss+xml","href":"/feed.xml"}]]}],["$","body",null,{"children":[["$","$L16",null,{}],["$","main",null,{"className":"pt-[60px] md:pt-[72px]","children":["$","$L14",null,{"parallelRouterKey":"children","segmentPath":["children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L15",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":["$","section",null,{"className":"flex min-h-[calc(100vh-72px)] items-center bg-paraffin py-12","children":["$","div",null,{"className":"container flex flex-col items-center space-y-8 sm:space-y-12","children":[["$","div",null,{"className":"flex w-full max-w-[42.5rem] items-center gap-x-6 sm:gap-x-9","children":[["$","svg",null,{"xmlns":"http://www.w3.org/2000/svg","width":"100%","height":"100%","viewBox":"0 0 156 205","fill":"none","className":"w-1/4","children":["$","path",null,{"d":"M99.9345 146.451H0V120.642L92.732 0H124.843V123.643H155.154V146.451H124.843V204.07H99.9345V146.451ZM99.9345 25.5088L25.2087 123.643H99.9345V25.5088Z","fill":"white"}]}],["$","div",null,{"className":"flex-1","children":["$","$17",null,{"fallback":null,"children":[["$","$L18",null,{"moduleIds":["app/not-found.tsx -> @/app/_components/globe/globe"]}],"$L19"]}]}],["$","svg",null,{"xmlns":"http://www.w3.org/2000/svg","width":"100%","height":"100%","viewBox":"0 0 156 205","fill":"none","className":"w-1/4","children":["$","path",null,{"d":"M99.9345 146.451H0V120.642L92.732 0H124.843V123.643H155.154V146.451H124.843V204.07H99.9345V146.451ZM99.9345 25.5088L25.2087 123.643H99.9345V25.5088Z","fill":"white"}]}]]}],["$","div",null,{"className":"text-20-mono text-center 
text-white","children":"Sorry, this page cannot be found."}],["$","$L12",null,{"href":"/","className":"btn btn-lithium sm:!mt-[3.75rem]","children":"Back to home"}]]}]}],"notFoundStyles":[],"styles":null}]}],["$","$L1a",null,{}],["$","script",null,{"async":true,"src":"/flarelytics/client.js"}]]}]]}],null],null],[[["$","link","0",{"rel":"stylesheet","href":"/_next/static/css/fab6518cc982082e.css","precedence":"next","crossOrigin":"$undefined"}],["$","link","1",{"rel":"stylesheet","href":"/_next/static/css/041ce889ecf1af8b.css","precedence":"next","crossOrigin":"$undefined"}]],"$L1b"]]]]
1c:I[502,["7080","static/chunks/7080-108c8d457713fb20.js","9160","static/chunks/app/not-found-1269e4687839457d.js"],"default"]
19:["$","$L1c",null,{"className":"w-full"}]
1b:[["$","meta","0",{"name":"viewport","content":"width=device-width, initial-scale=1"}],["$","meta","1",{"charSet":"utf-8"}],["$","title","2",{"children":"Insights & product updates | Pydantic"}],["$","meta","3",{"name":"description","content":"Get the latest insights on the Pydantic blog, including product updates and our changelog."}],["$","meta","4",{"property":"og:title","content":"Insights & product updates | Pydantic"}],["$","meta","5",{"property":"og:description","content":"Get the latest insights on the Pydantic blog, including product updates and our changelog."}],["$","meta","6",{"property":"og:image","content":"https://pydantic.dev/articles/og.png"}],["$","meta","7",{"property":"og:image:alt","content":"Insights & product updates"}],["$","meta","8",{"name":"twitter:card","content":"summary_large_image"}],["$","meta","9",{"name":"twitter:title","content":"Insights & product updates | Pydantic"}],["$","meta","10",{"name":"twitter:description","content":"Get the latest insights on the Pydantic blog, including product updates and our changelog."}],["$","meta","11",{"name":"twitter:image","content":"https://pydantic.dev/articles/og.png"}],["$","meta","12",{"name":"twitter:image:alt","content":"Insights & product updates"}],["$","meta","13",{"name":"next-size-adjust"}]]
1:null