inline-snapshot: How We Write Tests at Pydantic

Alex Hall
5 mins

At Pydantic, we put a massive amount of effort into writing and maintaining tests. We often build and open-source our own tooling to help, like pytest-examples, which ensures our docs examples don't go stale.

But we also rely heavily on tools we didn't write. Today I want to highlight one of those: inline-snapshot, a library we strongly support and use extensively. It completely transformed how we write tests, especially for complex data structures.

A lot of tests look something like this:

def test_user_creation():
    user = create_user(id=123, name="test_user")
    assert user.id == 123
    assert user.name == "test_user"
    assert user.status == "active"

This is tedious to write and a pain to maintain. It's also not very thorough - you have no idea what other fields you're silently ignoring.

This is slightly better:

def test_user_creation():
    user = create_user(id=123, name="test_user")
    assert user.dict() == {
        "id": 123,
        "name": "test_user",
        "status": "active"
    }

Now you know that all fields are covered, and there's only one assertion, which will produce a nice diff in a pytest failure report. But if one day user.dict() starts including a new key, and you have dozens of tests like this, you have to go and manually update every single one.

There are other libraries that can help with this, like syrupy, which store the expected data in separate snapshot files. But this forces you to jump between files and makes the test code harder to read.
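
For example, a syrupy test might look like this (illustrative only; syrupy provides the snapshot fixture, and the expected data lives in a separate __snapshots__ directory rather than in the test):

def test_user_creation(snapshot):
    user = create_user(id=123, name="test_user")
    assert user.dict() == snapshot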

With inline-snapshot, you start by writing a test like this:

from inline_snapshot import snapshot

def test_user_creation():
    user = create_user(id=123, name="test_user")
    assert user.dict() == snapshot({})

Then run it with pytest --inline-snapshot=fix. The library automatically updates your source code:

def test_user_creation():
    user = create_user(id=123, name="test_user")
    assert user.dict() == snapshot({
        "id": 123,
        "name": "test_user",
        "status": "active"
    })

Now, if user.dict() changes in the future, it's easy to update the snapshots automatically, no matter how many there are.

I recently applied this pattern to the OpenAI Agents SDK in a series of pull requests (1, 2, 3) that were happily accepted without changes. This is a perfect example of how snapshots improve both test quality and developer velocity.

Before the change, many tests were incredibly basic, e.g.:

traces = fetch_traces()
assert len(traces) == 1, f"Expected 1 trace, got {len(traces)}"

Developers write tests like this not because they don't care, but because verifying the full structure of traces is too much trouble.

By switching to inline-snapshot, we could upgrade this to a comprehensive assertion effectively for free:

assert fetch_normalized_spans() == snapshot(
    [
        {
            "workflow_name": "Agent workflow",
            "children": [
                {
                    "type": "agent",
                    "data": {
                        "name": "test_agent",
                        "handoffs": [],
                        "tools": [],
                        "output_type": "str",
                    },
                }
            ],
        }
    ]
)

Now the test asserts everything. If a single key changes, the test fails, but it's easy to fix by rerunning pytest --inline-snapshot=fix.

I migrated the tests to use about 40 snapshots. Doing this without a snapshot library would have been a huge amount of manual work to insert all the expected data. But it was easy to quickly insert assert fetch_normalized_spans() == snapshot() in lots of places (often via find/replace) and then run the test suite once to auto-generate all the expected data.
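
Concretely, the workflow looked roughly like this (per inline-snapshot's docs, the create flag fills in empty snapshots, while fix updates existing ones that no longer match):

# Step 1: insert empty snapshots everywhere, often via find/replace:
assert fetch_normalized_spans() == snapshot()

# Step 2: run the suite once to fill them all in:
#   pytest --inline-snapshot=create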

Expected data often contains dynamic fields like timestamps or random IDs that change on every run.

For example, the test below won't work if "id": 123 changes every time:

def test_user_creation():
    user = create_user(name="test_user")
    assert user.dict() == snapshot({
        "id": 123,
        "name": "test_user",
        "status": "active"
    })

A good way to handle this is to normalize the data before snapshotting it, by stripping out or fixing dynamic fields. If the same dynamic fields appear in many places, you can write a helper function to do this consistently. But writing such helpers can be tedious and isn't really worth the effort for dynamic fields that only appear in a few places. Here's an easier ad-hoc approach:

def test_user_creation():
    user = create_user(name="test_user")
    assert user.dict() == snapshot({
        "id": user.id,
        "name": "test_user",
        "status": "active"
    })

inline-snapshot is able to magically preserve dynamic values like user.id in the snapshot. When a new key is added, it will update the snapshot to:

def test_user_creation():
    user = create_user(name="test_user")
    assert user.dict() == snapshot({
        "id": user.id,  # Preserved dynamic value
        "name": "test_user",
        "status": "active",
        "new_key": "new_value"  # New key added
    })

But "id": user.id isn't a very useful assertion, it just checks that a values equals itself. And extracting dynamic values like this is often more painful.

Enter dirty-equals

To handle this better, we combine inline-snapshot with dirty-equals, a library written by Pydantic founder Samuel Colvin. It lets you assert that a value satisfies a condition rather than equaling a specific literal.
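
For example, with a hypothetical response dict (IsInt() matches any integer; IsStr(regex=...) matches any string satisfying the pattern):

from dirty_equals import IsInt, IsStr

response = {"id": 42, "token": "abc123"}
assert response == {"id": IsInt(), "token": IsStr(regex=r"[a-z0-9]+")}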

This is very useful on its own, but it really shines when combined with inline snapshots:

from dirty_equals import IsInt, IsNow
from inline_snapshot import snapshot

assert user.dict() == snapshot({
    "id": IsInt(),          # Matches any integer
    "created_at": IsNow(),  # Matches any datetime close to the current time
    "name": "test_user",
    "status": "active"
})

Tip: convert data to builtins

Suppose User is a dataclass or pydantic.BaseModel. You could probably write your test like this:

assert user == snapshot(
    User(
        id=IsInt(),
        created_at=IsNow(),
        name="test_user",
        status="active"
    )
)

inline-snapshot knows how to update these snapshots while preserving the dynamic parts, if for example User gets an additional optional parameter in the future. But if User gets an additional required parameter, or if the signature changes in some other way such that the User(...) constructor call above throws an error, then the test will fail before it even reaches the snapshot assertion, and you'll have to fix it manually.

To avoid this, convert your data to built-in types (lists, dicts, etc.) before snapshotting it. This way, the snapshot assertion is the only thing that can fail, and it will always be able to update the snapshot as needed.

There's an easy way to do this recursively that works with all dataclasses and Pydantic models:

from pydantic import TypeAdapter

_adapter = TypeAdapter(object)

def as_dicts(value: object):
    return _adapter.dump_python(value)

Then you can just use assert as_dicts(...) == snapshot(...) everywhere. This is also a good place to add other normalization logic for dynamic values.
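
For example, here's a sketch that also blanks out dynamic fields (the key names below are hypothetical, not part of the original helper):

DYNAMIC_KEYS = {"id", "created_at"}  # hypothetical keys that vary per run

def normalized(value: object):
    return _strip_dynamic(as_dicts(value))

def _strip_dynamic(value: object):
    if isinstance(value, dict):
        return {k: _strip_dynamic(v) for k, v in value.items() if k not in DYNAMIC_KEYS}
    if isinstance(value, list):
        return [_strip_dynamic(v) for v in value]
    return value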

Accidental Synergy: Nesting Snapshots

Imagine you need to check this value:

api_response = {
    "status": 200,
    "headers": {"content-type": "application/json"},
    "body": '{"data": {"user_id": 123}, "timestamp": 923847329401}'
}

Here body is a string, not a parsed object, and timestamp is dynamic. dirty-equals handles this nicely:

from dirty_equals import IsJson, IsInt

assert api_response == {
    "status": 200,
    "headers": {"content-type": "application/json"},
    "body": IsJson({
        "data": {"user_id": 123},
        "timestamp": IsInt()
    })
}

Now if you wrap the whole expected value in snapshot(), it can auto-update the outer keys, but not the contents of IsJson. But that's OK, because you can also nest another snapshot() inside IsJson:

assert api_response == snapshot(
    {
        'status': 200,
        'headers': {'content-type': 'application/json'},
        'body': IsJson(
            snapshot(
                {
                    'data': {'user_id': 123},
                    'timestamp': IsInt(),
                }
            )
        ),
    }
)

Now adding a new key to either api_response or body will just work when updating snapshots. Even the author of inline-snapshot was surprised by this! It just arises naturally from how both libraries hook into __eq__ (the == operator).
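
To see why, here's a minimal illustration of the mechanism (a toy stand-in, not the real libraries' code): any object can customize ==, so matchers can be nested anywhere inside an expected value.

class AnyInt:
    # Toy matcher: equal to any integer, like dirty-equals' IsInt.
    def __eq__(self, other):
        return isinstance(other, int)

assert {"id": 5, "name": "x"} == {"id": AnyInt(), "name": "x"}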

This isn't just a crazy hypothetical: we use this pattern in the Pydantic Logfire SDK tests. But it still requires manual effort to insert IsJson(snapshot(...)) in the first place, so I suggest instead parsing JSON in a normalization helper, e.g.:

from pydantic_core import from_json  # faster than json.loads
from collections.abc import Sequence, Mapping


def parse_inner_json(value: object):
    if isinstance(value, str):
        if value.startswith(("{", "[")):
            try:
                return from_json(value)
            except ValueError:
                return value
        else:
            return value
    elif isinstance(value, Sequence):
        return type(value)([parse_inner_json(v) for v in value])
    elif isinstance(value, Mapping):
        return type(value)({k: parse_inner_json(v) for k, v in value.items()})
    else:
        return value


assert parse_inner_json(
    {
        "status": 200,
        "headers": {"content-type": "application/json"},
        "body": '{"data": {"user_id": 123}, "timestamp": 923847329401}',
    }
) == {
    "status": 200,
    "headers": {"content-type": "application/json"},
    "body": {
        "data": {"user_id": 123},
        "timestamp": 923847329401,
    },
}

You could combine this with as_dicts() to have a single normalization function for all your test data.
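
For example (assuming the as_dicts helper from earlier):

def normalize(value: object):
    return parse_inner_json(as_dicts(value))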

I've used pydantic_core.from_json here because it's faster than json.loads, but you could just use json.loads or another library like orjson.

But how does all of this actually work? Here's what's going on under the hood.

To make inline-snapshot work, the library needs to know exactly where in your source code the snapshot() function was called so it can overwrite it. It does this using a library I wrote called executing, which inspects the Python AST (Abstract Syntax Tree) to locate the precise call site.

This is a tricky problem to solve robustly, but executing manages it well across different Python versions and edge cases. I strongly recommend it for anyone writing libraries that need to magically know where functions were called from.
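
Here's a minimal sketch of the kind of introspection executing enables, using its documented Source.executing API:

import ast
import inspect
import executing

def my_magic():
    # Find the AST node of the call to my_magic() in the caller's frame.
    frame = inspect.currentframe().f_back
    node = executing.Source.executing(frame).node
    return ast.unparse(node)

print(my_magic())  # prints: my_magic()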

executing is also used by the Logfire SDK to inspect f-strings, so that this:

logfire.info(f'Hello {name}')

can actually capture both the template string 'Hello {name}' and the variable name separately, making your logs more structured and queryable without any extra effort from the developer. Without this magic, you'd have to write:

logfire.info('Hello {name}', name=name)

We rely heavily on inline-snapshot at Pydantic, and we want to ensure it stays maintained.

I want to thank Frank (15r10nk), the author of inline-snapshot. He has been incredibly helpful, not just with his own library, but also by taking over maintenance of executing (the library I wrote), which frees me up to work on Pydantic products.

That's why Pydantic sponsors Frank and his work. We believe the companies that benefit from open source should fund the people who build it. It's core to how we operate, and it's why we're members of the Open Source Pledge and the Agentic AI Foundation.