
AWS Lambda Data Validation with Pydantic

Sydney Runkle
2024/04/04

AWS Lambda is a popular serverless computing service that allows developers to run code without provisioning or managing servers. This service is so widely used because it supports automatic scaling and offers a cost-effective pay-per-call pricing model.

AWS Lambda functions can be triggered by various AWS services and other event sources, which pass event and context data to the function. As with any other application, it's critical to structure and validate this incoming data to ensure proper execution of the function and reliability of the results.

In this article, we'll explore how Pydantic, the leading data validation library for Python, can be leveraged to structure and validate event and context data in AWS Lambda functions. We'll discuss the importance of understanding the structure of event and context data, and how Pydantic can help enhance developer experience by improving readability and maintainability of Lambda functions.

For comprehensive instructions on setting up an AWS Lambda function, refer to the official guide. This resource provides a step-by-step tutorial on how to create and test a function via the AWS Management Console. The guide also provides links to more advanced topics such as trigger configuration and monitoring / logging.

By using Pydantic to structure and validate the event and context data, one can enhance the developer experience by improving type-hinting and autocompletion, generating automatic documentation, and enhancing debuggability with straightforward and comprehensive error messages.

Early validation with Pydantic also facilitates runtime improvements, such as faster failure for invalid inputs, reduced load and execution costs, and improved security against malicious incoming data.

First, let's take a closer look at AWS Lambda and the data that is passed into a Lambda function when it is invoked.

When a Lambda function is invoked, it receives two parameters: event and context. The event parameter contains the data that is passed into the function, while the context parameter provides information about the invocation, function, and execution environment. In the Python runtime, the event is usually a dictionary, and the context is an object that exposes attributes such as aws_request_id. For simplicity, the examples in this article represent both as dictionaries. We will soon see that we can validate the contents of these dictionaries with Pydantic.

Let's consider a simple example of a Lambda function that receives a user sign-up event. The event data contains:

  • name (str): The first and last name of the user.
  • birthday (date): The user's date of birth.
  • email (str): The user's email address.

We'll work with a basic Lambda function that processes this event, calculates the user's age, and returns a success response.

Here's the Lambda function without Pydantic validation:

from datetime import date, datetime


def lambda_handler(event: dict, context: dict) -> dict:
    name = event["name"]
    birthday = datetime.strptime(event["birthday"], "%Y-%m-%d").date()
    email = event["email"]

    age = (date.today() - birthday).days // 365

    # Send a welcome email, store user data in a database, etc.

    return {
        "result": "success",
        "user": {
            "name": name,
            "birthday": birthday.strftime("%Y-%m-%d"),
            "email": email,
            "age": age
        },
        "request_id": context["aws_request_id"],
    }

Lambda functions are typically invoked by sending a web request to a configured endpoint. The service calling the Lambda function passes the event and context data to the function. This is effectively equivalent to invoking the function directly with the event and context data as arguments, which, for simplicity, is what we'll do in the following examples. Later in the article, we show how to invoke a Lambda function using the AWS CLI.

More concretely, the following script is representative of what happens when the Lambda service invokes the function:

import json # (1)!


event = {
    "name": "Alice",
    "birthday": "1990-01-01",
    "email": "[email protected]"
}
context = {"aws_request_id": "6bc28136-xmpl-4365-b021-0ce6b2e64ab0"}

print(json.dumps(lambda_handler(event, context), indent=2))
"""
{
  "result": "success",
  "user": {
    "name": "Alice",
    "birthday": "1990-01-01",
    "email": "[email protected]",
    "age": 34
  },
  "request_id": "6bc28136-xmpl-4365-b021-0ce6b2e64ab0"
}
"""
  1. For all future invocation examples, we will use the json module to pretty-print the output of the Lambda function for better readability. You can assume that this import is present in all future examples.

What could go wrong here? Lots of things. To name a few:

  1. The event data might be missing required fields.
  2. The event data might contain fields with incorrect types or formats (e.g., what happens if birthday is not a date?).
  3. The event data might contain fields with invalid values (e.g., what happens if birthday is in the future?).
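To make these failure modes concrete, here is a minimal sketch that feeds three problematic events to a condensed copy of the unvalidated handler. The names naive_handler and try_event, and the example.com address, are hypothetical and exist only for this illustration; the context argument is left out because it plays no role in these failures.

```python
from datetime import date, datetime


def naive_handler(event: dict) -> dict:
    # Condensed copy of the unvalidated handler (hypothetical name).
    birthday = datetime.strptime(event["birthday"], "%Y-%m-%d").date()
    age = (date.today() - birthday).days // 365
    return {"name": event["name"], "email": event["email"], "age": age}


def try_event(event: dict) -> str:
    # Report either the result or the type of exception that escaped.
    try:
        return f"ok: {naive_handler(event)}"
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"


# 1. Missing field: indexing the absent "email" key raises KeyError.
missing = try_event({"name": "Alice", "birthday": "1990-01-01"})

# 2. Wrong format: strptime raises ValueError for an unexpected layout.
bad_format = try_event(
    {"name": "Alice", "birthday": "01/01/1990", "email": "alice@example.com"}
)

# 3. Invalid value: a future birthday is accepted and yields a negative age.
future = naive_handler(
    {"name": "Alice", "birthday": "2090-01-01", "email": "alice@example.com"}
)

print(missing)
print(bad_format)
print(future["age"])
```

The first two cases crash the function, while the third silently produces a nonsensical result, which is arguably the harder problem to notice.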

To address these issues, we can use Pydantic to define models that represent the structure of the event and context data, and validate the incoming data before processing it in the Lambda function.

from datetime import date

from pydantic import BaseModel, ValidationError, computed_field


class UserSignUpEvent(BaseModel):
    name: str
    birthday: date
    email: str

    @computed_field
    @property
    def age(self) -> int: # (1)!
        return (date.today() - self.birthday).days // 365


class Context(BaseModel):
    aws_request_id: str # (2)!


def lambda_handler(event: dict, context: dict) -> dict:
    try:
        user = UserSignUpEvent.model_validate(event)
        context_data = Context.model_validate(context)
    except ValidationError as e:
        return {"result": "error", "message": e.errors(include_url=False)} # (3)!

    # Send a welcome email, store user data in a database, etc.

    return {
        "result": "success",
        "user": user.model_dump(mode="json"), # (4)!
        "request_id": context_data.aws_request_id,
    }
  1. Pydantic offers a @computed_field decorator that allows us to define a property that is computed based on other fields in the model. In this case, we use it to calculate the user's age based on their birthday.
  2. Pydantic models have the extra setting set to ignore by default, which is why we can selectively define only the attributes we care about in the Context model.
  3. We exclude the URL from the error messages to keep them concise and readable.
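  4. We use model_dump(mode="json") to serialize the validated model into JSON-compatible types (for example, the birthday date becomes a string). Earlier in the handler, the model_validate method validates the event and context data against their corresponding Pydantic models. If the data is invalid, a ValidationError is raised, and the function fails early with a descriptive error response.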
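One caveat about the age property: dividing the elapsed days by 365 ignores leap days, so it can report a birthday a few days early. The sketch below compares that approximation with a calendar-aware alternative. The helper name age_on is hypothetical and not part of Pydantic; it takes today as an argument only so the comparison is reproducible.

```python
from datetime import date


def age_on(birthday: date, today: date) -> int:
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (birthday.month, birthday.day)
    return today.year - birthday.year - (0 if had_birthday else 1)


birthday = date(1990, 1, 1)
a_week_before = date(2023, 12, 25)  # one week before the 34th birthday

approximate = (a_week_before - birthday).days // 365
exact = age_on(birthday, a_week_before)
print(approximate, exact)
```

Whether the difference matters depends on the use case; for the illustrative purposes of this article, the simpler expression is kept.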

Let's look at a sample invocation of the Lambda function with Pydantic validation:

event = {
    "name": "Alice",
    "birthday": "1990-01-01", # (1)!
    "email": "[email protected]"
}
context = {"aws_request_id": "6bc28136-xmpl-4365-b021-0ce6b2e64ab0"}

print(json.dumps(lambda_handler(event, context), indent=2))
"""
{
  "result": "success",
  "user": {
    "name": "Alice",
    "birthday": "1990-01-01",
    "email": "[email protected]",
    "age": 34
  },
  "request_id": "6bc28136-xmpl-4365-b021-0ce6b2e64ab0"
}
"""
  1. In this invocation, we pass the birthday as a string. Pydantic will automatically parse the string into a date object, so the function will process the data successfully.

As we'd expect, the function processes the data successfully and returns a success response (these results are identical to those of the original function, without Pydantic validation).
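The conversions at play can be seen in isolation with a small sketch, assuming Pydantic v2. It shows undeclared context keys being ignored, the birthday string being parsed into a date, and model_dump(mode="json") turning it back into a string. The Signup model and the extra function_name key are made up for this example.

```python
from datetime import date

from pydantic import BaseModel


class Context(BaseModel):
    aws_request_id: str


class Signup(BaseModel):  # hypothetical, trimmed-down event model
    birthday: date


# Keys that the model does not declare are ignored by default.
ctx = Context.model_validate(
    {"aws_request_id": "example-id", "function_name": "my-function"}
)
print(ctx.model_dump())

signup = Signup.model_validate({"birthday": "1990-01-01"})
python_dump = signup.model_dump()           # keeps a datetime.date object
json_dump = signup.model_dump(mode="json")  # converts it to a string
print(python_dump, json_dump)
```

The JSON-mode dump matters in a handler because the value it returns has to be JSON-serializable.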

However, where Pydantic shines is when the incoming data is invalid.

Consider the following invocation, with incomplete event data:

event = {
    "name": "Alice",
    "birthday": "1990-01-01",
}
context = {"aws_request_id": "6bc28136-xmpl-4365-b021-0ce6b2e64ab0"}

print(json.dumps(lambda_handler(event, context), indent=2))
"""
{
  "result": "error",
  "message": [
    {
      "type": "missing",
      "loc": [
        "email"
      ],
      "msg": "Field required",
      "input": {
        "name": "Alice",
        "birthday": "1990-01-01"
      }
    }
  ]
}
"""

As you can see, Pydantic provides the caller with detailed information about the missing email field in the event data. This is a significant improvement over the original function, which would have raised an unhandled KeyError that is only visible deep within the Lambda's logs, without returning an easy-to-understand error message to the caller. We compare the two responses later in the article, when invoking the function with the AWS CLI.

Alternatively, consider the following invocation, where birthday is not a valid date (there's no February 31st):

event = {
    "name": "Alice",
    "birthday": "1990-02-31",
    "email": "[email protected]"
}
context = {"aws_request_id": "6bc28136-xmpl-4365-b021-0ce6b2e64ab0"}

print(json.dumps(lambda_handler(event, context), indent=2))
"""
{
  "result": "error",
  "message": [
    {
      "type": "date_from_datetime_parsing",
      "loc": [
        "birthday"
      ],
      "msg": "Input should be a valid date or datetime, day value is outside expected range",
      "input": "1990-02-31",
      "ctx": {
        "error": "day value is outside expected range"
      }
    }
  ]
}
"""

This is just the beginning of what Pydantic can do for your Lambda functions.
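For instance, the structured error list can be condensed into one line per problem before it is returned or logged. The following is a minimal sketch rather than a recommended format; summarize_errors is a hypothetical helper, and the event deliberately contains two problems (an impossible date and a missing email).

```python
from datetime import date

from pydantic import BaseModel, ValidationError


class UserSignUpEvent(BaseModel):
    name: str
    birthday: date
    email: str


def summarize_errors(exc: ValidationError) -> list[str]:
    # Join each error's location path and pair it with its message.
    return [
        f"{'.'.join(str(part) for part in err['loc'])}: {err['msg']}"
        for err in exc.errors(include_url=False)
    ]


summary: list[str] = []
try:
    UserSignUpEvent.model_validate({"name": "Alice", "birthday": "1990-02-31"})
except ValidationError as exc:
    summary = summarize_errors(exc)

for line in summary:
    print(line)
```

Because Pydantic collects all errors before raising, the caller learns about both problems in a single round trip.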

Upgrade 1: Using the validate_call decorator

In the previous example, we used the model_validate method to validate the event and context data. Pydantic also provides a validate_call decorator that can be used to validate the arguments of a function. This decorator can be used to validate the event and context data directly in the function signature, like this:

from pydantic import validate_call


@validate_call
def lambda_handler_inner(event: UserSignUpEvent, context: Context) -> dict:
    # Send a welcome email, store user data in a database, etc.

    return {
        "result": "success",
        "user": event.model_dump(mode="json"),
        "request_id": context.aws_request_id,
    }


def lambda_handler(event: dict, context: dict) -> dict:
    try:
        response = lambda_handler_inner(event, context)
        return response
    except ValidationError as e:
        return {"result": "error", "message": e.errors(include_url=False)}

This approach allows us to catch any validation errors associated with the event and context data together, and removes the need to explicitly validate the data in the function body.

Here's an example of what an error response might look like when using the validate_call decorator:

event = { # (1)!
    "name": "Alice",
    "birthday": "1990-01-01",
}
context = {} # (2)!

print(json.dumps(lambda_handler(event, context), indent=2))
"""
{
  "result": "error",
  "message": [
    {
      "type": "missing",
      "loc": [
        0,
        "email"
      ],
      "msg": "Field required",
      "input": {
        "name": "Alice",
        "birthday": "1990-01-01"
      }
    },
    {
      "type": "missing",
      "loc": [
        1,
        "aws_request_id"
      ],
      "msg": "Field required",
      "input": {}
    }
  ]
}
"""
  1. In this invocation, the email field is missing from the event data.
  2. The aws_request_id field is missing from the context data.

This result showcases the implicit validation of the event and context data in the function signature, and the detailed error messages that are returned when the data (for both) is invalid.
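A detail worth spelling out is that validate_call does more than check the arguments: it converts them, so a plain dictionary arrives in the function body as a model instance. Here is a minimal sketch, assuming Pydantic v2; the Greeting model and greet function are hypothetical stand-ins for the event model and inner handler.

```python
from pydantic import BaseModel, ValidationError, validate_call


class Greeting(BaseModel):  # hypothetical model for illustration
    name: str


@validate_call
def greet(event: Greeting) -> str:
    # By the time the body runs, the dict has become a Greeting instance,
    # so attribute access works.
    assert isinstance(event, Greeting)
    return f"Hello, {event.name}!"


print(greet({"name": "Alice"}))

locations = []
try:
    greet({})
except ValidationError as exc:
    # Each location starts with the argument and ends with the field name.
    locations = [err["loc"] for err in exc.errors()]

print(locations)
```

This is why the inner handler above can call event.model_dump and read context.aws_request_id even though the outer handler passes dictionaries.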

Upgrade 2: Enhancing birthday validation

In the previous examples, we used a date field to represent the birthday data in the event model. Pydantic provides specialized field types that can be used to enhance the validation of the data. For example, we can use the PastDate field type to represent the birthday data, and provide additional validation logic to ensure that the date is in the past (we can't have users signing up with future birthdays).

Suppose we define the UserSignUpEvent model like this:

from datetime import date

from pydantic import BaseModel, PastDate, computed_field


class UserSignUpEvent(BaseModel):
    name: str
    birthday: PastDate
    email: str

    @computed_field
    @property
    def age(self) -> int:
        return (date.today() - self.birthday).days // 365

We can now validate the birthday data to ensure that it is a valid date and that it is in the past. Here's an example of what an error response might look like when the birthday data is in the future:

event = {
    "name": "Alice",
    "birthday": "2090-01-01",
    "email": "[email protected]"
}
context = {"aws_request_id": "6bc28136-xmpl-4365-b021-0ce6b2e64ab0"}

print(json.dumps(lambda_handler(event, context), indent=2))

"""
{
  "result": "error",
  "message": [
    {
      "type": "date_past",
      "loc": [
        "birthday"
      ],
      "msg": "Date should be in the past",
      "input": "2090-01-01"
    }
  ]
}
"""

Upgrade 3: Customizing name validation

You can also customize the validation logic for a field by defining a custom validator function. For example, we can define a custom validator function to ensure that the name field contains both a first and last name, and then title case the result.

Here's one way to write such a validator:

from datetime import date

from pydantic import BaseModel, computed_field, field_validator


class UserSignUpEvent(BaseModel):
    name: str
    birthday: date
    email: str

    @computed_field
    @property
    def age(self) -> int:
        return (date.today() - self.birthday).days // 365

    @field_validator('name')
    @classmethod
    def name_has_first_and_last(cls, v: str) -> str:
        stripped_name = v.strip()
        if ' ' not in stripped_name:
            raise ValueError(f'`name` must contain first and last name, got {v}')
        return stripped_name.title()

For a valid name field, we can see that the name is title-cased:

event = {
    "name": "alice smith",
    "birthday": "1990-01-01",
    "email": "[email protected]"
}
context = {"aws_request_id": "6bc28136-xmpl-4365-b021-0ce6b2e64ab0"}

print(json.dumps(lambda_handler(event, context), indent=2))
"""
{
  "result": "success",
  "user": {
    "name": "Alice Smith",
    "birthday": "1990-01-01",
    "email": "[email protected]",
    "age": 34
  },
  "request_id": "6bc28136-xmpl-4365-b021-0ce6b2e64ab0"
}
"""

As you can imagine, if the name field is missing a last name, the function will raise a descriptive error.
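To show what that error looks like, here is a minimal sketch that validates a name without a last name, assuming Pydantic v2. NameOnly is a hypothetical model that keeps only the name field. One assumption to flag: for errors raised from custom validators, the ctx entry carries the original exception object, which json.dumps can't serialize, so the sketch drops it with include_context=False.

```python
import json

from pydantic import BaseModel, ValidationError, field_validator


class NameOnly(BaseModel):  # hypothetical model holding just the name field
    name: str

    @field_validator("name")
    @classmethod
    def name_has_first_and_last(cls, v: str) -> str:
        stripped_name = v.strip()
        if " " not in stripped_name:
            raise ValueError(f"`name` must contain first and last name, got {v!r}")
        return stripped_name.title()


errors = []
try:
    NameOnly.model_validate({"name": "alice"})
except ValidationError as exc:
    # Drop the URL and the raw exception object to keep the list JSON-safe.
    errors = exc.errors(include_url=False, include_context=False)

print(json.dumps(errors, indent=2))
```

If a handler returns the error list directly, as in the examples above, passing include_context=False may be worth considering for the same reason.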

Thus far, we've been invoking the Lambda function directly in Python. In practice, Lambda functions are typically invoked by other services, such as API Gateway, S3, or SNS. The method of invocation will depend on your specific use case and requirements. We'll demonstrate how to invoke the Lambda function using the AWS CLI, which is a common way to test a deployed Lambda function from your local machine.

To invoke this Lambda function with the AWS CLI, you can use the aws lambda invoke command:

aws lambda invoke \
--function-name my-function \
--cli-binary-format raw-in-base64-out \
--payload '{"name": "Alice", "birthday": "1990-01-01", "email": "[email protected]"}' \
output.json

This command assumes that you have the AWS CLI installed and configured with the appropriate credentials. It also assumes that you've configured your Lambda function with the name my-function. The --payload option is used to pass the event data to the Lambda function, and the output of the function will be written to the output.json file.

If we pass in the valid event data used above, we see the following in the output.json file:

{
	"result": "success",
	"user": {
		"name": "Alice",
		"birthday": "1990-01-01",
		"email": "[email protected]",
		"age": 34
	},
	"request_id": "6bc28136-xmpl-4365-b021-0ce6b2e64ab0"
}

Similarly, if we invoke our Lambda function with an invalid payload, we can expect the output.json file to be populated with a detailed error response.

Here, we can see the concrete benefits of Pydantic validation by invoking both versions of the Lambda function, with and without Pydantic, using the AWS CLI. Consider this invocation:

aws lambda invoke \
--function-name my-function \
--cli-binary-format raw-in-base64-out \
--payload '{"name": "Alice", "birthday": "1990-01-01"}' \
output.json && cat output.json

Console output (original function, without Pydantic validation):

{
    "StatusCode": 200, # (1)!
    "FunctionError": "Unhandled",
    "ExecutedVersion": "$LATEST"
}
  1. This 200 status code indicates that the function was invoked successfully. That said, the FunctionError field indicates that an unhandled error occurred during the function execution.

Console output (function with Pydantic validation):

{
    "StatusCode": 200, # (1)!
    "ExecutedVersion": "$LATEST"
}
{
    "result": "error",
    "message": [
        {
            "type": "missing",
            "loc": [
                "email"
            ],
            "msg": "Field required",
            "input": {
                "name": "Alice",
                "birthday": "1990-01-01"
            }
        }
    ]
}
  1. This 200 status code indicates that the function was invoked successfully. The response payload contains a detailed error message that explains what went wrong with the input data.

The response from the original Lambda function is unhelpful and doesn't provide any information about what went wrong. In order to debug the issue, you would need to dig into the logs in the AWS management console.

On the other hand, the response from the Lambda function with Pydantic validation is clear and concise. It provides detailed information about the missing email field in the event data, making it easy to identify and fix the issue.
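The same distinction can be made programmatically by a caller. The sketch below assumes a client that behaves like boto3's Lambda client, whose invoke call returns a response with a Payload stream and, for unhandled errors, a FunctionError key. The function invoke_and_classify and the FakeLambdaClient stand-in are hypothetical; the fake exists only so the example runs without AWS access.

```python
import json
from io import BytesIO


def invoke_and_classify(client, function_name: str, event: dict) -> str:
    # `client` is expected to behave like boto3.client("lambda").
    response = client.invoke(
        FunctionName=function_name,
        Payload=json.dumps(event).encode("utf-8"),
    )
    body = json.loads(response["Payload"].read())
    if "FunctionError" in response:
        return "unhandled error: check the function's logs"
    if body.get("result") == "error":
        fields = [".".join(map(str, err["loc"])) for err in body["message"]]
        return f"validation error in: {', '.join(fields)}"
    return "success"


class FakeLambdaClient:
    """Stand-in for a real client so the sketch runs without AWS access."""

    def __init__(self, response: dict):
        self._response = response

    def invoke(self, **kwargs):
        return self._response


def fake_response(body: dict, **extra) -> dict:
    # Mimic the shape of an invoke response with a readable payload stream.
    payload = BytesIO(json.dumps(body).encode("utf-8"))
    return {"StatusCode": 200, "Payload": payload, **extra}


validated = FakeLambdaClient(fake_response({
    "result": "error",
    "message": [{"type": "missing", "loc": ["email"], "msg": "Field required"}],
}))
print(invoke_and_classify(validated, "my-function", {"name": "Alice"}))
```

With the validated function, the caller can name the offending field immediately; with the original one, all it can report is that something failed.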

In this article we demonstrated that Pydantic is a powerful tool for structuring and validating event and context data in AWS Lambda functions. By utilizing Pydantic, developers can improve the developer experience and runtime performance of their Lambda functions.

We encourage developers to adopt Pydantic as a best practice when developing AWS Lambda functions. Integrating Pydantic into your Lambda functions can be a game-changer, enhancing your code's readability, maintainability, and efficiency.

If you're interested in further exploring the integration capabilities between Pydantic and AWS Lambda, consider the following next steps:

  1. Use pydantic-settings to manage environment variables in your Lambda functions.
  2. Take a deep dive into Pydantic's more advanced features, like custom validation and serialization to transform your Lambda's data.
  3. Explore creating a Pydantic Lambda Layer to share the Pydantic library across multiple Lambda functions.
  4. Take a look at more Pydantic custom types, like NameEmail, SecretStr, and many others.