AWS Lambda is a popular serverless computing service that allows developers to run code without provisioning or managing servers. This service is so widely used because it supports automatic scaling and offers a cost-effective pay-per-call pricing model.
AWS Lambda functions can be triggered by various AWS services and other event sources, which pass event and context data to the function. Like any other application, it's critical to structure and validate this incoming data to ensure proper execution of the function and the reliability of its results.
In this article, we'll explore how Pydantic, the leading data validation library for Python, can be leveraged to structure and validate event and context data in AWS Lambda functions. We'll discuss the importance of understanding the structure of event and context data, and how Pydantic can enhance the developer experience by improving the readability and maintainability of Lambda functions.
For comprehensive instructions on setting up an AWS Lambda function, refer to the official guide. This resource provides a step-by-step tutorial on how to create and test a function via the AWS Management Console. The guide also provides links to more advanced topics such as trigger configuration, monitoring, and logging.
By using Pydantic to structure and validate the event and context data, one can enhance the developer experience through improved type hinting and autocompletion, automatic documentation generation, and better debuggability thanks to straightforward, comprehensive error messages.
Early validation with Pydantic also facilitates runtime improvements, such as faster failure for invalid inputs, reduced load and execution costs, and improved security against malicious incoming data.
A Simple Example
First, let's take a closer look at AWS Lambda and the data that is passed into a Lambda function when it is invoked.
When a Lambda function is invoked, it receives two parameters: event and context. The event parameter contains the data that is passed into the function, while the context parameter provides information about the invocation, function, and execution environment. The event data arrives as a dictionary, and for simplicity we'll represent the context data as a dictionary as well. We will soon see that we can validate the contents of these dictionaries with Pydantic.
Without Pydantic
Let's consider a simple example of a Lambda function that receives a user sign-up event. The event data contains:
- name (str): The first and last name of the user.
- birthday (date): The user's date of birth.
- email (str): The user's email address.
We'll work with a basic Lambda function that processes this event, calculates the user's age, and returns a success response.
Here's the Lambda function without Pydantic validation:
from datetime import date, datetime

def lambda_handler(event: dict, context: dict) -> dict:
    name = event["name"]
    birthday = datetime.strptime(event["birthday"], "%Y-%m-%d").date()
    email = event["email"]
    age = (date.today() - birthday).days // 365

    # Send a welcome email, store user data in a database, etc.

    return {
        "result": "success",
        "user": {
            "name": name,
            "birthday": birthday.strftime("%Y-%m-%d"),
            "email": email,
            "age": age
        },
        "request_id": context["aws_request_id"],
    }
In production, Lambda functions are typically invoked by other AWS services or through the Lambda API, and the invoking service passes the event and context data to the function. This is effectively equivalent to invoking the function directly with the event and context data as arguments, which, for simplicity, is what we'll do in the following examples. Later in the article, we'll show how to invoke a Lambda function using the AWS CLI.
More concretely, the following script is representative of what happens when the Lambda service invokes the function:
import json  # (1)!

event = {
    "name": "Alice",
    "birthday": "1990-01-01",
    "email": "[email protected]"
}
context = {"aws_request_id": "6bc28136-xmpl-4365-b021-0ce6b2e64ab0"}

print(json.dumps(lambda_handler(event, context), indent=2))
"""
{
"result": "success",
"user": {
"name": "Alice",
"birthday": "1990-01-01",
"email": "[email protected]",
"age": 34
},
"request_id": "6bc28136-xmpl-4365-b021-0ce6b2e64ab0"
}
"""
- For all future invocation examples, we will use the json module to pretty-print the output of the Lambda function for better readability. You can assume that this import is present in all future examples.
What could go wrong here? Lots of things. To name a few:
- The event data might be missing required fields (as sketched below).
- The event data might contain fields with incorrect types or formats (e.g., what happens if birthday is not a date?).
- The event data might contain fields with invalid values (e.g., what happens if birthday is in the future?).
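To make the first failure mode concrete, here's a minimal sketch of calling the non-Pydantic handler above with the email field missing (the request ID is the same placeholder used throughout this article):

event = {
    "name": "Alice",
    "birthday": "1990-01-01",
    # "email" is missing
}
context = {"aws_request_id": "6bc28136-xmpl-4365-b021-0ce6b2e64ab0"}

try:
    lambda_handler(event, context)
except KeyError as e:
    print(f"KeyError: {e}")  # KeyError: 'email'

The caller gets no structured error response at all; the exception simply propagates out of the handler.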
To address these issues, we can use Pydantic to define models that represent the structure of the event and context data, and validate the incoming data before processing it in the Lambda function.
With Pydantic
from datetime import date

from pydantic import BaseModel, ValidationError, computed_field

class UserSignUpEvent(BaseModel):
    name: str
    birthday: date
    email: str

    @computed_field
    @property
    def age(self) -> int:  # (1)!
        return (date.today() - self.birthday).days // 365

class Context(BaseModel):
    aws_request_id: str  # (2)!

def lambda_handler(event: dict, context: dict) -> dict:
    try:
        user = UserSignUpEvent.model_validate(event)
        context_data = Context.model_validate(context)
    except ValidationError as e:
        return {"result": "error", "message": e.errors(include_url=False)}  # (3)!

    # Send a welcome email, store user data in a database, etc.

    return {
        "result": "success",
        "user": user.model_dump(mode="json"),  # (4)!
        "request_id": context_data.aws_request_id,
    }
- Pydantic offers a @computed_field decorator that allows us to define a property that is computed from other fields in the model. In this case, we use it to calculate the user's age from their birthday.
- Pydantic models have the extra setting set to ignore by default, which is why we can selectively define only the attributes we care about in the Context model (see the sketch below).
- We exclude the URL from the error messages to keep them concise and readable.
- We use the model_validate method to validate the event and context data against their corresponding Pydantic models. If either is invalid, a ValidationError will be raised, and the function will fail early with a descriptive error response.
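To make the extra behavior concrete, here's a quick sketch; the additional keys are made-up stand-ins for the many attributes a real Lambda context carries:

ctx = Context.model_validate({
    "aws_request_id": "6bc28136-xmpl-4365-b021-0ce6b2e64ab0",
    "function_name": "my-function",   # extra keys are silently ignored...
    "memory_limit_in_mb": 128,        # ...because extra defaults to "ignore"
})
print(ctx.model_dump())
# {'aws_request_id': '6bc28136-xmpl-4365-b021-0ce6b2e64ab0'}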
Let's look at a sample invocation of the Lambda function with Pydantic validation:
event = {
    "name": "Alice",
    "birthday": "1990-01-01",  # (1)!
    "email": "[email protected]"
}
context = {"aws_request_id": "6bc28136-xmpl-4365-b021-0ce6b2e64ab0"}

print(json.dumps(lambda_handler(event, context), indent=2))
"""
{
"result": "success",
"user": {
"name": "Alice",
"birthday": "1990-01-01",
"email": "[email protected]",
"age": 34
},
"request_id": "6bc28136-xmpl-4365-b021-0ce6b2e64ab0"
}
"""
- In this invocation, we pass the birthday as a string. Pydantic will automatically parse the string into a date object, so the function will process the data successfully.
As we'd expect, the function processes the data successfully and returns a success response (these results are identical to those of the original function without Pydantic validation).
However, where Pydantic shines is when the incoming data is invalid.
Consider the following invocation, with incomplete event data:
event = {
    "name": "Alice",
    "birthday": "1990-01-01",
}
context = {"aws_request_id": "6bc28136-xmpl-4365-b021-0ce6b2e64ab0"}

print(json.dumps(lambda_handler(event, context), indent=2))
"""
{
"result": "error",
"message": [
{
"type": "missing",
"loc": [
"email"
],
"msg": "Field required",
"input": {
"name": "Alice",
"birthday": "1990-01-01"
}
}
]
}
"""
As you can see, Pydantic provides the caller with detailed information about the missing email field in the event data. This is a significant improvement over the original function, which would have raised an error visible only deep within the Lambda's logs; no easy-to-understand error message would have been returned to the caller. We'll see a concrete example of this later, when we invoke both versions of the function with the AWS CLI.
Alternatively, consider the following invocation, where birthday is not a valid date (there's no February 31st):
event = {
    "name": "Alice",
    "birthday": "1990-02-31",
    "email": "[email protected]"
}
context = {"aws_request_id": "6bc28136-xmpl-4365-b021-0ce6b2e64ab0"}

print(json.dumps(lambda_handler(event, context), indent=2))
"""
{
"result": "error",
"message": [
{
"type": "date_from_datetime_parsing",
"loc": [
"birthday"
],
"msg": "Input should be a valid date or datetime, day value is outside expected range",
"input": "1990-02-31",
"ctx": {
"error": "day value is outside expected range"
}
}
]
}
"""
This is just the beginning of what Pydantic can do for your Lambda functions.
Upgrade 1: Using the validate_call decorator
In the previous example, we used the model_validate method to validate the event and context data. Pydantic also provides a validate_call decorator that can be used to validate the arguments of a function. This decorator lets us validate the event and context data directly in the function signature, like this:
from pydantic import validate_call

@validate_call
def lambda_handler_inner(event: UserSignUpEvent, context: Context) -> dict:
    # Send a welcome email, store user data in a database, etc.

    return {
        "result": "success",
        "user": event.model_dump(mode="json"),
        "request_id": context.aws_request_id,
    }

def lambda_handler(event: dict, context: dict) -> dict:
    try:
        response = lambda_handler_inner(event, context)
        return response
    except ValidationError as e:
        return {"result": "error", "message": e.errors(include_url=False)}
This approach allows us to catch any validation errors associated with the event and context data together, and removes the need to explicitly validate the data in the function body.
Here's an example of what an error response might look like when using the validate_call decorator:
event = {  # (1)!
    "name": "Alice",
    "birthday": "1990-01-01",
}
context = {}  # (2)!

print(json.dumps(lambda_handler(event, context), indent=2))
"""
{
"result": "error",
"message": [
{
"type": "missing",
"loc": [
0,
"email"
],
"msg": "Field required",
"input": {
"name": "Alice",
"birthday": "1990-01-01"
}
},
{
"type": "missing",
"loc": [
1,
"aws_request_id"
],
"msg": "Field required",
"input": {}
}
]
}
"""
- In this invocation, the email field is missing from the event data.
- The aws_request_id field is missing from the context data.
This result showcases the implicit validation of the event and context data in the function signature, and the detailed error messages that are returned when the data for both is invalid.
Upgrade 2: Enhancing birthday validation
In the previous examples, we used a date field to represent the birthday data in the event model. Pydantic provides specialized field types that can be used to enhance the validation of the data. For example, we can use the PastDate field type to represent the birthday data, which provides additional validation logic to ensure that the date is in the past (we can't have users signing up with future birthdays).
If we define the UserSignUpEvent model like this:
from datetime import date

from pydantic import BaseModel, PastDate, computed_field

class UserSignUpEvent(BaseModel):
    name: str
    birthday: PastDate
    email: str

    @computed_field
    @property
    def age(self) -> int:
        return (date.today() - self.birthday).days // 365
We can now validate the birthday data to ensure both that it is a valid date and that it is in the past. Here's an example of what an error response might look like when the birthday data is in the future:
event = {
    "name": "Alice",
    "birthday": "2090-01-01",
    "email": "[email protected]"
}
context = {"aws_request_id": "6bc28136-xmpl-4365-b021-0ce6b2e64ab0"}

print(json.dumps(lambda_handler(event, context), indent=2))
"""
{
"result": "error",
"message": [
{
"type": "date_past",
"loc": [
"birthday"
],
"msg": "Date should be in the past",
"input": "2090-01-01"
}
]
}
"""
Upgrade 3: Customizing name validation
You can also customize the validation logic for a field by defining a custom validator function. For example, we can define a custom validator to ensure that the name field contains both a first and last name, and then title-case the result.
For example:
from datetime import date

from pydantic import BaseModel, computed_field, field_validator

class UserSignUpEvent(BaseModel):
    name: str
    birthday: date
    email: str

    @computed_field
    @property
    def age(self) -> int:
        return (date.today() - self.birthday).days // 365

    @field_validator('name')
    @classmethod
    def name_has_first_and_last(cls, v: str) -> str:
        stripped_name = v.strip()
        if ' ' not in stripped_name:
            raise ValueError(f'`name` must contain first and last name, got {v}')
        return stripped_name.title()
For a valid name field, we can see that the name is title-cased:
event = {
    "name": "alice smith",
    "birthday": "1990-01-01",
    "email": "[email protected]"
}
context = {"aws_request_id": "6bc28136-xmpl-4365-b021-0ce6b2e64ab0"}

print(json.dumps(lambda_handler(event, context), indent=2))
"""
{
"result": "success",
"user": {
"name": "Alice Smith",
"birthday": "1990-01-01",
"email": "[email protected]",
"age": 34
},
"request_id": "6bc28136-xmpl-4365-b021-0ce6b2e64ab0"
}
"""
As you can imagine, if the name field is missing a last name, the validator will raise a descriptive error, as sketched below.
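For example, validating the model directly with a single-word name fails along these lines (a sketch, assuming the imports from the earlier examples; Pydantic's exact message formatting may vary slightly between versions):

try:
    UserSignUpEvent(name="alice", birthday="1990-01-01", email="[email protected]")
except ValidationError as e:
    print(e)
    # 1 validation error for UserSignUpEvent
    # name
    #   Value error, `name` must contain first and last name, got alice ...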
Application: Invoking a Lambda with the AWS CLI
Thus far, we've been invoking the Lambda function directly in Python. In practice, Lambda functions are typically invoked by other services, such as API Gateway, S3, or SNS. The method of invocation will depend on your specific use case and requirements. Here, we'll demonstrate how to invoke the Lambda function using the AWS CLI, which is a convenient way to test a deployed Lambda function from your terminal.
To invoke this Lambda function with the AWS CLI, you can use the aws lambda invoke command:
aws lambda invoke \
--function-name my-function \
--cli-binary-format raw-in-base64-out \
--payload '{"name": "Alice", "birthday": "1990-01-01", "email": "[email protected]"}' \
output.json
This command assumes that you have the AWS CLI installed and configured with the appropriate credentials. It also assumes that you've configured your Lambda function with the name my-function. The --payload option is used to pass the event data to the Lambda function, and the output of the function will be written to the output.json file.
If we pass in the valid event data used above, we see the following in the output.json file:
{
  "result": "success",
  "user": {
    "name": "Alice",
    "birthday": "1990-01-01",
    "email": "[email protected]",
    "age": 34
  },
  "request_id": "6bc28136-xmpl-4365-b021-0ce6b2e64ab0"
}
Similarly, if we invoke our Lambda with an invalid payload, we can expect the output.json file to be populated with a detailed error response.
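For instance, such an invocation might look like the following sketch (reusing the assumed my-function deployment, this time with an impossible birthday); output.json would then contain the same kind of date_from_datetime_parsing error we saw earlier:

aws lambda invoke \
    --function-name my-function \
    --cli-binary-format raw-in-base64-out \
    --payload '{"name": "Alice", "birthday": "1990-02-31", "email": "[email protected]"}' \
    output.json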
Putting it All Together
Here, we can see the concrete benefits of Pydantic validation by invoking both versions of the Lambda function (with and without Pydantic) via the AWS CLI, using the same incomplete payload. Consider this invocation:
aws lambda invoke \
    --function-name my-lambda \
    --cli-binary-format raw-in-base64-out \
    --payload '{"name": "Alice", "birthday": "1990-01-01"}' \
    output.json && cat output.json
Console output (without Pydantic validation):
{
  "StatusCode": 200,  # (1)!
  "FunctionError": "Unhandled",
  "ExecutedVersion": "$LATEST"
}
- This 200 status code indicates that the function was invoked successfully. That said, the FunctionError field indicates that an unhandled error occurred during the function's execution.
Console output (with Pydantic validation):
{
  "StatusCode": 200,  # (1)!
  "ExecutedVersion": "$LATEST"
}
{
  "result": "error",
  "message": [
    {
      "type": "missing",
      "loc": [
        "email"
      ],
      "msg": "Field required",
      "input": {
        "name": "Alice",
        "birthday": "1990-01-01"
      }
    }
  ]
}
- This 200 status code indicates that the function was invoked successfully. The response payload contains a detailed error message that explains what went wrong with the input data.
The response from the original Lambda function is unhelpful and doesn't provide any information about what went wrong. In order to debug the issue, you would need to dig into the logs in the AWS Management Console.
On the other hand, the response from the Lambda function with Pydantic validation is clear and concise. It provides detailed information about the missing email field in the event data, making it easy to identify and fix the issue.
Concluding Thoughts
In this article, we demonstrated that Pydantic is a powerful tool for structuring and validating event and context data in AWS Lambda functions. By utilizing Pydantic, developers can improve both the developer experience and the runtime performance of their Lambda functions.
We encourage developers to adopt Pydantic as a best practice when developing AWS Lambda functions. Integrating Pydantic into your Lambda functions can be a game-changer, enhancing your code's readability, maintainability, and efficiency.
What's Next?
If you're interested in further exploring the integration capabilities between Pydantic and AWS Lambda, consider the following next steps:
- Use pydantic-settings to manage environment variables in your Lambda functions (see the sketch below).
- Take a deep dive into Pydantic's more advanced features, like custom validation and serialization, to transform your Lambda's data.
- Explore creating a Pydantic Lambda Layer to share the Pydantic library across multiple Lambda functions.
- Take a look at more Pydantic custom types, like NameEmail, SecretStr, and many others.
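As a teaser for the first item, here's a minimal, hypothetical sketch of using pydantic-settings to load configuration from environment variables (the variable names TABLE_NAME and LOG_LEVEL are made up for illustration):

from pydantic_settings import BaseSettings

class LambdaSettings(BaseSettings):
    table_name: str          # read from the TABLE_NAME environment variable
    log_level: str = "INFO"  # read from LOG_LEVEL, with a sensible default

settings = LambdaSettings()  # loaded once per cold start, outside the handler

Defining the settings at module level means they're loaded and validated once per cold start rather than on every invocation.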