Agent Infrastructure Engineer

The Opportunity

We're building the infrastructure to run AI agents at scale safely, reliably, and cheaply. Much of this is greenfield. The goal is a platform that can run untrusted, long-lived, resource-hungry agent workloads: agentic workflows such as SRE investigators, issue-fixers, and findings agents that read a codebase, query live observability data, and open well-evidenced GitHub issues and PRs.

We're looking for an engineer who has built and scaled cloud agent infrastructure to help us turn this platform into a core capability, for our own agents first and ultimately for our customers.

What You Will Do

This is greenfield and moving fast: you'll help shape the strategy, not just execute someone else's. The areas below are how we see the work today — they'll evolve as we learn what our agents and our customers need, and we're looking for someone who's energized by that kind of ambiguity and ready to adapt as the picture sharpens.

Design and harden the sandboxing model for running agent and customer code: container/microVM isolation, kernel sandboxes (gVisor), network egress policy, and least-privilege credential minting.
Scale the agent runner: per-run isolation, concurrency and fairness across many simultaneous runs, resource quotas, fast cold-starts, and cost control.
Build durable, resumable agent execution — checkpointing model calls and tool invocations so runs survive pod failures (we use a Postgres-backed durable execution layer).
Own the orchestration path end to end: enqueue → schedule (Kubernetes Batch) → execute → persist results → lifecycle and cleanup.
Integrate the agent runtime with our own observability — every agent run is fully traced in Logfire, and agents query Logfire (via MCP) for live telemetry.
Partner with the Pydantic AI team — our agents are built on Pydantic AI — and help shape what "agents as a product" looks like.

Who You Are

We expect a candidate for this position to have:

Built and operated cloud infrastructure for running agents or untrusted code at scale — e.g. code-execution sandboxes, CI runners, serverless/function platforms, notebook/compute backends, or AI agent platforms.
Strong skills with containers and Kubernetes, and the security boundaries around them (namespaces, cgroups, seccomp, gVisor/Firecracker/Kata, network policy).
Comfort reasoning about multi-tenant isolation, credential scoping, and the blast radius of running someone else's code.
Solid Python (our agent stack) and a willingness to work across the system; you think hard about concurrency, race conditions, and failure recovery in distributed systems.
A genuine interest in AI engineering and agentic systems.
At least 5 years of software engineering experience.

Nice to haves but not required:

Experience with microVM or user-space-kernel sandboxing (Firecracker, Kata, gVisor) or having run a code-sandbox / agent product (e.g. E2B, Modal, Daytona, Fly, Cloudflare Workers/Sandboxes).
Familiarity with Pydantic AI or other agent frameworks, and the Model Context Protocol (MCP).
Experience with durable / workflow execution engines (Temporal or similar checkpointing systems).
Knowledge of OpenTelemetry and observability.
Rust and/or TypeScript.

Non-Technical Requirements

Live and work in a timezone between PT (UTC-8) and CET (UTC+1)
Able to travel to the EU, the UK, and the US up to 4 times a year to join our off-sites
Willing to join our on-call rotation, roughly 1 week in every 10

About Us

Pydantic Validation is the data validation library that powers modern Python development - 500 million downloads per month, used by virtually every tech company you've heard of. Why? Because we obsess over developer experience and write code we'd actually want to use ourselves.

We're applying that same engineering mindset to Pydantic Logfire, our observability platform with first-class support for AI engineering, built for today's development reality: AI workloads, multi-language environments, and cloud infrastructure that's designed to be straightforward to set up and maintain.

We build with technologies developers actually want to work with:

OpenTelemetry for standardized instrumentation
SQL for intuitive querying (no proprietary query language to learn)
Rust, Python, and TypeScript for performance and productivity
Postgres, DataFusion, and object storage for scalable backends

Unlike other companies that pay lip service to open source, we commit over 20% of our engineering team to maintaining and expanding our open source ecosystem. This includes the core Pydantic Validation library and Pydantic AI - our rapidly growing framework that's becoming the standard for AI application development. We're signatories of the Open Source Pledge and build on open standards because we believe in interoperability, not lock-in. Use our OpenTelemetry-based SDK with any compatible backend - we're confident you'll choose us on merit.

We're backed by Sequoia Capital and run a fully remote team across multiple time zones (with regular in-person off-sites; the next one is in June 2026 in London).

Join our team of exceptional engineers who value substance over hype, practical approaches over perfectionism, and meaningful progress over busyness. We've built a culture that balances technical ambition with sustainable practices: minimal meetings and respect for your expertise and time. We're creating tools that genuinely improve developers' lives, and we're looking for thoughtful contributors who share our commitment to quality and our passion for elegant solutions.

Perks & Benefits

💰 Compensation: Competitive salary and stock options
🌍 Truly Remote: Work from anywhere within our timezone range - no office requirements
🌐 Global & Diverse: Join a multi-cultural team of 8+ nationalities
💪 Impact: Direct influence on tools used by millions of developers worldwide
🎯 Focus on Growth: Regular opportunities for learning and professional development
🤝 Team Gatherings: Connect with the team at our regular international off-sites
🏥 Healthcare: Comprehensive health coverage for you and your dependents
🎮 Flexible Hours: Work when you're most productive
💻 Equipment: Budget for your home office setup
⚖️ Work-Life Balance: Flexible working hours and 33 days of PTO no matter where you live (including public holidays, which you can choose to take or not)

Apply

To apply, email careers@pydantic.dev with the job title in the subject line. We'd also appreciate a few lines explaining why you think you'd be a good fit for the role and which of your past work demonstrates it.

No recruiters or agencies, please. Unsolicited recruiter emails will be marked as spam.

To make your application stand out, please share something you've built and run in production — a sandbox, a runner, an agent platform, or a relevant open source contribution.
