Cybersecurity

Sophos's SecOps AI team implemented Pydantic Logfire for unified tracing across their AI-powered security solutions. With end-to-end visibility and SQL-based monitoring, engineers now detect issues proactively and run side-by-side LLM experiments with Pydantic Evals.

You can tell that Pydantic Logfire was built by people who use it.
Dennis Griffin, VP of Engineering, Sophos

Products Used:

Pydantic Logfire, Pydantic Evals

Sophos's SecOps AI team builds AI-powered security solutions that protect millions of endpoints globally, including an AI Assistant for their customers. But their monitoring stack was holding them back.

"We'd lose time piecing together what had actually happened," explains Peter Kim, Principal Software Engineer at Sophos. The team needed to trace requests across LLM calls, FastAPI endpoints, and background workers - but their tools showed disconnected fragments instead of complete traces.

With their previous tooling, dashboard creation was limited. The team struggled to visualize multiple metrics at once and lacked the flexibility to build the analytical views they needed to monitor their AI systems. Background jobs would fail silently when Celery workers didn't pick up tasks. "Finding those types of issues with CloudWatch can be a nightmare," Peter notes.

The team needed unified observability that could keep pace with their AI innovation.

Sophos chose Pydantic Logfire for its OpenTelemetry foundation and developer-first design. Implementation was remarkably straightforward - the team simply toggled on Logfire's integrations for their existing libraries like FastAPI and httpx.

"We can see the whole conversation thread, the LLM call, and every API hop - all in one go," says Peter. "It saves me a ton of time."

The team went beyond basic monitoring, creating SQL-based monitors that detect "missing spans" and catch those previously invisible background job failures instantly. As a result, the team is notified much sooner than it was before adopting Logfire. There is no custom query language to learn, just SQL via DataFusion.
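The idea behind a "missing span" check can be sketched in plain SQL. The example below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 as a stand-in for Logfire's DataFusion SQL, and the table layout and span names (`enqueue_task`, `run_task`) are hypothetical, not Sophos's schema or queries.

```python
import sqlite3

# Hypothetical span records as (trace_id, span_name) pairs.
SPANS = [
    ("t1", "enqueue_task"), ("t1", "run_task"),
    ("t2", "enqueue_task"),  # no worker span: the task was never picked up
    ("t3", "enqueue_task"), ("t3", "run_task"),
]


def find_missing_worker_spans(spans):
    """Return trace IDs that enqueued a task but have no worker span."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE spans (trace_id TEXT, span_name TEXT)")
    conn.executemany("INSERT INTO spans VALUES (?, ?)", spans)
    # Anti-join: keep enqueue spans with no matching run_task span
    # in the same trace.
    rows = conn.execute(
        """
        SELECT e.trace_id
        FROM spans AS e
        LEFT JOIN spans AS w
          ON w.trace_id = e.trace_id AND w.span_name = 'run_task'
        WHERE e.span_name = 'enqueue_task' AND w.trace_id IS NULL
        ORDER BY e.trace_id
        """
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


print(find_missing_worker_spans(SPANS))
```

A production monitor would presumably also allow a grace period, so that tasks enqueued moments ago are not flagged before a worker has had a chance to start them.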

The team now builds complex multi-metric dashboards, giving them the analytical flexibility they need. "The filtering has been amazing because you can filter for anything," Peter notes.

Because Sophos operates with highly sensitive customer data, everything is hosted on-prem using Logfire's enterprise self-hosting option.

As confidence grew, Sophos expanded their Logfire usage to include Pydantic Evals for LLM experimentation.

"Evals have been great. We now have the ability to compare experiments," says Peter. The team particularly values being able to test prompt changes side by side and see the effect on performance immediately.

"I think the Logfire UI is cleaner, nicer. Everything's right there. It's what it should be."

— Tony Pelletier, Senior Software Engineer at Sophos

Today, Sophos has achieved what they set out to build:

  • Complete visibility: AI agent runs are traced end-to-end in a single, connected view across all their services
  • Proactive detection: SQL monitors catch issues that previously went unnoticed for hours, with custom alerts for missing spans
  • Rapid experimentation: Side-by-side model evaluations directly in the UI for prompt optimization
  • Team adoption: Engineers praise the interface and actively expand usage - "I'm a big fan of it," says Tony
  • Future-proof architecture: OpenTelemetry foundation means no vendor lock-in - "The great part is it's based on open standards," notes Peter

"This seems more polished to me. And the support we've been getting from the Pydantic team has been awesome - that's a real bonus."

— Peter Kim, Principal Software Engineer at Sophos



Want to achieve similar unified observability for your AI systems? Get started with Pydantic Logfire.