
From Prompts to Processes: Full-Stack Agent Visibility with AgentSH + Pydantic Logfire

Eran Sandler
10 mins

This is a guest post by Canyon Road (AgentSH).

TL;DR:

  • Pydantic Logfire traces what your agent thought (model calls, tool invocations, latency).
  • AgentSH audits what it actually did on the machine (file access, network connections, process execution).
  • Both speak OpenTelemetry, so they land in one timeline. You get end-to-end visibility from prompt to process, with policy enforcement at the OS boundary.

The hardest agent failures are not in the LLM call. They are in the seams. The subprocess that ran uv add. The script that touched a credential file. The unexpected outbound connection hiding inside a dependency. The cleanup step that became rm on the wrong directory.

Pydantic Logfire exists for exactly this reality: unified observability across AI and the rest of your application stack, built on OpenTelemetry, so you can see a single timeline instead of jumping between tools.

But there is one blind spot almost every team hits with agents. You can trace what the agent thought and which tools it called, but you still cannot reliably answer: what did it actually do on the machine?

That is what AgentSH is for.

AgentSH sits under the agent at the execution boundary. It records file, network, and process activity, applies policy decisions, and exports those audit events as OpenTelemetry LogRecords via OTLP.

Put them together and you get something rare in agentic systems: one end-to-end story, from prompt to process.

Most "AI observability" stops at the app layer: model calls, tokens, latency, tool invocations and responses, traces across your backend, DB, queues, and services.

Logfire makes that whole picture visible across AI and general application behavior using unified traces, logs, and metrics via OpenTelemetry.

But what actually breaks you in production is often execution:

  • A command that spawned nested subprocesses
  • A dependency install via uv add that reached out to an unexpected domain
  • A script that read ~/.aws/credentials
  • A cleanup that wrote outside the workspace

AgentSH turns those into structured audit events and makes the decision visible too: allow, approve, redirect, deny. Those decisions map to severity in the exported logs (deny becomes ERROR), which means your existing dashboards and alerting work for agent runtime behavior.
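That decision-to-severity mapping can be sketched roughly like this. Only deny → ERROR is stated above; the other severities here are illustrative assumptions, not AgentSH's documented behavior:

```python
# Illustrative sketch: mapping policy decisions onto OpenTelemetry
# log severity text. Only "deny" -> ERROR is stated in the post;
# the other mappings are assumptions.
DECISION_SEVERITY = {
    "allow": "INFO",      # normal audited activity (assumed)
    "approve": "INFO",    # human-approved action (assumed)
    "redirect": "WARN",   # action rewritten by policy (assumed)
    "deny": "ERROR",      # blocked action -- alertable by default
}

def severity_for(decision: str) -> str:
    """Return the OTel severity text for a policy decision."""
    return DECISION_SEVERITY.get(decision, "INFO")
```

The point of the mapping is that an existing error-rate alert fires on blocked agent actions with no extra wiring.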

Both systems speak OpenTelemetry. They can meet in the same pipeline.

Let's talk about the mode everyone uses and nobody wants to admit they use.

Claude Code has a bypass mode that skips permission checks. The docs are blunt: it disables permission checks and should only be used in isolated environments like containers or VMs.

That warning is correct. Bypass mode, commonly known as YOLO mode, is never safe.

But teams still turn it on for a reason: prompt-by-prompt supervision does not scale when agents iterate at machine speed. So the real question is not whether anyone should ever use it. It is this: if you need that velocity, can you make the trade survivable?

This is where AgentSH + Logfire is a useful combination.

Logfire gives you the story of the session: traces, model calls, tool calls. AgentSH gives you the runtime truth: what happened at the OS boundary, what was blocked, what was allowed, and what it tried to do anyway.

You keep the speed. You stop flying blind.

Think of it as two streams that land in one place.

Stream 1: what the agent did in the harness. Pydantic maintains a Logfire plugin for Claude Code that turns each session into a trace with child spans per LLM API call, including token usage, cost tracking, and conversation history.

Stream 2: what the agent did on the machine. AgentSH records execution events (files, network, process operations) and exports them as OpenTelemetry LogRecords via OTLP, asynchronously and without blocking if export fails.
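That "asynchronously and without blocking" property is the important design constraint: export failure must never slow down or break the audited operation. A minimal sketch of the general pattern (bounded queue, background worker, drop instead of block), not AgentSH's actual implementation:

```python
import queue
import threading

class NonBlockingExporter:
    """Queue audit events for background export; never block the
    audited operation, and drop events if the queue fills up."""

    def __init__(self, send, maxsize: int = 1024):
        self._q = queue.Queue(maxsize=maxsize)
        self._send = send  # e.g. an OTLP HTTP post; injected for testing
        self.dropped = 0
        threading.Thread(target=self._worker, daemon=True).start()

    def emit(self, event: dict) -> None:
        try:
            self._q.put_nowait(event)  # returns immediately
        except queue.Full:
            self.dropped += 1          # degrade, never block the caller

    def _worker(self) -> None:
        while True:
            event = self._q.get()
            try:
                self._send(event)
            except Exception:
                pass  # export failure must not affect the agent
```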

Both streams can land in Logfire because Logfire supports standard OTLP ingest over HTTP, configured via OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_HEADERS. See the Logfire OTLP ingest guide.
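Those two variables follow the standard OTLP conventions: the endpoint is a base URL, and the headers variable is a comma-separated list of key=value pairs whose values may be percent-encoded. A sketch of the setup and of how an exporter parses the headers (the endpoint and token values are placeholders; check the Logfire OTLP ingest guide for the real ones):

```python
import os
from urllib.parse import unquote

def parse_otlp_headers(raw: str) -> dict[str, str]:
    """Parse the OTEL_EXPORTER_OTLP_HEADERS format: comma-separated
    key=value pairs, with percent-encoded values allowed."""
    headers = {}
    for pair in raw.split(","):
        if not pair.strip():
            continue
        key, _, value = pair.partition("=")
        headers[key.strip()] = unquote(value.strip())
    return headers

# Placeholders: the endpoint is region-specific and the token comes
# from your Logfire project -- see the Logfire OTLP ingest guide.
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "https://logfire-us.pydantic.dev"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = "Authorization=your-write-token"
```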

At that point, Logfire becomes the place you answer questions like:

  • Why did this agent run so long?
  • What changed in the workspace?
  • What did it try to execute?
  • Did it attempt outbound network connections?
  • Which policies denied it, and how often?
  • Which sessions are doing risky things?

When people say "end-to-end visibility," they often mean "two dashboards." The goal here is tighter: one investigation thread.

AgentSH supports trace correlation. If an event carries trace_id and span_id, those IDs attach to the exported log record so it lines up with distributed traces. You can align the runtime audit trail with the same trace as your agent session. If you have ever tried to debug an agent incident from partial logs and terminal output, you know how significant that is.

Most teams do not want every tool pushing directly to every backend. They want one place to add environment metadata, filter or scrub sensitive attributes, route to multiple destinations, and standardise authentication.

That is exactly what the OpenTelemetry Collector is for. Logfire documents using it for data transformation, enrichment, and collecting existing system logs. AgentSH is designed for the same reality: export audit events via OTLP to any OTel-compatible collector.

The clean architecture is:

AgentSH → OpenTelemetry Collector → Logfire

And optionally, other sinks too.
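A minimal collector config sketch for that topology. The endpoint URL, token, and enrichment attribute are placeholders, not values from the AgentSH or Logfire docs:

```yaml
receivers:
  otlp:
    protocols:
      http:                 # AgentSH exports audit events here via OTLP
        endpoint: 0.0.0.0:4318

processors:
  attributes:
    actions:
      - key: deployment.environment   # central enrichment example
        value: production
        action: upsert

exporters:
  otlphttp/logfire:
    endpoint: https://logfire-us.pydantic.dev    # placeholder; see the ingest guide
    headers:
      Authorization: "your-logfire-write-token"  # placeholder

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [attributes]
      exporters: [otlphttp/logfire]
```

Adding a second exporter and listing it in the pipeline is all it takes to fan the same audit stream out to another sink.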

Denied becomes a real signal, not a vibe. AgentSH maps policy decisions to severity. So you can alert on spikes in denied outbound connections, attempts to read disallowed paths, and risky command patterns appearing in nested scripts.

You can finally answer: what did it do? With AgentSH events in Logfire, the same place you inspect tool calls is the place you inspect file reads and writes, exec activity and subprocess behaviour, and network attempts, whether allowed or blocked.

YOLO becomes less dangerous, not safe. Claude Code's guidance is to only use bypass mode in isolated environments. AgentSH does not erase that warning. It makes it practical to follow it: run in a contained environment, enforce runtime policy at the boundary, ship the full audit stream to Logfire for visibility and incident response. That combination is the difference between "we hope nothing bad happens" and "we can prove what happened."

The best way to understand the correlation is to see it. We built a working demo that puts all of this together in a single container.

A Pydantic AI agent is given two tools, list_files and cat_file, and asked to read every file in its working directory. One of those files is .env. AgentSH blocks the read at the FUSE filesystem level before the data ever reaches the agent. The agent gets EACCES, moves on, and the whole thing shows up in Logfire as a single correlated trace:

agent-run
  agent run
    chat claude-sonnet-4-6
    running tool: list_files
      list_files .
        dir_list: /workspace [allow]          ← agentsh FUSE event
    chat claude-sonnet-4-6
    running tool: cat_file
      cat_file .env
        file_open: /workspace/.env [deny]     ← blocked, level=ERROR
    chat claude-sonnet-4-6
    running tool: cat_file
      cat_file sample.txt
        file_open: /workspace/sample.txt [allow]
        file_read: /workspace/sample.txt [allow]

The blocked .env access appears as ERROR under the exact cat_file span that triggered it. Not in a separate dashboard. Not in a separate log stream. Right there in the trace.
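The demo's two tools are ordinary functions. A stdlib-only sketch of their shape (the real demo registers them with a Pydantic AI agent, omitted here), showing how a FUSE-level EACCES surfaces to the agent as a tool result rather than a crash:

```python
import os

def list_files(directory: str = ".") -> list[str]:
    """Tool: list the files in the working directory."""
    return sorted(os.listdir(directory))

def cat_file(path: str) -> str:
    """Tool: return a file's contents. When AgentSH denies the open
    at the FUSE layer, the OS raises EACCES (PermissionError); we
    return it as a tool result so the agent can move on."""
    try:
        with open(path, "r") as f:
            return f.read()
    except PermissionError:
        return f"error: permission denied reading {path}"
```

In the demo, the .env read comes back as the permission-denied message while the corresponding deny event lands in Logfire as ERROR under the same cat_file span.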

Trace correlation works like this: the Python agent creates a Logfire span for each tool call, pushes the current trace_id and span_id to the AgentSH session before executing the tool, and AgentSH attaches that context to every FUSE event it emits. Both streams arrive in Logfire under the same trace.
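That handshake can be sketched as a decorator around each tool. The agentsh_session object and its set_trace_context method are hypothetical stand-ins for whatever interface the demo uses; the span-context plumbing (OTel trace and span IDs are integers, formatted as 32- and 16-char hex) is the standard OpenTelemetry pattern:

```python
import functools

def correlated_tool(agentsh_session, get_span_context):
    """Decorator: before a tool runs, push the current trace_id and
    span_id to the AgentSH session so FUSE events emitted during the
    call carry the same trace context."""
    def wrap(tool):
        @functools.wraps(tool)
        def inner(*args, **kwargs):
            # In real code: trace.get_current_span().get_span_context(),
            # whose trace_id/span_id attributes are integers.
            ctx = get_span_context()
            agentsh_session.set_trace_context(   # hypothetical AgentSH API
                trace_id=format(ctx.trace_id, "032x"),
                span_id=format(ctx.span_id, "016x"),
            )
            return tool(*args, **kwargs)
        return inner
    return wrap
```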

To run it yourself you need Docker with FUSE support, an Anthropic API key, and optionally a Logfire token (without one, traces print to console):

docker build -t pydantic-demo .

docker run --rm \
  --cap-add SYS_ADMIN \
  --device /dev/fuse \
  --security-opt apparmor=unconfined \
  --env-file .env \
  pydantic-demo

The --cap-add SYS_ADMIN and --device /dev/fuse flags are required for AgentSH's FUSE interception to work inside the container.


Logfire gives you the trace of your AI system. AgentSH gives you the audit trail. OpenTelemetry makes it one story.

If you are going to run agents at speed, you owe yourself runtime truth: visibility that is queryable, alertable, and explainable.