General Intelligence Company (GIC) of New York migrated to Pydantic Logfire, Pydantic’s AI observability platform, and adopted Pydantic AI to build a live evaluation system for their autonomous agents. The results? Query performance improved 150x, eliminating rate limits and enabling the real-time deviation detection and agent self-correction that were impossible before.
Results at a Glance
- 150x faster queries: Trace retrieval dropped from 135 seconds to under 1 second, eliminating rate limits entirely
- Real-time agent evaluation: Live eval system detects when agents get stuck or deviate, enabling immediate intervention
- Agents that debug themselves: Logfire became a tool agents use to query their own traces and self-correct
- Stack: Pydantic Logfire (Observability), Pydantic AI (Agent Framework)
The goal: Frictionless autonomous agent observability
General Intelligence Company builds autonomous agents that run businesses. One of their products is Cofounder, an agent capable of managing features, working through Linear backlogs, and overseeing infrastructure without human intervention.
The architecture relies on orchestrating multiple specialized sub-agents: coding agents, QA agents, and others working in concert. At this level of complexity, knowing when an agent deviates from expected behavior becomes essential. Without visibility, autonomous systems drift.
The challenge: Observability that couldn’t keep up with autonomous agents
Running multi-agent orchestration at production scale requires more than logging: it requires live evaluation. General Intelligence Company needed to track agent behavior in real-time, detect when things went wrong, and ideally correct course before users noticed.
- Nested Data Models: Their previous observability platform created friction at every step. The data model stored traces in deeply nested structures.
- The N+1 Problem: To analyze a run, engineers had to fetch parent runs and child runs in separate API calls, then flatten everything manually.
- Depleted Rate Limits: Each call consumed rate budget, and complex agent traces with deep nesting exhausted it before queries could complete. Even after applying every recommended optimization, including parallelization, the team consistently hit rate limits, and even simple debugging workflows felt slow.
"Before, we'd have to get not just the parent runs but the child runs that were nested, and flatten them. That latency made live evals impossible," explains Spencer, Founding Researcher at General Intelligence Company.
The solution: Pydantic Logfire for real-time agent evaluation
The GIC team initially considered building their own observability platform and agent framework. After testing Pydantic AI and Pydantic Logfire, they reconsidered: thanks to Logfire's architecture, the team immediately saw performance improvements that changed what was possible with Cofounder.
"The folks at Pydantic are extremely smart. As developers ourselves, we knew they would have solved the same challenges we're thinking about like direct SQL access for Logfire or a robust SDK.", says Spencer Hong, GIC's research founder.
A 150x performance gain that unlocked new capabilities
The team ran systematic benchmarks comparing their previous observability platform against Logfire. The results were dramatic. The benchmark tested query latency across message count percentiles, from p50 (simpler traces of agents that ran for a few minutes) to p99 (complex traces beyond 90 minutes).
- Before Logfire, query latency hovered between 114 and 145 seconds regardless of complexity, meaning even simple trace lookups took around two minutes.
- After migrating to Logfire, latency dropped to sub-second response times across the board: 808ms at p50, 891ms at p90, and 960ms at p99. The improvement factor ranged from 141x to 161x, with Logfire maintaining consistent performance even as trace complexity increased. The flat latency curve demonstrates that Logfire's architecture scales gracefully: complex agent traces with dozens of messages don't create the query bottlenecks that affected the team's previous setup.

"We migrated from LangSmith to Logfire and the time it took to query our agent traces went down by 96.2%."
— Andrew Pignanelli, Founder and CEO, The General Intelligence Company of New York
Pydantic Logfire’s SQL-First Approach
The root cause of the sluggish performance was architectural. With Logfire, the same information comes back in a single SQL query: no nested fetches, no rate-limit issues, no flattening logic. The team puts their filtering and grouping logic directly into SQL, and Logfire handles the rest.
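As a rough sketch of what "one query per trace" looks like in practice: Logfire exposes a read API that accepts SQL, so fetching a whole trace is a matter of building one request. The endpoint shape below follows Logfire's Query API documentation at the time of writing, and the column names are illustrative; verify both against the current docs before relying on them. Nothing is sent here, only the URL is constructed.

```python
from urllib.parse import urlencode

# One SQL query replaces the nested fetch-and-flatten loop entirely.
# Endpoint and column names are a sketch based on Logfire's Query API
# docs; check the live schema before use. The trace id is a placeholder.
sql = """
SELECT trace_id, span_name, start_timestamp, duration
FROM records
WHERE trace_id = 'abc123'
ORDER BY start_timestamp
"""

url = "https://logfire-api.pydantic.dev/v1/query?" + urlencode({"sql": sql})
print(url)  # send with an "Authorization: Bearer <read-token>" header
```

All filtering and grouping logic lives in the SQL itself, so parent and child spans come back in a single, flat result set.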
"That's how we were able to do any logic. The past latency made us want to migrate off because we want fast, live evals where as soon as the trace is created we want to be able to track the agent behavior. Since the migration, we've been handling hour-long autonomous tasks, partly powered by the agent's ability to debug itself in real time."
This resulted in performance improvements across all query complexity levels. Whether fetching simple traces or analyzing complex multistep agent runs, the response times remained fast and consistent.
The Breakthrough: Agents that debug themselves
The performance leap enabled something new: agents that evaluate and correct themselves. Because Logfire queries are sub-second, GIC’s agents can now query their own history during execution.
General Intelligence Company took this further. Logfire itself became a tool their agents can use. When an agent gets stuck, it queries its own session history, examines past violations, and adjusts behavior accordingly.
"I've seen agents get stuck, query their own traces through Pydantic Logfire, look at violations, and use that to inform their next move."
— Abhishyant Khare, co-founder and CTO at The General Intelligence Company of New York.
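A pure-Python sketch of what such a self-inspection tool might look like. The span data, field names, and "stuck" heuristic are all invented for illustration; a real version would run a SQL query against Logfire and be registered as a Pydantic AI tool so the agent can call it mid-run.

```python
# Illustrative sketch: an agent-facing tool that inspects the agent's own
# recent trace. All data and heuristics here are stubbed, not GIC's system.

def query_own_traces(session_id: str, spans: list[dict]) -> dict:
    """Summarise recent spans so the agent can reason about its own run."""
    violations = [s for s in spans if s.get("level") == "error"]
    # Toy heuristic: the agent "looks stuck" if its last three spans all
    # repeat the same step.
    stuck = len(spans) >= 3 and all(
        s["span_name"] == spans[-1]["span_name"] for s in spans[-3:]
    )
    return {
        "session_id": session_id,
        "violations": [s["span_name"] for s in violations],
        "looks_stuck": stuck,
    }

spans = [
    {"span_name": "plan", "level": "info"},
    {"span_name": "run_tests", "level": "error"},
    {"span_name": "run_tests", "level": "info"},
    {"span_name": "run_tests", "level": "info"},
]
print(query_own_traces("sess-1", spans))
```

The agent sees that it has repeated `run_tests` and hit an error, and can use that summary to choose a different next move instead of looping.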
Live self-correction: The capability that changed everything
"We have a fast eval system that detects if the agent is stuck or idling. That escalates to a full trace evaluation of where things could have gone wrong. Was the environment set up to fail? How do we attribute fault?"
This escalation system only became possible after migrating to Pydantic Logfire. The previous solution was too slow to support real-time intervention.
Three ways Logfire powers the GIC agent infrastructure
- Day-to-day engineering: Engineers query traces directly from the UI using SQL. No custom query language, no waiting. Issues that took hours to debug now take minutes.
- Real-time agent monitoring: A sidecar process watches each agent run, querying Logfire as the agent executes. When behavior deviates from the expected path, the system flags it immediately.
- Scheduled evaluations: Cron jobs pull traces matching specific criteria: interesting failures, frustrated users, fault patterns. This data shapes engineering priorities and surfaces systemic issues before they compound.
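The sidecar pattern in the second bullet can be sketched as a simple polling loop. Everything here is a stand-in: `fetch_recent_spans` represents a Logfire query, and the deviation rule (no new span within an idle timeout) is just one illustrative heuristic, not GIC's actual detector.

```python
import time

# Minimal sidecar sketch: poll a run's spans and flag a deviation when
# the agent goes quiet. `fetch_recent_spans` is a stub for a Logfire query.

IDLE_TIMEOUT_S = 60.0

def deviated(spans: list[dict], now: float) -> bool:
    """Flag the run if no span has finished within the idle timeout."""
    if not spans:
        return False
    return now - max(s["end_ts"] for s in spans) > IDLE_TIMEOUT_S

def watch(run_id: str, fetch_recent_spans, poll_s: float = 5.0,
          max_polls: int = 1):
    """Poll the run; bounded by max_polls so this sketch terminates."""
    for _ in range(max_polls):
        spans = fetch_recent_spans(run_id)
        if deviated(spans, now=time.time()):
            return f"deviation: {run_id} idle for >{IDLE_TIMEOUT_S:.0f}s"
        time.sleep(poll_s)
    return None

# Stub: the last span ended ten minutes ago, so the watcher flags the run.
stale = lambda run_id: [{"end_ts": time.time() - 600}]
print(watch("run-42", stale))
```

In production the flag would feed the escalation system described above, triggering a full trace evaluation rather than just printing.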
Pydantic AI: A strongly typed foundation for building agents
Beyond observability, General Intelligence Company uses Pydantic AI as their core agent framework. They leverage the lower-level components rather than the high-level Agent class, giving them precise control over streaming behavior and model responses.
"We use the model stream adapters, the model response classes. We needed custom behavior when streaming from LLMs to orchestrate agents, so we couldn't use anything higher level."
This approach delivers automatic OpenTelemetry integration. Every model call, every streamed response flows into Logfire without additional instrumentation. Type safety at the model boundary catches errors before they propagate.
Pydantic Validation runs throughout their codebase, ensuring data consistency from API boundaries to internal logic.
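As a minimal illustration of validation at such a boundary (the model and fields below are invented for this sketch, not GIC's actual schema): Pydantic coerces compatible input and rejects malformed data before it reaches internal logic.

```python
from pydantic import BaseModel, ValidationError

# Hypothetical internal record; field names are invented for illustration.
class AgentStep(BaseModel):
    run_id: str
    tool: str
    duration_ms: float

# Compatible input is coerced: the string "12.5" becomes the float 12.5.
step = AgentStep(run_id="r1", tool="run_tests", duration_ms="12.5")
print(step.duration_ms)

# Malformed input is rejected at the boundary instead of propagating.
try:
    AgentStep(run_id="r1", tool="run_tests", duration_ms="fast")
except ValidationError as e:
    print("rejected:", e.errors()[0]["type"])
```

Catching the bad value here, rather than deep inside orchestration logic, is what "data consistency from API boundaries to internal logic" means in practice.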
The results: Performance that enables autonomy
General Intelligence Company now operates with confidence that their agents will behave as expected, or will quickly correct course when they don't:
- 150x faster queries: Complex trace analysis dropped from 123 seconds to under 1 second
- Single-query data access: SQL-based queries eliminate the N+1 problem of nested data fetching
- Live evaluation at scale: Real-time deviation detection and self-correction, only possible with sub-second query performance
- Zero rate limit friction: No more throttling, no more flattening logic, no more workarounds
- Agent self-debugging: Agents query their own traces through Logfire to examine and adjust behavior
- Type-safe orchestration: Pydantic AI provides validated model outputs with automatic observability
"In the past, we were hitting rate limit issues just because we were getting all the child runs. With Logfire, with one API query, you can get whatever you want. You just put your logic in the SQL."
— Spencer, Founding Researcher at General Intelligence Company
Building autonomous agents that need real-time observability? Get started with Pydantic Logfire.
Frequently Asked Questions
Why did General Intelligence Company switch from LangSmith to Pydantic Logfire? They switched primarily for query performance and data accessibility. LangSmith’s nested data structure required multiple API calls to fetch child runs, which often resulted in rate limits being reached. Logfire’s SQL-based approach resolved this issue.
How much faster is Pydantic Logfire compared to the previous solution? In GIC's benchmarks, Logfire was approximately 150x faster. Before Logfire, query latency was around 135 seconds. After migrating, latency dropped to sub-second times across all percentiles: 808ms at p50, 891ms at p90, and 960ms at p99. The improvement factor ranged from 141x to 161x depending on trace complexity.
What is "Real-Time Agent Self-Correction"? This is an advanced capability where an AI agent can query its own execution logs (traces) to understand why it failed or got stuck. It then uses that information to adjust its behavior without human intervention. This is only possible with sub-second query latency.
Can I use SQL to query my LLM traces in Logfire? Yes. Logfire allows you to use SQL to filter, group, and retrieve traces. This eliminates the need to fetch massive JSON objects and flatten them manually in your code, which is a common bottleneck in other platforms.
Does Pydantic Logfire support OpenTelemetry? Yes. Pydantic Logfire is built on OpenTelemetry standards. If you use Pydantic AI, all model calls and streams are automatically instrumented and sent to Logfire without extra setup.
Why is low latency important for AI evaluations (Evals)? If evaluations take minutes to run (due to slow queries), they can only be done "offline" (after the fact). If queries are sub-second, evaluations can run "live" (while the agent is working), allowing the system to intervene and fix errors before the user sees them.
What kind of agents does General Intelligence Company build? They build autonomous agents, including their flagship product Cofounder (cofounder.co). These are complex multi-agent systems capable of handling hour-long autonomous tasks end-to-end, such as managing software backlogs via Linear, coding features, and handling infrastructure without human intervention.
What is Pydantic Logfire? Pydantic Logfire is an AI observability platform built on OpenTelemetry standards. It provides SQL-based querying of LLM traces, sub-second query latency, and automatic instrumentation for Pydantic AI model calls. It is designed for developers building and monitoring AI agents at production scale.
What is the N+1 problem in AI agent observability? The N+1 problem in AI agent observability occurs when trace data is stored in deeply nested structures, requiring one API call to fetch the parent run and additional calls for each child run. This leads to excessive API requests, rate limit exhaustion, and high latency. Pydantic Logfire solves this with a flat, SQL-queryable data model that returns all trace data in a single query.
How does Pydantic Logfire compare to LangSmith for AI agent tracing? General Intelligence Company migrated from LangSmith to Pydantic Logfire and measured a 150x improvement in query performance. LangSmith's nested data model required multiple API calls and manual flattening, often hitting rate limits. Logfire's SQL-first architecture provides single-query access to all trace data with sub-second latency, enabling real-time evaluations and agent self-correction that were not possible before.
What is Cofounder by General Intelligence Company? Cofounder is a product by The General Intelligence Company of New York (GIC). It is an autonomous AI agent capable of managing features, working through Linear backlogs, and overseeing infrastructure. Cofounder handles hour-long autonomous tasks without human intervention, powered by Pydantic AI and monitored through Pydantic Logfire.