Some Customers Can't Use Your Cloud. Now What?

Some customers cannot use your cloud.

A vendor-operated environment, even an isolated dedicated one, can still be a no-go for regulatory or architectural reasons. If data cannot leave the customer's network boundary, your cloud is not part of the solution.

We announced a self-hosted option for Pydantic Logfire a little over a year ago. We chose Helm because it's the packaging format Kubernetes teams already know how to review, diff, and manage through GitOps. Getting it to run in a customer's Kubernetes cluster was the first job. The second job, keeping that install working as the product kept changing, turned out to be much harder, and a lot of that cost showed up as customer support load.

This post is about what drove that cost, and what we did to bring it down.

The hard requirement was control

Self-hosting usually stems from hard constraints: strict data residency, BYO encryption, internal identity providers, localized audit trails, or "no call home" behavior.

At runtime, Pydantic Logfire should only call infrastructure the customer explicitly configured. In practice, the customer owns the infrastructure for application state, object storage, and identity. Logfire's own service telemetry stays inside the cluster unless explicitly routed outward.

In a managed cloud, you can hide dependencies behind platform services. In a self-hosted chart, if a service needs network egress, TLS, or external APIs, the chart must expose it clearly and predictably. The boundary has to be explicit, not implied.
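One way to make that boundary explicit is to route every outbound call through a guard that only permits hosts the operator configured. This is a minimal sketch, assuming a hypothetical `ALLOWED_EGRESS_HOSTS` setting populated from chart values. The post does not describe Logfire's actual mechanism, and the hostnames are placeholders.

```python
from urllib.parse import urlsplit


class EgressNotAllowed(Exception):
    """Raised when code tries to reach a host the operator did not configure."""


def check_egress(url: str, allowed_hosts: frozenset[str]) -> str:
    """Return the URL unchanged if its host is explicitly allowed, otherwise raise."""
    host = urlsplit(url).hostname  # urlsplit lowercases the hostname
    if host is None or host not in allowed_hosts:
        raise EgressNotAllowed(f"outbound call to {host!r} is not in the configured allowlist")
    return url


# Hypothetical values an operator might supply through the chart.
ALLOWED_EGRESS_HOSTS = frozenset(
    {"postgres.internal.example", "minio.internal.example", "idp.example.com"}
)

check_egress("https://idp.example.com/.well-known/openid-configuration", ALLOWED_EGRESS_HOSTS)
try:
    # Anything the operator did not list, including a vendor endpoint, is refused.
    check_egress("https://telemetry.vendor.example/ping", ALLOWED_EGRESS_HOSTS)
except EgressNotAllowed as exc:
    print(exc)
```

Failing loudly on an unlisted host makes "no call home" something an operator can check against their own configuration rather than take on trust.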

Env vars are part of the upgrade path

Logfire ships fast. Startup config changes. Services gain new settings, lose old ones, or change defaults. The problem is that Helm has no way to know: it will render valid YAML for a configuration the application no longer accepts. That gap between "chart is valid" and "deployment actually works" is easy to miss until a customer hits it during an upgrade.

This was one of the first things that started generating support load. We built an audit tool into our release process that diffs the platform configuration against the chart before shipping. It keeps the two in sync as the product evolves.
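A minimal sketch of that kind of audit follows. It assumes that both the application's accepted settings and the chart's rendered environment can be reduced to sets of variable names. The `APP_*` names and the split into "accepted" and "required" sets are hypothetical. The post does not describe the real tool's inputs or output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvAudit:
    # Set by the chart but no longer read by the application: may be ignored or rejected at startup.
    stale: frozenset[str]
    # Required by the application but never set by the chart: the pod is likely to fail to start.
    missing: frozenset[str]

    @property
    def ok(self) -> bool:
        return not self.stale and not self.missing


def audit_env(app_accepted: set[str], app_required: set[str], chart_env: set[str]) -> EnvAudit:
    """Compare the env vars the app understands against the env vars the chart renders."""
    return EnvAudit(
        stale=frozenset(chart_env - app_accepted),
        missing=frozenset(app_required - chart_env),
    )


# Hypothetical inputs: in practice these might come from the app's settings schema
# and from the chart's rendered manifests (for example, the output of `helm template`).
app_accepted = {"APP_DATABASE_URL", "APP_STORAGE_BUCKET", "APP_WORKER_COUNT"}
app_required = {"APP_DATABASE_URL", "APP_STORAGE_BUCKET"}
chart_env = {"APP_DATABASE_URL", "APP_WORKERS", "APP_WORKER_COUNT"}

result = audit_env(app_accepted, app_required, chart_env)
print(sorted(result.stale))    # ['APP_WORKERS']
print(sorted(result.missing))  # ['APP_STORAGE_BUCKET']
```

In a release pipeline, a result where `ok` is false would block the chart from shipping until the two sides are reconciled. That is the "chart is valid but the deployment is broken" gap caught before a customer upgrades.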

Sizing and setup

Sizing generated its own category of problems. Kubernetes makes it trivial to expose every workload knob, but forcing users to tune resource requests before their first install pushes product knowledge onto the wrong team.

The underlying issue was that internal settings, such as worker counts and concurrency limits, were not tied to the resource budget the pod actually had. We were getting support calls from customers hitting OOM kills or CPU throttling with configurations that looked reasonable on paper.

We added sizing presets (e.g., tiny, small, standard) and replaced those internal settings with formulas derived from the resources the workload actually gets. Given a pod's memory and CPU budget, worker counts and concurrency limits now stay inside that budget automatically. The presets also cover autoscaling and availability defaults, so customers can get a working deployment without having to understand every knob first.
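A minimal sketch of deriving internal settings from a pod's resource budget follows. The preset names come from this post. The CPU and memory figures, the per-worker memory estimate, the two-workers-per-core ratio, and the `derive_settings` helper are all assumptions for illustration, not Logfire's actual values.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu_cores: float  # CPU limit granted to the container
    memory_mib: int   # memory limit granted to the container


# Hypothetical preset budgets; real presets would also carry autoscaling and availability defaults.
PRESETS = {
    "tiny": Resources(cpu_cores=0.5, memory_mib=512),
    "small": Resources(cpu_cores=1.0, memory_mib=1024),
    "standard": Resources(cpu_cores=2.0, memory_mib=4096),
}

BASE_OVERHEAD_MIB = 256  # assumed memory used by the process before any workers start
PER_WORKER_MIB = 192     # assumed peak memory per worker
WORKERS_PER_CORE = 2     # assumed CPU ratio
TASKS_PER_WORKER = 4     # assumed concurrent tasks each worker handles


def derive_settings(res: Resources) -> dict[str, int]:
    """Size workers by whichever budget (CPU or memory) runs out first, with a floor of one."""
    by_cpu = max(1, math.floor(res.cpu_cores * WORKERS_PER_CORE))
    by_memory = max(1, (res.memory_mib - BASE_OVERHEAD_MIB) // PER_WORKER_MIB)
    workers = min(by_cpu, by_memory)
    return {"workers": workers, "max_concurrency": workers * TASKS_PER_WORKER}


for name, res in PRESETS.items():
    print(name, derive_settings(res))
# tiny {'workers': 1, 'max_concurrency': 4}
# small {'workers': 2, 'max_concurrency': 8}
# standard {'workers': 4, 'max_concurrency': 16}
```

The `min` is the important part. A pod with plenty of CPU but a small memory limit gets fewer workers instead of an OOM kill, and the reverse holds for a CPU-starved pod.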

Some complexity is unavoidable: a production install genuinely requires external PostgreSQL, object storage, TLS, and identity providers. We minimized the required inputs for a working deployment, and made it possible to boot a dev-grade instance locally with PostgreSQL and MinIO before committing to the full production setup.

Operators need APIs

In the cloud product, customers don't manage the instance; we do. There's no reason to expose controls over things that only make sense when you're the one running the infrastructure. That changes completely in a self-hosted deployment.

Managing the instance becomes their problem, which means it has to become our API. Building the self-hosted option forced us to design a new privilege tier: instance-level admin access. The boundary between "things operators manage" and "things the platform manages" had always been implicit. Self-hosting made it explicit.
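A minimal sketch of how an explicit instance-level tier might sit alongside existing ones follows. It assumes a scope-based model. The scope names, the action names, and `is_allowed` are hypothetical, since the post does not describe Logfire's actual permission model.

```python
from enum import Enum


class Scope(Enum):
    PROJECT = "project"            # day-to-day product use inside a project
    ORGANIZATION = "organization"  # members and projects within one organization
    INSTANCE = "instance"          # operator concerns that only exist when self-hosting


# Hypothetical actions, each mapped to the single scope that grants it.
REQUIRED_SCOPE = {
    "query_project_data": Scope.PROJECT,
    "invite_org_member": Scope.ORGANIZATION,
    "create_organization": Scope.INSTANCE,
    "configure_identity_provider": Scope.INSTANCE,
}


def is_allowed(granted: set[Scope], action: str) -> bool:
    """Deny unknown actions by default; otherwise require the action's scope to be granted."""
    required = REQUIRED_SCOPE.get(action)
    return required is not None and required in granted


operator = {Scope.INSTANCE}
print(is_allowed(operator, "configure_identity_provider"))  # True
print(is_allowed(operator, "query_project_data"))           # False
```

Modeling the instance tier as a separate scope, rather than as the top of a role hierarchy, is one way to keep the operator boundary explicit. Someone who runs the deployment does not automatically gain access to project data.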

The honest lesson from a year of this: the build is not the hard part. The ongoing support surface is. Every gap in setup, docs, upgrades, or troubleshooting eventually becomes a call. If we were doing it again, we would treat the chart as part of the product from day one. Not as polish, and not as a documentation pass after the fact. Startup config, sizing, and instance administration are part of the product once customers are the ones operating it.
