Pydantic Logfire

Full-stack Kubernetes observability with Logfire


Most Kubernetes observability setups split into two worlds: cluster metrics in one tool, application traces in another. You end up switching tabs to correlate a pod OOM kill with the request that caused it.

This guide wires both layers into Logfire through a single OpenTelemetry Collector deployment. Cluster metrics from kube-state-metrics and kubelet cAdvisor flow alongside your application traces, linked by shared Kubernetes attributes.

  • The Problem: Cluster-level metrics (pod restarts, container memory, resource limits) and application-level traces live in separate systems. Correlating them during an incident takes too long.
  • The Solution: Deploy an OpenTelemetry Collector as a DaemonSet that scrapes Prometheus metrics from kube-state-metrics and kubelet cAdvisor, enriches everything with Kubernetes metadata via the k8sattributes processor, and exports to Logfire.
  • What You Get: A single place where you can see a pod's memory usage climbing, the OOM kill event, and the exact request trace that triggered it.
  • Prerequisites: A Kubernetes cluster, Helm, a Logfire project with a write token.

The setup has three components:

  1. kube-state-metrics — exposes cluster object state (pod phase, container restarts, resource limits/requests) as Prometheus metrics.
  2. kubelet cAdvisor — exposes container-level runtime metrics (memory working set, CPU usage, network I/O) via each node's kubelet. Already built into every Kubernetes node.
  3. OpenTelemetry Collector (DaemonSet) — scrapes both metric sources, enriches with Kubernetes attributes, and exports to Logfire via OTLP.
┌─────────────────┐  ┌──────────────────┐
│ kube-state-     │  │ kubelet cAdvisor │
│ metrics         │  │ (every node)     │
└────────┬────────┘  └────────┬─────────┘
         │ :8080/metrics      │ :10250/metrics/cadvisor
         ▼                    ▼
┌────────────────────────────────────────┐
│   OTel Collector (DaemonSet)           │
│   ┌────────────┐  ┌───────────────┐    │
│   │ prometheus │→ │ k8sattributes │    │
│   │ receivers  │  │ processor     │    │
│   └────────────┘  └───────┬───────┘    │
│                           ▼            │
│                   ┌──────────────┐     │
│                   │ otlphttp/    │     │
│                   │ logfire      │     │
│                   └──────────────┘     │
└────────────────────────────────────────┘
         │
         ▼
┌────────────────────────────────────────┐
│              Logfire                   │
└────────────────────────────────────────┘

Your instrumented applications send traces through the same Collector. The k8sattributes processor tags everything with k8s.namespace.name, k8s.pod.name, k8s.deployment.name, and k8s.node.name — making it possible to join metrics and traces by pod or deployment. Applications that send traces directly to Logfire bypass the Collector and won't get this Kubernetes metadata.
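The join this enables is plain attribute equality. A toy sketch of the mechanism — the pod names, metric, and span data below are invented for illustration:

```python
# Toy illustration: once metrics and traces share Kubernetes resource
# attributes, correlating them is an equality match on k8s.pod.name.
# All values here are invented for illustration.
metric_points = [
    {"k8s.pod.name": "api-7f9c", "metric": "container_memory_working_set_bytes", "value": 512_000_000},
]
spans = [
    {"k8s.pod.name": "api-7f9c", "span_name": "GET /orders", "duration_ms": 840},
    {"k8s.pod.name": "worker-2b1d", "span_name": "process_job", "duration_ms": 120},
]

# Join each metric point to the spans from the same pod.
joined = [
    (m["metric"], m["value"], s["span_name"], s["duration_ms"])
    for m in metric_points
    for s in spans
    if m["k8s.pod.name"] == s["k8s.pod.name"]
]
print(joined)
```

In Logfire this same join is written in SQL against the metrics and records tables, as shown later in this guide.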


cAdvisor is already built into the kubelet on every Kubernetes node — no installation needed. You only need to deploy kube-state-metrics:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install kube-state-metrics prometheus-community/kube-state-metrics \
  --namespace monitoring --create-namespace

Verify it's running:

kubectl -n monitoring get pods

Create a ConfigMap with the Collector configuration. This is where the work happens.

apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
  namespace: monitoring
data:
  config.yaml: |
    receivers:
      # Scrape kube-state-metrics
      prometheus/ksm:
        config:
          scrape_configs:
            - job_name: 'kube-state-metrics'
              scrape_interval: 30s
              kubernetes_sd_configs:
                - role: endpoints
                  namespaces:
                    names: [monitoring]
              relabel_configs:
                - source_labels: [__meta_kubernetes_service_name]
                  action: keep
                  regex: .*kube-state-metrics.*

      # Scrape kubelet cAdvisor for container-level metrics
      prometheus/cadvisor:
        config:
          scrape_configs:
            - job_name: 'kubelet-cadvisor'
              scrape_interval: 30s
              scheme: https
              tls_config:
                insecure_skip_verify: true
              bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
              kubernetes_sd_configs:
                - role: node
              relabel_configs:
                - action: labelmap
                  regex: __meta_kubernetes_node_label_(.+)
                - target_label: __metrics_path__
                  replacement: /metrics/cadvisor

      # Receive OTLP from instrumented applications
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318

    processors:
      batch:
        timeout: 10s
        send_batch_size: 1024

      # Enrich with Kubernetes metadata
      k8sattributes:
        auth_type: "serviceAccount"
        extract:
          metadata:
            - k8s.namespace.name
            - k8s.pod.name
            - k8s.pod.uid
            - k8s.deployment.name
            - k8s.node.name
            - k8s.container.name
        pod_association:
          - sources:
              - from: resource_attribute
                name: k8s.pod.ip
          - sources:
              - from: connection

      # Drop high-cardinality metrics you don't need
      filter/metrics:
        metrics:
          exclude:
            match_type: regexp
            metric_names:
              - kube_.*_labels
              - kube_.*_annotations

      memory_limiter:
        check_interval: 5s
        limit_mib: 512
        spike_limit_mib: 128

    exporters:
      otlphttp/logfire:
        endpoint: https://logfire-us.pydantic.dev
        headers:
          Authorization: "Bearer ${LOGFIRE_TOKEN}"
        compression: gzip

    service:
      pipelines:
        metrics:
          receivers: [prometheus/ksm, prometheus/cadvisor]
          processors: [memory_limiter, k8sattributes, filter/metrics, batch]
          exporters: [otlphttp/logfire]
        traces:
          receivers: [otlp]
          processors: [memory_limiter, k8sattributes, batch]
          exporters: [otlphttp/logfire]
        logs:
          receivers: [otlp]
          processors: [memory_limiter, k8sattributes, batch]
          exporters: [otlphttp/logfire]

A few things to note:

  • Two Prometheus receivers with separate scrape configs keep kube-state-metrics and cAdvisor isolated. Easier to debug when one breaks.
  • cAdvisor is scraped via the kubelet's HTTPS endpoint using the ServiceAccount token for authentication. insecure_skip_verify is needed because kubelets use self-signed certs.
  • k8sattributes enriches both metrics and traces. This is what makes cross-signal correlation possible later.
  • filter/metrics drops kube_*_labels and kube_*_annotations metrics — these are high-cardinality and rarely useful for dashboards.
  • memory_limiter protects the Collector from getting OOM-killed itself by refusing new data once memory use approaches the configured limit.
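The relabel rules are easier to reason about once you know Prometheus anchors relabel regexes — they must match the entire value, equivalent to Python's `re.fullmatch`. A quick sketch of how the two rules above behave (the node label value is a made-up example):

```python
import re

# The `keep` action drops any discovered endpoint whose service name
# doesn't match the (fully anchored) regex.
keep_re = re.compile(r".*kube-state-metrics.*")
assert keep_re.fullmatch("kube-state-metrics") is not None
assert keep_re.fullmatch("some-other-service") is None

# The `labelmap` action copies node labels into plain metric labels by
# capturing everything after the __meta_kubernetes_node_label_ prefix.
labelmap_re = re.compile(r"__meta_kubernetes_node_label_(.+)")
m = labelmap_re.fullmatch("__meta_kubernetes_node_label_topology_kubernetes_io_zone")
print(m.group(1))  # topology_kubernetes_io_zone
```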

The DaemonSet ensures one Collector instance per node, which is required for cAdvisor scraping.

apiVersion: v1
kind: Secret
metadata:
  name: logfire-credentials
  namespace: monitoring
type: Opaque
stringData:
  token: "YOUR_LOGFIRE_WRITE_TOKEN"
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      serviceAccountName: otel-collector
      containers:
        - name: collector
          image: otel/opentelemetry-collector-contrib:0.148.0
          args: ["--config=/etc/otel/config.yaml"]
          env:
            - name: LOGFIRE_TOKEN
              valueFrom:
                secretKeyRef:
                  name: logfire-credentials
                  key: token
          ports:
            - containerPort: 4317
              hostPort: 4317    # OTLP gRPC
            - containerPort: 4318
              hostPort: 4318    # OTLP HTTP
          volumeMounts:
            - name: config
              mountPath: /etc/otel
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi
      volumes:
        - name: config
          configMap:
            name: otel-collector-config

The k8sattributes processor and Prometheus receivers need read access to the Kubernetes API. Create a ServiceAccount with the appropriate ClusterRole:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: otel-collector
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-collector
rules:
  - apiGroups: [""]
    resources: ["pods", "namespaces", "nodes", "endpoints", "services", "nodes/metrics", "nodes/proxy"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources: ["replicasets", "deployments"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otel-collector
subjects:
  - kind: ServiceAccount
    name: otel-collector
    namespace: monitoring
roleRef:
  kind: ClusterRole
  name: otel-collector
  apiGroup: rbac.authorization.k8s.io

Apply everything:

kubectl apply -f rbac.yaml
kubectl apply -f configmap.yaml
kubectl apply -f daemonset.yaml

Instead of sending traces directly to Logfire, point your application's OTLP exporter at the local Collector. Since the Collector runs as a DaemonSet with hostPort, you can use the node's host IP:

# In your application deployment
env:
  - name: NODE_IP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
  - name: POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://$(NODE_IP):4318"
  - name: OTEL_RESOURCE_ATTRIBUTES
    value: "k8s.pod.ip=$(POD_IP)"

The OTEL_RESOURCE_ATTRIBUTES line is important: hostPort uses SNAT, so the Collector sees the node IP instead of the pod IP for incoming connections. Setting k8s.pod.ip explicitly lets the k8sattributes processor match traces to the correct pod.
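Per the OpenTelemetry specification, OTEL_RESOURCE_ATTRIBUTES is a comma-separated list of key=value pairs. A simplified sketch of how an SDK reads it — not the actual SDK code, and the IP value is a made-up example:

```python
import os

def parse_resource_attributes(raw: str) -> dict[str, str]:
    """Parse the comma-separated key=value format of OTEL_RESOURCE_ATTRIBUTES."""
    attrs = {}
    for pair in raw.split(","):
        if "=" in pair:
            key, _, value = pair.partition("=")
            attrs[key.strip()] = value.strip()
    return attrs

# Simulate what the downward API substitution produces in the container.
os.environ["OTEL_RESOURCE_ATTRIBUTES"] = "k8s.pod.ip=10.42.0.17"
attrs = parse_resource_attributes(os.environ["OTEL_RESOURCE_ATTRIBUTES"])
print(attrs)  # {'k8s.pod.ip': '10.42.0.17'}
```

The resulting `k8s.pod.ip` resource attribute is exactly what the k8sattributes processor's first `pod_association` rule looks for.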

For a Python application using Logfire:

import logfire

# send_to_logfire=False tells the SDK to use the standard
# OTEL_EXPORTER_OTLP_ENDPOINT env var instead of Logfire directly
logfire.configure(send_to_logfire=False)

Your application traces now flow through the same Collector that handles cluster metrics. The k8sattributes processor enriches both with the same Kubernetes metadata, which means you can filter and join by pod, deployment, or namespace in Logfire.


Once data flows in, here are some practical queries you can run in Logfire's SQL explorer.

Pods with the most container restarts in the last hour (this metric carries `namespace` and `pod` labels from kube-state-metrics):

SELECT
  attributes->>'namespace' AS namespace,
  attributes->>'pod' AS pod,
  max(scalar_value) AS total_restarts
FROM metrics
WHERE metric_name = 'kube_pod_container_status_restarts_total'
  AND recorded_timestamp > now() - INTERVAL '1 hour'
GROUP BY namespace, pod
ORDER BY total_restarts DESC

Memory usage joined with request latency, per pod:

SELECT
  m.attributes->>'pod' AS pod,
  max(m.scalar_value) AS memory_bytes,
  avg(s.duration) AS avg_request_duration_ms
FROM metrics m
JOIN records s
  ON m.attributes->>'pod' = s.otel_resource_attributes->>'k8s.pod.name'
WHERE m.metric_name = 'container_memory_working_set_bytes'
  AND s.span_name LIKE 'GET %'
  AND m.recorded_timestamp > now() - INTERVAL '30 minutes'
  AND s.start_timestamp > now() - INTERVAL '30 minutes'
GROUP BY pod
ORDER BY memory_bytes DESC

Top CPU consumers over the last 15 minutes:

SELECT
  attributes->>'container' AS container,
  attributes->>'pod' AS pod,
  attributes->>'namespace' AS namespace,
  max(scalar_value) AS cpu_seconds
FROM metrics
WHERE metric_name = 'container_cpu_usage_seconds_total'
  AND recorded_timestamp > now() - INTERVAL '15 minutes'
  AND attributes->>'container' != ''
GROUP BY container, pod, namespace
ORDER BY cpu_seconds DESC

These queries can be saved as dashboard panels. Create a "Kubernetes Overview" dashboard combining cluster health metrics with application performance — no tab switching required.


Cluster metrics can get noisy. A few things that help:

Filter at the Collector. The filter/metrics processor in the config above already drops label and annotation metrics. Extend it for anything you don't need:

filter/metrics:
  metrics:
    exclude:
      match_type: regexp
      metric_names:
        - kube_.*_labels
        - kube_.*_annotations
        - machine_.*

Increase scrape intervals. 30s is a reasonable default. For metrics that change slowly (pod resource limits), 60s or even 120s is fine.
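The trade-off is easy to quantify: datapoint rate is series count divided by interval. A back-of-envelope sketch — the series count is a made-up placeholder, so measure your own cluster's active series:

```python
# Rough datapoint-rate estimate for a given scrape interval.
def datapoints_per_hour(series_count: int, scrape_interval_s: int) -> int:
    return series_count * (3600 // scrape_interval_s)

# Hypothetical active series across kube-state-metrics + cAdvisor.
series = 5_000
for interval_s in (30, 60, 120):
    print(f"{interval_s}s interval: {datapoints_per_hour(series, interval_s):,}/hour")
```

Doubling the interval halves the ingested volume for those metrics, so moving slow-changing metrics from 30s to 120s cuts their cost to a quarter.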

Use the transform processor to drop attributes you don't query on, reducing cardinality:

transform/reduce-cardinality:
  metric_statements:
    - context: datapoint
      statements:
        - delete_key(attributes, "uid")
        - delete_key(attributes, "instance")

No metrics arriving in Logfire:

  1. Check the Collector logs: kubectl -n monitoring logs -l app=otel-collector
  2. Look for scrape errors — these usually mean kube-state-metrics isn't reachable or the kubelet token is missing.
  3. Verify the Logfire token: a 401 in the exporter logs means your token is wrong.

Traces arriving without Kubernetes attributes:

  1. Verify the ServiceAccount has the right RBAC permissions.
  2. Check for k8sattributes errors in the Collector logs — "cannot list pods" means RBAC is missing.
  3. Make sure your app sets OTEL_RESOURCE_ATTRIBUTES=k8s.pod.ip=$(POD_IP) — without this, the processor can't match traces to pods when using hostPort.

Collector getting OOM-killed: the memory_limiter processor should prevent this. If it still happens:

  • Increase the DaemonSet memory limit.
  • Reduce send_batch_size to flush more frequently.
  • Add more aggressive metric filtering.

Ready to try Logfire to trace your full-stack k8s setup? Create a project and grab a write token.


Can I use a Deployment instead of a DaemonSet?

Yes, but you lose node-local cAdvisor scraping and need to configure kubelet access differently. A DaemonSet is the standard pattern for this reason.


What about managed Kubernetes (GKE, EKS, AKS)?

The setup is the same. For GKE and EKS, you may also want to add cloud-provider metrics using the cloud metrics guide alongside this setup.


How does this relate to the Logfire system metrics SDK?

logfire.instrument_system_metrics() collects process-level metrics from within your Python application. This guide collects cluster-level metrics from outside your application. They complement each other — use both if you want full coverage.


Can I send metrics to Logfire and Prometheus simultaneously?

Yes. Add a Prometheus remote-write exporter to the Collector's metrics pipeline alongside the Logfire exporter. See the OTel Collector guide for multi-backend configuration.