
AI Observability

Making LLM workloads measurable with the same rigor as any other distributed system.

The gap

Traditional APM tools have no built-in model of:

  • Token counts (input / output / cached)
  • Model-level latency budgets
  • Tool-call trees in agents
  • Eval scores over time
  • Prompt version drift
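A tool-call tree is the parent/child structure of spans inside one agent run. This is a minimal stdlib sketch. It assumes each exported span carries its own id and its parent's id, and it folds a flat list of spans back into that tree and prints it. The `SpanRecord` type, the `build_tree`/`render` helpers, and the span names and durations in the usage section are hypothetical illustration values. They are not output from a real trace backend.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpanRecord:
    # Hypothetical minimal span shape; real exported spans carry far more fields.
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the agent run
    name: str
    duration_ms: float


def build_tree(spans: list[SpanRecord]) -> dict[Optional[str], list[SpanRecord]]:
    """Index spans by parent id in one pass, preserving input order."""
    children: dict[Optional[str], list[SpanRecord]] = defaultdict(list)
    for span in spans:
        children[span.parent_id].append(span)
    return children


def render(children, parent_id: Optional[str] = None, depth: int = 0) -> list[str]:
    """Depth-first walk that emits one indented line per span."""
    lines: list[str] = []
    for span in children.get(parent_id, []):
        lines.append(f"{'  ' * depth}{span.name} ({span.duration_ms:.0f} ms)")
        lines.extend(render(children, span.span_id, depth + 1))
    return lines


if __name__ == "__main__":
    # Toy agent run modelled on the OCR -> productGPT -> retrieval chain.
    spans = [
        SpanRecord("a", None, "agent.run", 1840),
        SpanRecord("b", "a", "ocr", 420),
        SpanRecord("c", "a", "productGPT", 1300),
        SpanRecord("d", "c", "retrieval", 310),
    ]
    print("\n".join(render(build_tree(spans))))
```

Indexing by parent id keeps the fold to a single pass. The depth-first render mirrors how a trace viewer nests child spans under their parent.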

What I'm building

  • OpenTelemetry GenAI semantic conventions in our Gemini API layer
  • Grafana dashboards for per-tenant token spend
  • Tempo traces across multi-hop agent calls (OCR → productGPT → retrieval)
  • Alerts on eval regression, not just latency
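In the Gemini API layer, the core of the semconv work is mapping each response's usage numbers onto span attributes. This is a minimal sketch. It assumes the response exposes a `usageMetadata` object shaped like the Gemini REST API's (`promptTokenCount`, `candidatesTokenCount`, `cachedContentTokenCount`). The `gen_ai.*` names follow the OpenTelemetry GenAI semantic conventions. Those conventions are still marked experimental, so check the names against the semconv version you pin. The `app.*` keys for cached tokens and tenant are hypothetical, and so are both helper functions. With the OpenTelemetry Python API, the returned dict could be passed to `span.set_attributes(...)`. The second function shows the per-tenant rollup that a token-spend dashboard would compute.

```python
from collections import Counter


def genai_span_attributes(model: str, usage_metadata: dict, tenant_id: str) -> dict:
    """Map a Gemini-style usageMetadata dict onto GenAI semconv span attributes."""
    return {
        # Names from the OpenTelemetry GenAI semantic conventions (experimental).
        "gen_ai.operation.name": "chat",
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": usage_metadata.get("promptTokenCount", 0),
        "gen_ai.usage.output_tokens": usage_metadata.get("candidatesTokenCount", 0),
        # Hypothetical app-specific keys: cached tokens and tenant are not
        # covered by the names above, so this sketch namespaces them under app.*.
        "app.gen_ai.usage.cached_tokens": usage_metadata.get("cachedContentTokenCount", 0),
        "app.tenant.id": tenant_id,
    }


def tokens_per_tenant(span_attrs: list[dict]) -> Counter:
    """Sum input + output tokens per tenant across a batch of span attribute dicts."""
    totals: Counter = Counter()
    for attrs in span_attrs:
        totals[attrs["app.tenant.id"]] += (
            attrs["gen_ai.usage.input_tokens"] + attrs["gen_ai.usage.output_tokens"]
        )
    return totals


if __name__ == "__main__":
    # Placeholder model name and toy token counts, for illustration only.
    batch = [
        genai_span_attributes(
            "example-model", {"promptTokenCount": 1200, "candidatesTokenCount": 300}, "tenant-a"
        ),
        genai_span_attributes(
            "example-model",
            {"promptTokenCount": 800, "candidatesTokenCount": 150, "cachedContentTokenCount": 600},
            "tenant-b",
        ),
    ]
    print(tokens_per_tenant(batch))
```

Keeping the tenant on the span as an attribute is what makes per-tenant attribution possible later. The dashboard only groups by an attribute the spans or metrics already carry.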
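Alerting on eval regression rather than only on latency means comparing quality scores over time. This is a minimal sketch. It assumes eval scores are normalized to [0, 1] and that a regression is an absolute drop in the mean recent score versus a baseline window. The function name, the 0.05 threshold and the 20-sample minimum are illustrative choices, not tuned values.

```python
from statistics import mean


def eval_regressed(
    baseline: list[float],
    recent: list[float],
    max_drop: float = 0.05,
    min_samples: int = 20,
) -> bool:
    """Return True if mean(recent) fell more than max_drop below mean(baseline).

    Scores are assumed to lie in [0, 1]; max_drop is an absolute threshold.
    Windows smaller than min_samples are treated as "not enough data", so
    the check stays quiet instead of paging on noise.
    """
    if len(baseline) < min_samples or len(recent) < min_samples:
        return False
    return mean(baseline) - mean(recent) > max_drop


if __name__ == "__main__":
    # Toy scores for illustration; min_samples lowered to fit the tiny windows.
    print(eval_regressed([0.82, 0.80, 0.84], [0.70, 0.72, 0.74], min_samples=3))
    print(eval_regressed([0.90, 0.90, 0.90], [0.90, 0.88, 0.89], min_samples=3))
```

A mean-difference check is the simplest option. A production alert would likely use longer windows or a statistical test, and would key the comparison by prompt version so that drift after a prompt change is attributable.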

Pages

  • (placeholder) OTel GenAI semconv — the short version
  • (placeholder) Grafana dashboards for LLM cost attribution
  • (placeholder) Tracing agents: spans that actually help on-call