AI Observability
Making LLM workloads measurable with the same rigor as any other distributed system.
The gap
Traditional APM tools don't know about:
- Token counts (input / output / cached)
- Model-level latency budgets
- Tool-call trees in agents
- Eval scores over time
- Prompt version drift
What I'm building
- OpenTelemetry GenAI semantic conventions in our Gemini API layer
- Grafana dashboards for per-tenant token spend
- Tempo traces across multi-hop agent calls (OCR → productGPT → retrieval)
- Alerts on eval regression, not just latency
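A minimal sketch of two items from the list above: the attributes a model-call span might carry, and a check that could back an eval-regression alert. The `gen_ai.*` keys follow the OpenTelemetry GenAI semantic conventions, which are still marked as in development, so the names may change. Everything else is an assumption made for illustration: the helper names, the `tenant.id` attribute, the model name, and the 0.05 threshold.

```python
from statistics import mean


def genai_span_attributes(model, input_tokens, output_tokens, tenant_id):
    """Build the attribute dict for one model-call span (hypothetical helper)."""
    return {
        # Keys taken from the OpenTelemetry GenAI semantic conventions.
        "gen_ai.operation.name": "chat",
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
        # Assumed custom attribute for per-tenant cost attribution;
        # it is not part of the conventions.
        "tenant.id": tenant_id,
    }


def eval_regressed(baseline_scores, recent_scores, max_drop=0.05):
    """Return True when the mean recent eval score has dropped more than
    max_drop below the mean baseline score (threshold is an assumption)."""
    return mean(baseline_scores) - mean(recent_scores) > max_drop


# "example-model" and "tenant-a" are placeholder values.
attrs = genai_span_attributes("example-model", 1200, 300, "tenant-a")
total_tokens = (
    attrs["gen_ai.usage.input_tokens"] + attrs["gen_ai.usage.output_tokens"]
)
print(total_tokens)  # 1500
print(eval_regressed([0.90, 0.92, 0.91], [0.80, 0.82, 0.81]))  # True
```

With the OpenTelemetry Python API, a dict like this could be attached to the active span via `span.set_attributes(attrs)`. Comparing means is the simplest possible regression signal; a real alert would probably also want a minimum sample size so one bad eval run does not page anyone.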
Pages
- (placeholder) OTel GenAI semconv — the short version
- (placeholder) Grafana dashboards for LLM cost attribution
- (placeholder) Tracing agents: spans that actually help on-call