Notes

Structured, evergreen notes organized by topic. These pages get updated in place as my understanding evolves.

LLMOps

Production AI reliability: evals, guardrails, prompt versioning, token-cost budgets, and fallbacks.

AI Observability

LLM metrics in Grafana, OpenTelemetry GenAI semantic conventions (semconv), and traces across agent hops.

SRE

Multi-cloud Kubernetes, FluxCD, Karpenter, DR drills, and incident write-ups.
