Notes
Structured, evergreen notes organized by topic. These pages get updated in place as my understanding evolves.
- LLMOps
  Production AI reliability: evals, guardrails, prompt versioning, token-cost budgets, fallbacks.
- AI Observability
  LLM metrics in Grafana, OpenTelemetry GenAI semantic conventions, traces across agent hops.
- SRE
  Multi-cloud Kubernetes, FluxCD, Karpenter, DR drills, incident write-ups.