
LLMOps

Notes on running LLMs in production — reliability, evals, cost, safety.

Mental model

LLMOps = MLOps + non-determinism + runaway cost risk + prompt-as-config.

The four pillars I track:

  1. Evals — offline + online, regression-gated
  2. Observability — token usage, latency p50/p95/p99, tool-call success rate
  3. Guardrails — input/output filters, PII scrubbing, jailbreak detection
  4. Cost controls — per-tenant budgets, model routing, caching
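
To make pillar 4 concrete, here is a minimal sketch of a per-tenant token budget that also drives model routing. Everything in it is an assumption for illustration: the `TenantBudget` class, the `pick_model` and `record_usage` helpers, the 80% soft threshold, and the tier names `"large-model"` and `"small-model"` are hypothetical and not taken from any particular library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TenantBudget:
    """Token spend for one tenant against a hard cap (hypothetical schema)."""
    limit_tokens: int
    used_tokens: int = 0

    def remaining(self) -> int:
        # Never report a negative balance, even if usage overshot the cap.
        return max(self.limit_tokens - self.used_tokens, 0)


def pick_model(budget: TenantBudget, estimated_tokens: int,
               soft_threshold: float = 0.8) -> Optional[str]:
    """Return a model tier name, or None when the request should be rejected.

    Assumed policy: reject if the request would exceed the hard cap, route to
    the cheaper tier once projected usage reaches the soft threshold, and use
    the larger tier otherwise.
    """
    if budget.limit_tokens <= 0 or estimated_tokens > budget.remaining():
        return None
    projected = (budget.used_tokens + estimated_tokens) / budget.limit_tokens
    return "small-model" if projected >= soft_threshold else "large-model"


def record_usage(budget: TenantBudget, tokens: int) -> None:
    """Add the tokens actually consumed by a completed request."""
    budget.used_tokens += tokens


budget = TenantBudget(limit_tokens=1000)
first_choice = pick_model(budget, estimated_tokens=100)   # well under the cap
record_usage(budget, 750)
second_choice = pick_model(budget, estimated_tokens=100)  # past the soft threshold
third_choice = pick_model(budget, estimated_tokens=300)   # would exceed the cap
```

In a real multi-tenant system the counter would live in a shared store with atomic increments rather than in process memory; the in-memory version only shows the decision logic.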

Pages

  • Stop Sending All Your Tools to the LLM — use embeddings to filter tools before calling the LLM
  • (coming soon) Evals that actually catch regressions
  • (coming soon) Prompt versioning with Git + feature flags
  • (coming soon) Token-cost budgets in multi-tenant SaaS
  • (coming soon) Model fallback patterns (Gemini → local Ollama)
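
The idea behind the first page, filtering tools by embedding similarity before calling the LLM, can be sketched roughly as below. This is not the linked page's implementation: the tool names are hypothetical, and the three-dimensional vectors are hand-made toy values standing in for the output of a real embedding model.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_tools(query_vec: List[float],
                tool_vecs: Dict[str, List[float]],
                k: int = 2) -> List[str]:
    """Return the names of the k tools whose embeddings are closest to the query."""
    ranked = sorted(tool_vecs,
                    key=lambda name: cosine(query_vec, tool_vecs[name]),
                    reverse=True)
    return ranked[:k]


# Toy embeddings of each tool's description (assumed values, not real model output).
TOOL_VECS = {
    "get_weather": [0.9, 0.1, 0.0],
    "send_email": [0.0, 0.9, 0.1],
    "query_database": [0.1, 0.2, 0.9],
}

# Pretend embedding of a weather-related user request.
query = [0.8, 0.2, 0.1]
selected = top_k_tools(query, TOOL_VECS, k=2)
```

Only the tools in `selected` would then be included in the LLM request, which keeps the prompt smaller as the tool catalog grows.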