LLMOps
Notes on running LLMs in production — reliability, evals, cost, safety.
Mental model
LLMOps = MLOps + non-determinism + runaway cost risk + prompt-as-config.
The four pillars I track:
- Evals — offline + online, regression-gated
- Observability — token usage, latency p50/p95/p99, tool-call success rate
- Guardrails — input/output filters, PII scrubbing, jailbreak detection
- Cost controls — per-tenant budgets, model routing, caching
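As a rough illustration of the cost-controls pillar, here is a minimal sketch that combines a per-tenant token budget with a simple routing rule. Everything in it is an assumption made for illustration: the `TenantBudget` class, the `pick_model` function, the tier names `large-model` and `small-model`, and the 20% downgrade threshold are hypothetical, not taken from any specific library or from a production setup.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TenantBudget:
    """Tracks token spend for one tenant against a monthly cap (hypothetical)."""
    monthly_token_cap: int
    tokens_used: int = 0

    def remaining(self) -> int:
        # Never report a negative balance, even if usage overshot the cap.
        return max(self.monthly_token_cap - self.tokens_used, 0)

    def record(self, tokens: int) -> None:
        # Call this after each request with the tokens actually consumed.
        self.tokens_used += tokens


def pick_model(budget: TenantBudget, estimated_tokens: int,
               downgrade_below: float = 0.2) -> Optional[str]:
    """Return a model tier name, or None if the request exceeds the budget.

    The tier names are placeholders; the 20% threshold is an assumed default.
    """
    if estimated_tokens > budget.remaining():
        return None  # reject: the tenant cannot afford this request
    remaining_after = budget.remaining() - estimated_tokens
    if remaining_after < downgrade_below * budget.monthly_token_cap:
        return "small-model"  # route to the cheaper tier near the cap
    return "large-model"


budget = TenantBudget(monthly_token_cap=1000)
first_choice = pick_model(budget, estimated_tokens=100)
budget.record(750)
second_choice = pick_model(budget, estimated_tokens=100)
third_choice = pick_model(budget, estimated_tokens=300)
```

With a cap of 1000 tokens, the first request is routed to the larger tier, the second is downgraded once 750 tokens have been spent, and the third is rejected because it would exceed what is left. A real system would likely track spend in currency rather than raw tokens and persist the counters outside the process.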
Pages
- Stop Sending All Your Tools to the LLM — use embeddings to filter tools before calling the LLM
- (coming soon) Evals that actually catch regressions
- (coming soon) Prompt versioning with Git + feature flags
- (coming soon) Token-cost budgets in multi-tenant SaaS
- (coming soon) Model fallback patterns (Gemini → local Ollama)
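The idea behind the first page in this list, filtering tools by embedding similarity before calling the LLM, can be sketched roughly as follows. This is not the linked page's implementation. The `embed` function is a toy bag-of-words stand-in for a real embedding model, and the tool names, descriptions, and `top_k_tools` helper are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_tools(query: str, tools: Dict[str, str], k: int = 2) -> List[str]:
    """Return the names of the k tools whose descriptions best match the query."""
    q = embed(query)
    ranked = sorted(tools, key=lambda name: cosine(q, embed(tools[name])),
                    reverse=True)
    return ranked[:k]


# Hypothetical tool catalogue: name -> natural-language description.
TOOLS = {
    "get_weather": "get the current weather forecast for a city",
    "send_email": "send an email message to a recipient",
    "search_flights": "search flights between two airports on a date",
}

selected = top_k_tools("what is the weather forecast in Paris", TOOLS, k=1)
```

Only the tools in `selected` would then be included in the LLM request, which keeps the prompt smaller as the catalogue grows. In practice the tool descriptions would be embedded once and cached rather than re-embedded on every query as this sketch does.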