Raju Ghosh — Notes

rajughoshdevai/notes

LLMOps

Notes on running LLMs in production — reliability, evals, cost, safety.

Mental model

LLMOps = MLOps + non-determinism + runaway cost risk + prompt-as-config.

The four pillars I track:

Evals — offline + online, regression-gated
Observability — token usage, latency p50/p95/p99, tool-call success rate
Guardrails — input/output filters, PII scrubbing, jailbreak detection
Cost controls — per-tenant budgets, model routing, caching
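
As one illustration of the cost-controls pillar, here is a minimal sketch of a per-tenant budget check combined with model routing. Everything in it is an assumption made for the example: the model names, the prices in `PRICE_PER_1K`, and the helpers `route` and `record` are hypothetical, not taken from any provider or library.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical prices in USD per 1K tokens; real prices differ by provider.
PRICE_PER_1K = {"large-model": 0.01, "small-model": 0.001}


@dataclass
class TenantBudget:
    """Tracks how much of a tenant's spending limit has been used."""
    limit_usd: float
    spent_usd: float = 0.0

    def remaining(self) -> float:
        return self.limit_usd - self.spent_usd


def estimate_cost(model: str, tokens: int) -> float:
    """Estimated cost of a request of `tokens` tokens on `model`."""
    return PRICE_PER_1K[model] * tokens / 1000


def route(budget: TenantBudget, tokens: int,
          preferred: str = "large-model",
          fallback: str = "small-model") -> Optional[str]:
    """Return the first model the tenant can still afford, or None."""
    for model in (preferred, fallback):
        if estimate_cost(model, tokens) <= budget.remaining():
            return model
    return None  # budget exhausted: the caller should reject or queue


def record(budget: TenantBudget, model: str, tokens: int) -> None:
    """Charge the estimated cost of a completed request to the tenant."""
    budget.spent_usd += estimate_cost(model, tokens)
```

A caller would invoke `route` before each request and `record` after it, so a tenant degrades to the cheaper model as the budget runs down and is refused only once neither model fits.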

Pages

Stop Sending All Your Tools to the LLM — use embeddings to filter tools before calling the LLM
(coming soon) Evals that actually catch regressions
(coming soon) Prompt versioning with Git + feature flags
(coming soon) Token-cost budgets in multi-tenant SaaS
(coming soon) Model fallback patterns (Gemini → local Ollama)
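
The idea behind the first page, filtering tools with embeddings before calling the LLM, can be sketched roughly as follows. This is only an illustration under stated assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, and the tool names and descriptions in `TOOLS` are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Hypothetical tool registry: tool name -> natural-language description.
TOOLS = {
    "get_weather": "get the current weather forecast for a city",
    "send_email": "send an email message to a recipient",
    "query_orders": "look up customer orders in the database",
}


def select_tools(query: str, tools: dict, top_k: int = 1) -> list:
    """Rank tools by similarity to the query and keep the top_k names."""
    q = embed(query)
    ranked = sorted(tools,
                    key=lambda name: cosine(q, embed(tools[name])),
                    reverse=True)
    return ranked[:top_k]


# Only the selected tools would be included in the LLM request.
print(select_tools("what is the weather forecast in Paris", TOOLS))
```

In practice the tool descriptions would be embedded once and cached, so each request only pays for embedding the query.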