Engineering

How Kernel is built. Practical notes on RAG, retrieval, and running AI in production.

May 14, 2026 · 12 min read

How we cut LLM costs 87% by routing only the generation step to Claude

A practical case for per-stage model selection in production RAG.

April 28, 2026 · 10 min read

Why we ship Self-RAG retries, and what happened when we didn't

A measurement bug that lived in our streaming pipeline for months, what users actually saw, and how we fixed it without sacrificing latency.