Building · May 2026

Shipping a RAG pipeline that isn't magic

Retrieval-augmented generation is mostly plumbing. Here's what actually moved quality on a real project.

I shipped a retrieval-augmented generation feature recently, and the biggest lesson was deflating: almost none of the quality came from the model. It came from the boring parts around it.

The model is the last 10%

Everyone obsesses over which model to use. In practice, swapping models moved answer quality less than fixing how I chunked documents. The order of impact, roughly:

Chunking and metadata
Retrieval (what you actually feed the model)
The prompt
The model itself

If retrieval hands the model the wrong context, no model saves you. Garbage in, confident garbage out.
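Retrieval is where the "wrong context" gets decided. Below is a minimal sketch of top-k retrieval by cosine similarity. It assumes chunk embeddings were already computed by some embedding model (not shown), and the toy 3-dimensional vectors are made up for illustration. `EmbeddedChunk`, `retrieve`, and the `minScore` cutoff are hypothetical names, not taken from the original project.

```typescript
// Hypothetical chunk shape: text plus a precomputed embedding vector.
interface EmbeddedChunk {
  id: string;
  text: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against the query, drop weak matches, keep the top k.
function retrieve(
  queryEmbedding: number[],
  chunks: EmbeddedChunk[],
  k: number,
  minScore = 0
): { chunk: EmbeddedChunk; score: number }[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosine(queryEmbedding, chunk.embedding) }))
    .filter((hit) => hit.score >= minScore)
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy corpus with made-up vectors, for illustration only.
const corpus: EmbeddedChunk[] = [
  { id: "a", text: "Refund policy", embedding: [1, 0, 0] },
  { id: "b", text: "Shipping times", embedding: [0, 1, 0] },
  { id: "c", text: "Returns and refunds", embedding: [0.9, 0.1, 0] },
];
const hits = retrieve([1, 0, 0], corpus, 2, 0.5);
console.log(hits.map((h) => h.chunk.id).join(", ")); // a, c
```

The `minScore` cutoff is one possible guard against bad context: handing the model fewer chunks can beat padding the prompt with weak matches.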

A chunk is a unit of meaning, not a unit of length

My first version split documents every 500 tokens. It was fast, and it was useless: half my chunks ended mid-sentence, mid-idea. Switching to structure-aware chunks (split on headings, keep sections whole) did more for answer quality than any prompt change.

// Naive: split every 500 tokens, shredding meaning
const naiveChunks = splitEvery(text, 500);

// Better: split on structure (headings) so sections stay whole,
// falling back to length-based splits only when a section exceeds MAX
const chunks = splitOnHeadings(doc).flatMap((section) =>
  section.length > MAX ? splitEvery(section, MAX) : [section]
);
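The snippet above leaves `splitOnHeadings`, `splitEvery`, and `MAX` undefined. Here is a minimal, self-contained sketch of what they might look like, assuming Markdown-style `#` headings and measuring size in characters as a rough stand-in for tokens. The function bodies, the `Chunk` type, `chunkDocument`, and `MAX = 500` are all assumptions for illustration. Unlike the snippet, sections here are objects that keep their heading as metadata, so the size check reads `section.text.length`.

```typescript
// Assumption: size measured in characters as a rough proxy for tokens.
const MAX = 500;

interface Chunk {
  heading: string; // nearest heading, kept as metadata for retrieval and citations
  text: string;
}

// Start a new section at each Markdown-style heading line ("#" to "######").
function splitOnHeadings(doc: string): Chunk[] {
  const sections: Chunk[] = [];
  let current: Chunk = { heading: "", text: "" };
  for (const line of doc.split("\n")) {
    if (/^#{1,6}\s/.test(line)) {
      if (current.text.trim()) sections.push({ heading: current.heading, text: current.text.trim() });
      current = { heading: line.replace(/^#{1,6}\s+/, "").trim(), text: "" };
    } else {
      current.text += line + "\n";
    }
  }
  if (current.text.trim()) sections.push({ heading: current.heading, text: current.text.trim() });
  return sections;
}

// Length-based fallback: slice text into pieces of at most `size` characters.
function splitEvery(text: string, size: number): string[] {
  const pieces: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    pieces.push(text.slice(i, i + size));
  }
  return pieces;
}

// Structure first; only oversized sections get split by length, and each piece keeps its heading.
function chunkDocument(doc: string, max = MAX): Chunk[] {
  return splitOnHeadings(doc).flatMap((section) =>
    section.text.length > max
      ? splitEvery(section.text, max).map((text) => ({ heading: section.heading, text }))
      : [section]
  );
}

const doc = "# Install\nRun the installer.\n\n# Configure\nSet API_KEY in your env.\n";
for (const c of chunkDocument(doc)) console.log(`${c.heading}: ${c.text}`);
// Install: Run the installer.
// Configure: Set API_KEY in your env.
```

A smarter fallback could pack whole paragraphs up to `max` instead of slicing mid-word; plain slicing is kept here to mirror `splitEvery` in the original snippet.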

Show your work in the UI

The other thing that mattered wasn't on the backend at all. Surfacing which source an answer came from, inline and clickable, did more for user trust than any accuracy gain. People forgive a wrong answer they can verify. They don't forgive a confident black box.
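Clickable sources mostly come down to never losing track of where each chunk came from. Below is a minimal sketch of one way to wire it up, not necessarily how this project did it. It assumes each retrieved chunk carries a `title` and `url` (hypothetical fields) and that the prompt asks the model to cite numbered sources as `[n]`. `SourcedChunk`, `buildContext`, and `extractCitations` are illustrative names, and the example URLs are placeholders.

```typescript
// Hypothetical shape: each retrieved chunk keeps the metadata needed to link back to it.
interface SourcedChunk {
  text: string;
  title: string;
  url: string;
}

// Number each chunk so the model can cite it as [1], [2], ...
function buildContext(chunks: SourcedChunk[]): string {
  return chunks.map((c, i) => `[${i + 1}] ${c.title}\n${c.text}`).join("\n\n");
}

interface Citation {
  marker: string; // e.g. "[2]"
  source: SourcedChunk;
}

// Find [n] markers in the answer and resolve each to the chunk it refers to.
// Duplicates are collapsed; markers outside the provided list are dropped, not linked.
function extractCitations(answer: string, chunks: SourcedChunk[]): Citation[] {
  const seen = new Set<number>();
  const citations: Citation[] = [];
  const pattern = /\[(\d+)\]/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(answer)) !== null) {
    const n = Number(match[1]);
    if (n < 1 || n > chunks.length || seen.has(n)) continue;
    seen.add(n);
    citations.push({ marker: match[0], source: chunks[n - 1] });
  }
  return citations;
}

const sources: SourcedChunk[] = [
  { title: "Refund policy", url: "https://example.com/refunds", text: "Refunds within 30 days." },
  { title: "Shipping", url: "https://example.com/shipping", text: "Ships in 2 days." },
];
const answer = "You can get a refund within 30 days [1]. See also [3].";
for (const c of extractCitations(answer, sources)) console.log(c.marker, c.source.url);
// [1] https://example.com/refunds
```

Out-of-range markers such as `[3]` are dropped rather than rendered as dead links, so every citation the UI shows points at context the model actually received.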

RAG isn't magic. It's a search problem wearing a language-model costume. Treat it like one and the quality follows.
