Shipping a RAG pipeline that isn't magic
Retrieval-augmented generation is mostly plumbing. Here's what actually moved quality on a real project.
I shipped a retrieval-augmented generation feature recently, and the biggest lesson was deflating: almost none of the quality came from the model. It came from the boring parts around it.
The model is the last 10%
Everyone obsesses over which model to use. In practice, swapping models moved my quality less than fixing how I chunked documents. The order of impact, roughly, from most to least:
- Chunking and metadata
- Retrieval (what you actually feed the model)
- The prompt
- The model itself
If retrieval hands the model the wrong context, no model saves you. Garbage in, confident garbage out.
A chunk is a unit of meaning, not a unit of length
My first version split documents every 500 tokens. It was fast and it was useless — half my chunks ended mid-sentence, mid-idea. Switching to structure-aware chunks (split on headings, keep sections whole) did more for answer quality than any prompt change.
// Naive: split on length, shred meaning
const naiveChunks = splitEvery(text, 500);
// Better: split on structure, preserve meaning.
// Only sections longer than MAX fall back to length-based splitting.
const structuredChunks = splitOnHeadings(doc).flatMap((section) =>
  section.length > MAX ? splitEvery(section, MAX) : [section]
);
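For anyone who wants to try this, here is a minimal sketch of what those two helpers might look like. The bodies are my assumptions, not the code from the project: it treats Markdown-style `#` lines as headings, and it measures length in characters rather than tokens to stay dependency-free. `chunkDocument` and the `MAX` value are hypothetical names and numbers for illustration.

```typescript
// Hypothetical helper bodies: the names match the snippet above, the logic is an assumption.
const MAX = 2000; // assumed budget, in characters (a real pipeline would likely count tokens)

// Split text into consecutive pieces of at most `size` characters.
function splitEvery(text: string, size: number): string[] {
  const pieces: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    pieces.push(text.slice(i, i + size));
  }
  return pieces;
}

// Split a Markdown-style document into sections, each beginning at a heading line.
// Text before the first heading becomes its own section.
function splitOnHeadings(doc: string): string[] {
  const sections: string[] = [];
  let current: string[] = [];
  for (const line of doc.split("\n")) {
    // A line starting with one to six '#' characters and a space opens a new section.
    if (/^#{1,6}\s/.test(line) && current.length > 0) {
      sections.push(current.join("\n"));
      current = [];
    }
    current.push(line);
  }
  if (current.length > 0) {
    sections.push(current.join("\n"));
  }
  // Drop sections that contain only whitespace.
  return sections.filter((section) => section.trim().length > 0);
}

// Keep sections whole; fall back to length-based splitting only for oversized ones.
// Same logic as the flatMap version, written as a loop.
function chunkDocument(doc: string, max: number = MAX): string[] {
  const chunks: string[] = [];
  for (const section of splitOnHeadings(doc)) {
    if (section.length > max) {
      chunks.push(...splitEvery(section, max));
    } else {
      chunks.push(section);
    }
  }
  return chunks;
}
```

One trade-off to be aware of: the fallback still cuts oversized sections at arbitrary points, so a long section can end mid-sentence. Splitting those on paragraph boundaries first would be a reasonable next step.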
Show your work in the UI
The other thing that mattered wasn't backend at all. Surfacing which source an answer came from, inline and clickable, did more for user trust than any accuracy gain. People forgive a wrong answer they can verify. They don't forgive a confident black box.
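One way to wire that up, as a rough sketch rather than the implementation from the project: number the retrieved chunks, ask the model to cite them as [1], [2], and so on, then turn those markers into links at render time. Every name here (`Chunk`, `attachCitations`, `renderAnswerHtml`) is hypothetical, and the sketch assumes the model actually follows the citation format.

```typescript
// Hypothetical shapes and function names, for illustration only.
interface Chunk {
  text: string;
  sourceTitle: string; // title of the document or section the chunk came from
  sourceUrl: string;   // where the reader can verify the claim
}

interface Citation {
  marker: number;
  title: string;
  url: string;
}

interface CitedAnswer {
  text: string; // answer text containing markers such as [1] or [2]
  citations: Citation[];
}

// Keep only the sources the answer actually references.
// Marker n refers to the n-th retrieved chunk (1-based).
function attachCitations(answerText: string, retrieved: Chunk[]): CitedAnswer {
  const citations: Citation[] = [];
  retrieved.forEach((chunk, index) => {
    const marker = index + 1;
    if (answerText.indexOf("[" + marker + "]") !== -1) {
      citations.push({ marker, title: chunk.sourceTitle, url: chunk.sourceUrl });
    }
  });
  return { text: answerText, citations };
}

// Escape the characters that matter inside HTML text and attribute values.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Replace each known [n] marker with a clickable link in a single pass.
// Markers without a matching citation are left as plain text.
function renderAnswerHtml(answer: CitedAnswer): string {
  const byMarker: Record<number, Citation> = {};
  for (const citation of answer.citations) {
    byMarker[citation.marker] = citation;
  }
  return escapeHtml(answer.text).replace(/\[(\d+)\]/g, (match: string, n: string) => {
    const citation = byMarker[Number(n)];
    if (!citation) {
      return match;
    }
    return (
      '<a href="' + escapeHtml(citation.url) + '" title="' + escapeHtml(citation.title) + '">' +
      "[" + citation.marker + "]</a>"
    );
  });
}
```

The point of carrying `sourceTitle` and `sourceUrl` on every chunk is that the UI never has to guess where text came from. If the metadata is lost at chunking time, there is nothing to link to later.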
RAG isn't magic. It's a search problem wearing a language-model costume. Treat it like one and the quality follows.