RAG Architecture Template for AI Products
A template for designing Retrieval-Augmented Generation systems, covering retrieval pipeline design, context assembly, and generation configuration.
Updated 2026-03-05
RAG Architecture
| # | Item | Category | Priority | Owner | Status | Notes |
|---|------|----------|----------|-------|--------|-------|
| 1 |      |          |          |       |        |       |
| 2 |      |          |          |       |        |       |
| 3 |      |          |          |       |        |       |
| 4 |      |          |          |       |        |       |
| 5 |      |          |          |       |        |       |
Frequently Asked Questions
When should I use RAG vs fine-tuning?
Use RAG when your knowledge base changes frequently (documents are updated weekly or daily), when you need source citations for trust, or when you want to keep using the latest base model without retraining. Use fine-tuning when you need the model to learn a specific output format, tone, or reasoning pattern that prompting cannot enforce. Many production systems combine both: RAG for knowledge retrieval and fine-tuning for output quality. The [AI PM Handbook](/ai-guide) covers this decision framework in its model strategy chapter.
How many chunks should I retrieve and send to the LLM?
Start with retrieving 15-20 candidates and re-ranking to the top 3-5 for the LLM. Sending too many chunks wastes context window space and can confuse the model. Sending too few risks missing the relevant document. Tune based on your evaluation metrics: if recall@5 is high but precision@5 is low, you are retrieving enough but your re-ranking needs work.
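The retrieve-then-re-rank pattern above can be sketched as a small two-stage function. This is a minimal illustration, not a production implementation: `overlap_score` is a toy token-overlap scorer standing in for a real embedding similarity (stage 1) and cross-encoder re-ranker (stage 2), which would normally be two different models.

```python
def overlap_score(query: str, doc: str) -> float:
    """Toy relevance score: fraction of query tokens present in the doc.
    Stand-in for a real embedding or cross-encoder score."""
    q_tokens = set(query.lower().split())
    d_tokens = set(doc.lower().split())
    return len(q_tokens & d_tokens) / len(q_tokens) if q_tokens else 0.0

def retrieve_then_rerank(query, corpus, n_candidates=20, top_k=5):
    """Stage 1: pull a wide candidate set with a cheap scorer.
    Stage 2: re-rank the candidates and keep only top_k for the LLM."""
    candidates = sorted(
        corpus, key=lambda doc: overlap_score(query, doc), reverse=True
    )[:n_candidates]
    return sorted(
        candidates, key=lambda doc: overlap_score(query, doc), reverse=True
    )[:top_k]
```

The key design point is the funnel shape: the cheap first stage trades precision for recall over the whole corpus, and the expensive second stage only has to rank 15-20 candidates, not millions.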
How do I handle questions the knowledge base cannot answer?
This is critical. Configure the system prompt to instruct the model to say "I do not have enough information to answer that question" when the retrieved context is insufficient. Detect this at the retrieval layer by checking if the highest similarity score is below your minimum threshold. When no relevant documents are found, do not pass empty context to the LLM, since it will hallucinate an answer. The [hallucination glossary entry](/glossary/hallucination) explains why this happens.
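The retrieval-layer gate described above can be expressed as a small wrapper. This is a sketch under assumptions: `retrieve` and `generate` are hypothetical callables (your vector store lookup and LLM call), and `min_score=0.75` is a placeholder that you would tune against your own similarity-score distribution.

```python
NO_ANSWER = "I do not have enough information to answer that question."

def answer_or_refuse(query, retrieve, generate, min_score=0.75):
    """Gate generation on retrieval confidence.

    retrieve(query) -> (chunks, scores); generate(query, chunks) -> answer.
    Both are assumed interfaces, not a specific library's API.
    """
    chunks, scores = retrieve(query)
    if not chunks or max(scores) < min_score:
        # Never pass empty or weak context to the LLM -- it will
        # hallucinate. Refuse at the retrieval layer instead.
        return NO_ANSWER
    return generate(query, chunks)
```

Refusing before the LLM call is both safer and cheaper: you skip the generation step entirely when retrieval has nothing useful.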
How do I evaluate RAG quality end-to-end?
Build an evaluation dataset of 50-100 questions with labeled ground-truth answers and the specific documents that contain the answer. Measure: (1) retrieval recall (did we find the right document?), (2) answer accuracy (is the generated answer correct?), (3) faithfulness (does the answer only use information from retrieved context?), and (4) citation accuracy (do citations point to the right sources?). Run this evaluation weekly. The [AI Eval Scorecard](/tools/ai-eval-scorecard) provides a structured framework for RAG evaluation.
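Metric (1), retrieval recall, is the easiest to automate against the labeled dataset described above. A minimal sketch, assuming each eval example records the IDs of the documents that contain the answer, and `retrieve` is your (hypothetical) ranked retrieval function:

```python
def recall_at_k(eval_set, retrieve, k=5):
    """Fraction of questions where at least one gold document
    appears in the top-k retrieved results.

    eval_set: list of {"question": str, "gold_doc_ids": set of ids}
    retrieve(question, k) -> ranked list of doc ids (assumed interface)
    """
    hits = 0
    for example in eval_set:
        retrieved = set(retrieve(example["question"], k))
        if retrieved & example["gold_doc_ids"]:
            hits += 1
    return hits / len(eval_set)
```

Answer accuracy, faithfulness, and citation accuracy need either human grading or an LLM-as-judge setup, but this retrieval metric alone, run weekly, catches most regressions from re-indexing or chunking changes.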
What is the biggest mistake teams make with RAG systems?
Skipping the retrieval evaluation. Teams test the LLM generation quality but never measure whether the retrieval layer is finding the right documents. If your retrieval precision@5 is only 40%, even the best LLM will produce poor answers 60% of the time because it is working with irrelevant context. Always measure retrieval quality independently before tuning the generation layer.
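The arithmetic behind this claim can be made explicit with a simplified model (an assumption for illustration, not a formal result): treat an answer as correct only when the right context is retrieved *and* the generator uses it correctly.

```python
def answer_accuracy_ceiling(retrieval_precision, gen_accuracy_given_context):
    """Simplified upper bound on end-to-end answer accuracy:
    correct answer requires correct retrieval AND correct generation.
    Ignores cases where the model answers correctly from parametric memory."""
    return retrieval_precision * gen_accuracy_given_context

# With 40% retrieval precision, even a generator that is 95% accurate
# when given the right context is capped near 38% end-to-end.
```

This is why improving the retrieval layer usually pays off faster than swapping in a stronger LLM.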