Observations on Building RAG Systems for Technical Documents
Sumit Soman, Sujoy Roychowdhury
TL;DR
This study investigates how retrieval augmented generation (RAG) performs on technical documents in the telecom domain, focusing on how chunk size, glossary handling, and retrieval strategy affect question-answering (QA) quality. It evaluates MPNet-based embeddings and a Llama2-7b-chat model on 42 domain questions drawn from IEEE standards, comparing glossary-based and full-document retrieval, and finds that sentence-based retrieval and definition-term splitting improve results. The work shows that embedding similarity signals are brittle across chunk sizes and that threshold-based retriever augmentation can be unreliable, which highlights practical constraints for long-form technical QA. It also identifies domain-aligned evaluation metrics and follow-up-question capabilities as important directions for future RAG systems.
Abstract
Retrieval augmented generation (RAG) for technical documents poses challenges because embeddings often fail to capture domain information. We review prior art on the important factors affecting RAG and perform experiments to highlight best practices and potential challenges in building RAG systems for technical documents.
