Reducing hallucination in structured outputs via Retrieval-Augmented Generation
Patrice Béchard, Orlando Marquez Ayala
TL;DR
The paper tackles hallucination in Generative AI (GenAI) systems that produce structured outputs, such as workflow JSON generated from natural language requirements. It proposes a Retrieval-Augmented Generation (RAG) pipeline that pairs a domain-specific retriever with a separately trained LLM, and shows that retrieved JSON objects steer generation toward fewer hallucinations while allowing a smaller LLM to be used. Experiments on internal enterprise data and out-of-domain splits show substantial reductions in hallucinated steps and tables; a 7B-parameter LLM combined with a compact 110M-parameter retriever performs well and is practical to deploy. The work also discusses practical engineering implications and suggests joint training and further efficiency improvements as future directions, with the aim of making enterprise-grade GenAI more scalable and trustworthy.
Abstract
A common and fundamental limitation of Generative AI (GenAI) is its propensity to hallucinate. While large language models (LLMs) have taken the world by storm, real-world GenAI systems may struggle to gain user adoption unless hallucinations are eliminated or at least reduced. In the process of deploying an enterprise application that produces workflows based on natural language requirements, we devised a system leveraging Retrieval-Augmented Generation (RAG) to greatly improve the quality of the structured output that represents such workflows. Thanks to our implementation of RAG, our proposed system significantly reduces hallucinations in the output and improves the generalization of our LLM in out-of-domain settings. In addition, we show that using a small, well-trained retriever encoder can reduce the size of the accompanying LLM, thereby making deployments of LLM-based systems less resource-intensive.
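To make the idea concrete, the following is a minimal sketch of the two pieces the abstract describes: retrieving candidate JSON objects for a requirement, and checking a generated workflow for hallucinated steps. It is not the authors' implementation. The step catalogue, the step names, the `steps` field of the workflow JSON, and all function names are hypothetical, and a toy bag-of-words cosine similarity stands in for the trained retriever encoder.

```python
import json
import math
from collections import Counter

# Hypothetical catalogue of valid workflow steps (illustrative names only).
STEP_CATALOGUE = [
    {"name": "create_record", "description": "create a new record in a table"},
    {"name": "send_email", "description": "send an email notification to a user"},
    {"name": "look_up_record", "description": "look up an existing record in a table"},
    {"name": "ask_for_approval", "description": "ask a manager for approval"},
]


def _bag(text):
    """Lowercased bag-of-words counts; a stand-in for a learned embedding."""
    return Counter(text.lower().split())


def _cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, catalogue, k=2):
    """Return the k catalogue entries most similar to the query."""
    q = _bag(query)
    ranked = sorted(
        catalogue,
        key=lambda step: _cosine(q, _bag(step["description"])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, retrieved):
    """Place the retrieved JSON objects in front of the requirement."""
    context = json.dumps(retrieved, indent=2)
    return f"Suggested steps:\n{context}\n\nRequirement: {query}\nWorkflow JSON:"


def hallucinated_step_rate(workflow_json, catalogue):
    """Fraction of generated steps whose name is not in the catalogue."""
    valid_names = {step["name"] for step in catalogue}
    steps = json.loads(workflow_json)["steps"]
    if not steps:
        return 0.0
    invalid = sum(1 for step in steps if step["name"] not in valid_names)
    return invalid / len(steps)
```

In use, `retrieve` would be called on the user's requirement, `build_prompt` would assemble the input for the LLM, and `hallucinated_step_rate` would be applied to the LLM's output. A generated workflow containing one catalogue step and one invented step would score 0.5 under this assumed definition.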
