Structure-Augmented Reasoning Generation
Jash Rajesh Parekh, Pengcheng Jiang, Jiawei Han
TL;DR
SARG introduces a post-retrieval, structure-augmented reasoning layer that builds a query-specific knowledge graph from retrieved passages, enabling bidirectional multi-hop reasoning and serialized reasoning chains to guide LLM generation. By extracting reasoning-centric triples, sparsifying the graph, and conducting semantic beam search, SARG provides explicit, traceable inference paths that improve factual grounding and coherence. Empirical results on open-domain and domain-specific benchmarks (including HotPotQA, MuSiQue, Bitcoin Price Fluctuations, and Gaucher Disease) show SARG outperforming flat-context RAG and several graph-based baselines, with strong faithfulness, synthesis, and reasoning metrics and favorable human evaluation. The approach remains retriever-agnostic and on-demand, offering interpretable reasoning without corpus-wide indexing or fine-tuning, making it practical for dynamic, high-stakes settings in finance and medicine.
Abstract
Recent advances in Large Language Models (LLMs) have significantly improved complex reasoning capabilities. Retrieval-Augmented Generation (RAG) has further extended these capabilities by grounding generation in dynamically retrieved evidence, enabling access to information beyond what is stored in the model's parameters. However, while RAG addresses knowledge availability, standard pipelines treat retrieved documents as independent, unstructured text chunks, forcing models to implicitly connect information across fragmented context. This limitation becomes critical for multi-hop queries, where answering correctly requires synthesizing information scattered across different documents. We present Structure-Augmented Reasoning Generation (SARG), a post-retrieval framework that addresses this gap by materializing explicit reasoning structures from retrieved context. SARG operates in three stages: (1) extracting relational triples from retrieved documents via few-shot prompting, (2) organizing these triples into a domain-adaptive knowledge graph, and (3) performing multi-hop traversal to identify relevant reasoning chains. These chains, along with their associated text chunks, are then integrated into the generation prompt to explicitly guide the model's reasoning process. Importantly, SARG does not require custom retrievers or domain-specific fine-tuning. Instead, it functions as a modular layer compatible with existing RAG pipelines. Extensive experiments on open-domain QA benchmarks and specialized reasoning datasets in finance and medicine demonstrate that SARG significantly outperforms state-of-the-art flat-context RAG baselines in both factual accuracy and reasoning coherence. Furthermore, by surfacing the exact traversal paths used during generation, SARG provides fully traceable and interpretable inference.
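To make the second and third stages concrete, the following is a minimal sketch and not the authors' implementation. It assumes triples have already been extracted (the hard-coded `TRIPLES` list stands in for few-shot LLM extraction), indexes each triple under both endpoints so traversal can run in either direction, and runs a small beam search over the graph. The token-overlap scorer is a toy stand-in for a semantic relevance model, and all names (`build_graph`, `overlap_score`, `beam_search`, `serialize`) and the example triples are hypothetical. Graph sparsification is omitted.

```python
from collections import defaultdict

# Hypothetical triples; in the described pipeline these would come from
# few-shot LLM extraction over the retrieved passages.
TRIPLES = [
    ("halving", "reduces", "bitcoin supply growth"),
    ("bitcoin supply growth", "affects", "bitcoin price"),
    ("interest rates", "affects", "investor risk appetite"),
    ("investor risk appetite", "drives", "bitcoin price"),
]


def build_graph(triples):
    """Index every triple under both its head and its tail (bidirectional)."""
    graph = defaultdict(list)
    for head, rel, tail in triples:
        triple = (head, rel, tail)
        graph[head].append((tail, triple))  # forward direction
        graph[tail].append((head, triple))  # backward direction
    return graph


def overlap_score(query, triple):
    """Toy relevance: Jaccard overlap of query tokens and triple tokens.

    Assumption: a real system would use a semantic similarity model here.
    """
    q = set(query.lower().split())
    t = set(" ".join(triple).lower().split())
    return len(q & t) / len(q | t)


def beam_search(graph, query, start, max_hops=2, beam_width=2):
    """Return (score, path) pairs, best first; a path is a list of triples."""
    beams = [(0.0, [], start, frozenset([start]))]
    completed = []
    for _ in range(max_hops):
        candidates = []
        for score, path, node, visited in beams:
            for neighbor, triple in graph.get(node, []):
                if neighbor in visited:
                    continue  # avoid revisiting entities within one chain
                candidates.append((
                    score + overlap_score(query, triple),
                    path + [triple],
                    neighbor,
                    visited | {neighbor},
                ))
        if not candidates:
            break
        # Keep only the highest-scoring partial chains at each hop.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
        completed.extend(beams)
    completed.sort(key=lambda c: c[0], reverse=True)
    return [(score, path) for score, path, _, _ in completed]


def serialize(path):
    """Render a chain of triples as text suitable for a generation prompt."""
    return " ; ".join(f"{h} --{r}--> {t}" for h, r, t in path)


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    query = "how does halving affect bitcoin price"
    for score, path in beam_search(graph, query, start="halving"):
        print(round(score, 3), serialize(path))
```

In this sketch the serialized chain is what would be placed in the generation prompt alongside the source text chunks; because each chain is an explicit list of triples, the path behind an answer can be inspected directly.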
