Structure-Augmented Reasoning Generation

Jash Rajesh Parekh, Pengcheng Jiang, Jiawei Han

TL;DR

SARG introduces a post-retrieval, structure-augmented reasoning layer that builds a query-specific knowledge graph from retrieved passages, enabling bidirectional multi-hop reasoning and serialized reasoning chains to guide LLM generation. By extracting reasoning-centric triples, sparsifying the graph, and conducting semantic beam search, SARG provides explicit, traceable inference paths that improve factual grounding and coherence. Empirical results on open-domain and domain-specific benchmarks (including HotPotQA, MuSiQue, Bitcoin Price Fluctuations, and Gaucher Disease) show SARG outperforming flat-context RAG and several graph-based baselines, with strong faithfulness, synthesis, and reasoning metrics and favorable human evaluation. The approach remains retriever-agnostic and on-demand, offering interpretable reasoning without corpus-wide indexing or fine-tuning, making it practical for dynamic, high-stakes settings in finance and medicine.

Abstract

Recent advances in Large Language Models (LLMs) have significantly improved complex reasoning capabilities. Retrieval-Augmented Generation (RAG) has further extended these capabilities by grounding generation in dynamically retrieved evidence, enabling access to information beyond the model's training parameters. However, while RAG addresses knowledge availability, standard pipelines treat retrieved documents as independent, unstructured text chunks, forcing models to implicitly connect information across fragmented context. This limitation becomes critical for multi-hop queries, where answering correctly requires synthesizing information scattered across different documents. We present Structure-Augmented Reasoning Generation (SARG), a post-retrieval framework that addresses this gap by materializing explicit reasoning structures from retrieved context. SARG operates in three stages: extracting relational triples from retrieved documents via few-shot prompting, organizing these triples into a domain-adaptive knowledge graph, and performing multi-hop traversal to identify relevant reasoning chains. These chains, along with their associated text chunks, are then integrated into the generation prompt to explicitly guide the model's reasoning process. Importantly, SARG does not require custom retrievers or domain-specific fine-tuning. Instead, it functions as a modular layer compatible with all existing RAG pipelines. Extensive experiments on open-domain QA benchmarks and specialized reasoning datasets in finance and medicine demonstrate that SARG significantly outperforms state-of-the-art flat-context RAG baselines in both factual accuracy and reasoning coherence. Furthermore, by surfacing the exact traversal paths used during generation, SARG provides fully traceable and interpretable inference.
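
As a rough illustration of the second and third stages described in the abstract, the sketch below organizes already-extracted triples into a small graph and serializes one multi-hop chain. It is not the authors' implementation: the `Triple` fields, the function names `build_graph`, `chains_from`, and `serialize`, the exact-match deduplication, and the finance-flavored example triples are all assumptions made for illustration, and the few-shot extraction step is assumed to have already produced the triples.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One extracted relation plus the index of its source chunk (hypothetical schema)."""
    cause: str
    relation: str
    effect: str
    chunk_id: int


def build_graph(triples):
    """Index triples by cause (forward) and by effect (backward), dropping exact duplicates."""
    forward, backward = defaultdict(list), defaultdict(list)
    seen = set()
    for t in triples:
        key = (t.cause.lower(), t.relation.lower(), t.effect.lower())
        if key in seen:
            continue  # same relation extracted from another chunk
        seen.add(key)
        forward[t.cause.lower()].append(t)
        backward[t.effect.lower()].append(t)
    return forward, backward


def chains_from(forward, start, max_hops=2):
    """Enumerate cycle-free forward chains of 1..max_hops edges starting at `start`."""
    chains = []
    stack = [(start.lower(), (), {start.lower()})]
    while stack:
        node, path, visited = stack.pop()
        if path:
            chains.append(list(path))
        if len(path) >= max_hops:
            continue
        for t in forward.get(node, []):
            nxt = t.effect.lower()
            if nxt not in visited:
                stack.append((nxt, path + (t,), visited | {nxt}))
    return chains


def serialize(chain):
    """Render a chain as a single line that could be placed in a generation prompt."""
    parts = [chain[0].cause]
    for t in chain:
        parts.append(f"--[{t.relation}]--> {t.effect}")
    return " ".join(parts)


# Hypothetical triples, not taken from the paper's datasets.
triples = [
    Triple("rate hike", "reduces", "liquidity", 0),
    Triple("liquidity", "drives", "bitcoin price", 1),
    Triple("Rate hike", "reduces", "liquidity", 2),  # duplicate from another chunk
]
forward, backward = build_graph(triples)
chains = chains_from(forward, "rate hike")
print(serialize(chains[-1]))
```

Keeping `chunk_id` on every triple is what would let each serialized chain be paired with its supporting text chunks in the prompt, as the abstract describes.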

Paper Structure

This paper contains 65 sections, 6 equations, 4 figures, 10 tables.

Figures (4)

  • Figure 1: Overview of SARG. Step 1: Retrieved documents are processed to extract reasoning-relevant schema components $\langle\text{cause, relation, effect}\rangle$ and their context, forming a sparse graph where nodes represent concepts and edges capture their relationships. Step 2: Key concepts from the user query are extracted and matched to starting nodes in the local knowledge graph; our direction classifier predicts whether to traverse forward, backward, or bidirectionally based on the query structure. Step 3: Starting from predicted nodes, semantic beam search expands through the graph by evaluating query similarity scores at each hop, pruning low-relevance paths while retaining high-confidence reasoning chains. Step 4: Traversed reasoning chains are deduplicated and filtered for relevance, then combined with original evidence to assemble a structured prompt that provides attribution-aware, distilled context to the generator LLM for final response generation.
  • Figure 2: Human Evaluation Results ($N=30$; 15 queries per domain). Methods were rated on a 1--5 scale for reasoning quality. Win Rate denotes the percentage of samples where the method received the highest score. Inter-annotator agreement was substantial (Krippendorff's $\alpha = 0.71$).
  • Figure 3: Qualitative case study comparing SARG with expert-annotated triples \cite{antonucci2023zeroshot}. SARG successfully reconstructs a multi-hop pathway that the human-annotated KG fails to recover.
  • Figure 4: Comprehensive comparison of graph-based RAG methods. Top row shows construction time and graph structure; bottom row shows per-query inference times. SARG achieves competitive construction time with the most efficient inference performance.
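
The beam-search expansion summarized in Step 3 of Figure 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's algorithm: the bag-of-words `cosine` function is a stand-in for whatever query-similarity scorer SARG uses, the adjacency-map graph format, the parameters `beam_width`, `max_hops`, and `min_score`, and the medical example graph are all hypothetical. Only forward traversal is shown; backward traversal could run the same loop over a reversed adjacency map.

```python
import math
from collections import Counter


def cosine(a, b):
    """Bag-of-words cosine similarity (stand-in for an embedding-based scorer)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def beam_search(graph, query, starts, beam_width=2, max_hops=3, min_score=0.0):
    """Expand paths hop by hop, keeping the `beam_width` most query-similar paths.

    `graph` maps a node to a list of (relation, neighbor) pairs. A path is a
    tuple alternating node, relation, node, ... so nodes sit at even indices.
    """
    beam = [((s,), 0.0) for s in starts]
    finished = []
    for _ in range(max_hops):
        candidates = []
        for path, _score in beam:
            for relation, nxt in graph.get(path[-1], []):
                if nxt in path[::2]:
                    continue  # avoid revisiting a node already on this path
                new_path = path + (relation, nxt)
                score = cosine(query, " ".join(new_path))
                if score > min_score:  # prune low-relevance expansions
                    candidates.append((new_path, score))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = candidates[:beam_width]
        finished.extend(beam)
    return sorted(finished, key=lambda c: c[1], reverse=True)


# Hypothetical local graph, not taken from the paper's datasets.
graph = {
    "enzyme deficiency": [
        ("causes", "lipid accumulation"),
        ("linked to", "fatigue"),
    ],
    "lipid accumulation": [("causes", "spleen enlargement")],
}
results = beam_search(graph, "what causes spleen enlargement", ["enzyme deficiency"])
for path, score in results:
    print(round(score, 3), " ".join(path))
```

Scoring the whole serialized path at each hop, rather than only the newest edge, is one simple way to favor chains that stay on topic; a per-edge score would be an equally plausible design.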