RAS: Retrieval-And-Structuring for Knowledge-Intensive LLM Generation

Pengcheng Jiang, Lang Cao, Ruike Zhu, Minhao Jiang, Yunyi Zhang, Jimeng Sun, Jiawei Han

TL;DR

RAS introduces a dynamic, query-specific knowledge structuring paradigm for LLMs by iteratively planning retrieval, extracting structured triples, and reasoning over an evolving knowledge graph $G_Q$. The framework interleaves a knowledge-state-aware planner, text-to-triples conversion, and graph-conditioned answering within a multitask Graph LLM setup, enabling tailored, interpretable multi-hop reasoning. Across seven knowledge-intensive benchmarks, RAS yields consistent gains (up to 6.4% with open-source LLMs and 7.0% with proprietary LLMs) and remains robust to partial graphs and varying training-data regimes, underscoring the value of explicit structured knowledge in guiding inference. The work highlights practical benefits for transparency and reliability in high-stakes generation and points to future improvements in extraction quality, graph evolution, and cross-domain applicability.

Abstract

Large language models (LLMs) have achieved impressive performance on knowledge-intensive tasks, yet they often struggle with multi-step reasoning due to the unstructured nature of retrieved context. While retrieval-augmented generation (RAG) methods provide external information, the lack of explicit organization among retrieved passages limits their effectiveness, leading to brittle reasoning pathways. Recent interpretability studies highlighting the importance of structured intermediate reasoning further align with this perspective. We propose Retrieval-And-Structuring (RAS), a framework that dynamically constructs query-specific knowledge graphs through iterative retrieval and structured knowledge building. RAS interleaves targeted retrieval planning with incremental graph construction, enabling models to assemble and reason over evolving knowledge structures tailored to each query. On seven knowledge-intensive benchmarks, RAS consistently outperforms strong baselines, achieving up to 6.4% and 7.0% gains with open-source and proprietary LLMs, respectively. Our results demonstrate that dynamic, query-specific knowledge structuring offers a robust path to improving reasoning accuracy and robustness in language model generation. Our data and code can be found at https://github.com/pat-jj/RAS.
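The interleaving of retrieval planning and incremental graph construction described above can be pictured as a short control loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function `ras_loop`, its callable parameters (`plan`, `retrieve`, `extract`, `answer`), the `max_steps` cap, and the toy corpus are all hypothetical names introduced here for illustration. In the actual framework these roles are played by trained models and a real retriever.

```python
from typing import Callable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def ras_loop(
    query: str,
    plan: Callable[[str, Set[Triple]], Optional[str]],
    retrieve: Callable[[str], List[str]],
    extract: Callable[[str], List[Triple]],
    answer: Callable[[str, Set[Triple]], str],
    max_steps: int = 5,  # hypothetical safety cap on planning iterations
) -> Tuple[str, Set[Triple]]:
    """Plan -> retrieve -> structure, repeated until the planner stops."""
    graph: Set[Triple] = set()  # the evolving query-specific graph
    for _ in range(max_steps):
        sub_query = plan(query, graph)  # planner sees the current knowledge state
        if sub_query is None:  # planner judges the graph sufficient
            break
        for passage in retrieve(sub_query):
            graph.update(extract(passage))  # merge new triples into the graph
    return answer(query, graph), graph


# --- Toy stand-ins (hypothetical data, for illustration only) ---
CORPUS = {
    "author of Dune": ["Frank Herbert wrote Dune."],
    "birthplace of Frank Herbert": ["Frank Herbert was born in Tacoma."],
}
TRIPLES = {
    "Frank Herbert wrote Dune.": [("Frank Herbert", "wrote", "Dune")],
    "Frank Herbert was born in Tacoma.": [("Frank Herbert", "born_in", "Tacoma")],
}


def toy_plan(query: str, graph: Set[Triple]) -> Optional[str]:
    relations = {r for _, r, _ in graph}
    if "wrote" not in relations:
        return "author of Dune"
    if "born_in" not in relations:
        # The next sub-query depends on what the graph already contains.
        author = next(s for s, r, _ in graph if r == "wrote")
        return f"birthplace of {author}"
    return None


def toy_retrieve(sub_query: str) -> List[str]:
    return CORPUS.get(sub_query, [])


def toy_extract(passage: str) -> List[Triple]:
    return TRIPLES.get(passage, [])


def toy_answer(query: str, graph: Set[Triple]) -> str:
    for _, relation, obj in graph:
        if relation == "born_in":
            return obj
    return "unknown"


final_answer, final_graph = ras_loop(
    "Where was the author of Dune born?",
    toy_plan, toy_retrieve, toy_extract, toy_answer,
)
print(final_answer)  # prints "Tacoma"
```

The point of the sketch is the data flow: the second sub-query is only formed after the first triple has been merged into the graph, which is what makes the retrieval plan depend on the current knowledge state.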

Paper Structure

This paper contains 32 sections, 9 equations, 24 figures, 8 tables, 1 algorithm.

Figures (24)

  • Figure 1: Overview of the Retrieval-And-Structuring (RAS) framework. RAS operates through three stages: (1) Planning strategically determines knowledge retrieval needs and generates focused sub-queries based on the current knowledge state; (2) Text Retrieval and Knowledge Structuring retrieves text based on the sub-query and transforms the retrieved content into a graph, which is merged into an evolving knowledge graph that expands based on reasoning needs; and (3) Answering leverages the accumulated structured knowledge to produce the final output. The framework uses parameter-efficient training for the Graph LLM (Perozzi et al., 2024; He et al., 2024), fine-tuning only the graph encoder and projector components, optionally with LoRA. A step-by-step running example is given in a separate figure (fig:run_example) to concretize the framework.
  • Figure 2: Ablations in Training and Inference (with RAS$_{\text{7B}}$). Training: "No GraphEncode" removes the graph encoder during training, using only LoRA-based LLM fine-tuning. "No LoRA" uses graph token optimization without low-rank adaptation. "No Text-to-Triple" keeps the original retrieved texts instead of converting them into triples. "No Multi-Task" trains two separate models, one for planning and one for answering. Inference: "No Retrieval" tests direct query answering without any context. "No GraphEncode" removes graph encoding and projection during inference, using only textual context. "No Planning" removes the planning module and runs a single-pass retrieval-structuring-answering pipeline.
  • Figure 3: Impact of Graph Information Abundance. For each sample, we randomly shuffle its associated triples five times and take different ratios (10%–100%) of the shuffled data. The performance scores are averaged across these five shuffling runs.
  • Figure 4: Impact of Training Data Volume on Model Performance. Results for RAS$_{\text{7B}}$ (top) and RAS$_{\text{8B}}$ (bottom) illustrate how performance scales with increasing training data.
  • Figure 5: Potential Future Extensions of RAS.
  • ...and 19 more figures
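
The subsampling protocol in the Figure 3 caption (shuffle each sample's triples five times, keep a ratio of them, average the scores over the five runs) can be sketched as follows. This is an illustrative sketch only: the function name `abundance_curve`, the 10% step between ratios, taking a prefix of the shuffled list, the rounding rule, and the toy scoring function are assumptions made here. The caption itself states only the 10%–100% range, the five shuffles, and the averaging.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]

# Assumed grid: 10%, 20%, ..., 100% (the caption gives only the range).
RATIOS = tuple(i / 10 for i in range(1, 11))


def abundance_curve(
    triples: Sequence[Triple],
    score: Callable[[List[Triple]], float],
    ratios: Sequence[float] = RATIOS,
    n_shuffles: int = 5,
    seed: int = 0,
) -> Dict[float, float]:
    """Average score at each keep-ratio over several random shuffles."""
    rng = random.Random(seed)  # local RNG so runs are reproducible
    scores: Dict[float, List[float]] = {r: [] for r in ratios}
    for _ in range(n_shuffles):
        shuffled = list(triples)
        rng.shuffle(shuffled)
        for r in ratios:
            keep = max(1, round(r * len(shuffled)))  # assumed rounding rule
            scores[r].append(score(shuffled[:keep]))  # prefix of the shuffle
    return {r: mean(vals) for r, vals in scores.items()}


# Toy sample: ten triples, one of which is needed to answer (hypothetical data).
sample = [(f"e{i}", "rel", f"e{i + 1}") for i in range(10)]
gold = sample[3]


def toy_score(kept: List[Triple]) -> float:
    # Stand-in for running the answerer: 1.0 if the needed triple survived.
    return 1.0 if gold in kept else 0.0


curve = abundance_curve(sample, toy_score)
for ratio, avg in curve.items():
    print(f"{ratio:.0%}: {avg:.2f}")
```

Because each ratio takes a prefix of the same shuffle, the kept sets are nested within a run, so in this toy setup the averaged score cannot decrease as the ratio grows and reaches 1.0 at 100%.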