RAS: Retrieval-And-Structuring for Knowledge-Intensive LLM Generation

Pengcheng Jiang; Lang Cao; Ruike Zhu; Minhao Jiang; Yunyi Zhang; Jimeng Sun; Jiawei Han

TL;DR

RAS introduces a dynamic, query-specific knowledge structuring paradigm for LLMs: it iteratively plans retrieval, extracts structured triples from the retrieved text, and reasons over an evolving knowledge graph $G_Q$. The framework interleaves a knowledge-state-aware planner, text-to-triples conversion, and graph-conditioned answering within a multitask Graph LLM setup, enabling tailored, interpretable multi-hop reasoning. Across seven knowledge-intensive benchmarks, RAS yields consistent gains (up to 6.4% with open-source LLMs and 7.0% with proprietary LLMs) and remains robust to partial graphs and varying data regimes, underscoring the value of explicit structured knowledge in guiding inference. The work highlights practical benefits for transparency and reliability in high-stakes generation and points to future improvements in extraction quality, graph evolution, and cross-domain applicability.

Abstract

Large language models (LLMs) have achieved impressive performance on knowledge-intensive tasks, yet they often struggle with multi-step reasoning due to the unstructured nature of retrieved context. While retrieval-augmented generation (RAG) methods provide external information, the lack of explicit organization among retrieved passages limits their effectiveness, leading to brittle reasoning pathways. Recent interpretability studies that highlight the importance of structured intermediate reasoning support this perspective. We propose Retrieval-And-Structuring (RAS), a framework that dynamically constructs query-specific knowledge graphs through iterative retrieval and structured knowledge building. RAS interleaves targeted retrieval planning with incremental graph construction, enabling models to assemble and reason over evolving knowledge structures tailored to each query. On seven knowledge-intensive benchmarks, RAS consistently outperforms strong baselines, achieving up to 6.4% and 7.0% gains with open-source and proprietary LLMs, respectively. Our results demonstrate that dynamic, query-specific knowledge structuring offers a robust path to improving reasoning accuracy and robustness in language model generation. Our data and code can be found at https://github.com/pat-jj/RAS.
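The interleaving of retrieval planning and incremental graph construction described above can be pictured as a simple loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: the names `QueryGraph` and `ras_loop`, the `max_iters` cap, and all `toy_*` components are hypothetical stand-ins. In RAS the planner, the text-to-triples step, and the answerer are learned models, whereas here they are hand-written rules over a two-entry toy corpus.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


@dataclass
class QueryGraph:
    """Query-specific knowledge graph G_Q, kept as a set of triples (illustrative)."""
    triples: Set[Triple] = field(default_factory=set)

    def add(self, new_triples: List[Triple]) -> None:
        self.triples.update(new_triples)


def ras_loop(
    query: str,
    plan: Callable[[str, QueryGraph], Optional[str]],
    retrieve: Callable[[str], List[str]],
    extract: Callable[[str], List[Triple]],
    answer: Callable[[str, QueryGraph], str],
    max_iters: int = 3,  # assumed iteration cap, not taken from the paper
) -> Tuple[str, QueryGraph]:
    """Alternate planning, retrieval, and triple extraction, then answer from G_Q."""
    graph = QueryGraph()
    for _ in range(max_iters):
        # The planner inspects the current graph and returns the next
        # sub-query, or None once the graph is judged sufficient.
        subquery = plan(query, graph)
        if subquery is None:
            break
        for passage in retrieve(subquery):
            graph.add(extract(passage))  # text-to-triples, merged into G_Q
    return answer(query, graph), graph


# --- Toy stand-ins so the sketch runs end to end (all hypothetical) ---
CORPUS = {
    "who directed FilmX": ["Alice | directed | FilmX"],
    "where was Alice born": ["Alice | born_in | Paris"],
}


def toy_retrieve(subquery: str) -> List[str]:
    return CORPUS.get(subquery, [])


def toy_extract(passage: str) -> List[Triple]:
    s, r, o = (part.strip() for part in passage.split("|"))
    return [(s, r, o)]


def _director(graph: QueryGraph) -> Optional[str]:
    for s, r, o in graph.triples:
        if r == "directed" and o == "FilmX":
            return s
    return None


def toy_plan(query: str, graph: QueryGraph) -> Optional[str]:
    director = _director(graph)
    if director is None:
        return "who directed FilmX"
    if not any(s == director and r == "born_in" for s, r, _ in graph.triples):
        return f"where was {director} born"
    return None  # both hops are covered by the graph


def toy_answer(query: str, graph: QueryGraph) -> str:
    director = _director(graph)
    for s, r, o in graph.triples:
        if s == director and r == "born_in":
            return o
    return "unknown"


result, final_graph = ras_loop(
    "Where was the director of FilmX born?",
    toy_plan, toy_retrieve, toy_extract, toy_answer,
)
```

In this toy run the planner issues one sub-query per missing hop, so the graph grows from zero to two triples before the answer is read off it. Passing the components in as callables keeps the loop itself independent of any particular retriever or model.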
