TAdaRAG: Task Adaptive Retrieval-Augmented Generation via On-the-Fly Knowledge Graph Construction

Jie Zhang, Bo Tang, Wanzi Shao, Wenqiang Wei, Jihao Zhao, Jianqing Zhu, Zhiyu Li, Wen Xi, Zehao Lin, Feiyu Xiong, Yanchao Tan

TL;DR

TAdaRAG is a task-adaptive retrieval-augmented generation framework that constructs domain-relevant knowledge graphs on the fly to mitigate hallucinations caused by chunked inputs. It combines intent-driven extraction templates, supervised fine-tuning on a high-quality corpus, and reinforcement-learning-based implicit KG extraction to produce concise, non-redundant graphs, which are fused into generation via a graph-structured network and a REINFORCE-based optimization objective. Across six public benchmarks and NowNewsQA, TAdaRAG consistently outperforms strong RAG baselines in factual QA, multi-hop reasoning, and long-text summarization, demonstrating strong cross-domain generalization. The approach shows practical potential for real-world deployment, though it introduces computational overhead and relies on domain-specific templates; future work targets efficiency and broader adaptability.

Abstract

Retrieval-Augmented Generation (RAG) improves large language models by retrieving external knowledge. Because of the limited input context window, this knowledge is often truncated into smaller chunks, which causes information loss and, in turn, response hallucinations and broken reasoning chains. Moreover, traditional RAG retrieves unstructured knowledge, introducing irrelevant details that hinder accurate reasoning. To address these issues, we propose TAdaRAG, a novel RAG framework for on-the-fly task-adaptive knowledge graph construction from external sources. Specifically, we design an intent-driven mechanism that routes to a domain-specific extraction template, followed by supervised fine-tuning and a reinforcement learning-based implicit extraction mechanism, ensuring concise, coherent, and non-redundant knowledge integration. Evaluations on six public benchmarks and a real-world business benchmark (NowNewsQA) across three backbone models demonstrate that TAdaRAG outperforms existing methods across diverse domains and long-text tasks, highlighting its strong generalization and practical effectiveness.
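
To make the idea of routing to a domain-specific extraction template more concrete, the following is a minimal Python sketch. It is not the authors' implementation: the abstract does not say how intent is detected, so a simple keyword-overlap router stands in for that step here. The names `ExtractionTemplate`, `TEMPLATES`, `KEYWORDS`, `route`, and `build_prompt`, along with the example domains, entity types, and relation types, are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractionTemplate:
    """Hypothetical template naming what to extract for one domain."""
    domain: str
    entity_types: tuple
    relation_types: tuple


# Illustrative templates; the real templates are not given in this section.
TEMPLATES = {
    "medical": ExtractionTemplate("medical", ("disease", "drug", "symptom"), ("treats", "causes")),
    "legal": ExtractionTemplate("legal", ("party", "clause", "statute"), ("binds", "violates")),
    "general": ExtractionTemplate("general", ("person", "organization", "event"), ("related_to",)),
}

# Assumption: keyword overlap stands in for the intent-detection step.
KEYWORDS = {
    "medical": {"drug", "symptom", "dose", "disease"},
    "legal": {"contract", "statute", "liability", "clause"},
}


def route(query: str) -> ExtractionTemplate:
    """Pick the template whose keywords overlap the query most; fall back to 'general'."""
    tokens = set(re.findall(r"[a-z]+", query.lower()))
    best_domain, best_hits = "general", 0
    for domain, keywords in KEYWORDS.items():
        hits = len(tokens & keywords)
        if hits > best_hits:
            best_domain, best_hits = domain, hits
    return TEMPLATES[best_domain]


def build_prompt(query: str, chunk: str) -> str:
    """Fill the routed template into an extraction prompt for one retrieved chunk."""
    template = route(query)
    return (
        f"Domain: {template.domain}\n"
        f"Entity types: {', '.join(template.entity_types)}\n"
        f"Relation types: {', '.join(template.relation_types)}\n"
        f"Question: {query}\n"
        f"Text: {chunk}\n"
        "Extract (head, relation, tail) triples relevant to the question."
    )
```

In this sketch, a query with no matching keywords falls back to the `general` template, so every retrieved chunk still receives an extraction prompt. A learned or LLM-based intent classifier could replace `route` without changing `build_prompt`.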

Paper Structure

This paper contains 32 sections, 9 equations, 12 figures, 7 tables.

Figures (12)

  • Figure 1: An illustrative example where the RAG system fails to generate correct responses due to truncation, leading to hallucinations, broken reasoning and irrelevant details. Our proposed TAdaRAG addresses these issues by integrating the task-adaptive knowledge graph dynamically.
  • Figure 2: An illustration of our proposed TAdaRAG framework and its two-stage training: (1) Supervised Knowledge Extraction Fine-tuning and (2) Task-Adaptive Knowledge Graph Construction.
  • Figure 3: Long-context experiments on Mistral-7B.
  • Figure 4: Experiments on Qwen2.5-14B.
  • Figure 5: Multi-faceted comparison of different methods based on human and GPT-4o evaluation. Higher values indicate better performance, with a maximum value of 10.
  • ...and 7 more figures