TAdaRAG: Task Adaptive Retrieval-Augmented Generation via On-the-Fly Knowledge Graph Construction
Jie Zhang, Bo Tang, Wanzi Shao, Wenqiang Wei, Jihao Zhao, Jianqing Zhu, Zhiyu Li, Wen Xi, Zehao Lin, Feiyu Xiong, Yanchao Tan
TL;DR
TAdaRAG is a task-adaptive retrieval-augmented generation framework that constructs domain-relevant knowledge graphs (KGs) on the fly to mitigate hallucinations caused by chunked inputs. It combines intent-driven extraction templates, supervised fine-tuning on a high-quality corpus, and reinforcement-learning-based implicit KG extraction to produce concise, non-redundant graphs, which are fused into generation via a graph-structured network and a REINFORCE-based optimization objective. Across six public benchmarks and NowNewsQA, TAdaRAG consistently outperforms strong RAG baselines in factual QA, multi-hop reasoning, and long-text summarization, indicating that it generalizes across domains. The approach shows potential for real-world deployment, though it adds computational overhead and relies on domain-specific templates; future work targets efficiency and broader adaptability.
Abstract
Retrieval-Augmented Generation (RAG) improves large language models by retrieving external knowledge. Because of the limited input context window, this knowledge is often truncated into smaller chunks, which causes information loss and leads to hallucinated responses and broken reasoning chains. Moreover, traditional RAG retrieves unstructured knowledge, introducing irrelevant details that hinder accurate reasoning. To address these issues, we propose TAdaRAG, a novel RAG framework that constructs task-adaptive knowledge graphs on the fly from external sources. Specifically, we design an intent-driven mechanism that routes inputs to a domain-specific extraction template, followed by supervised fine-tuning and a reinforcement learning-based implicit extraction mechanism, ensuring concise, coherent, and non-redundant knowledge integration. Evaluations on six public benchmarks and a real-world business benchmark (NowNewsQA) across three backbone models demonstrate that TAdaRAG outperforms existing methods across diverse domains and long-text tasks, highlighting its strong generalization and practical effectiveness.
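To make the routing step more concrete, the following is a minimal illustrative sketch of how an input might be routed to a domain-specific extraction template and turned into an extraction prompt. It is not the authors' implementation. The domain names, entity and relation types, the keyword-based intent detector, and the helper names `route_intent` and `build_extraction_prompt` are all assumptions made for illustration; the abstract does not specify how intent is detected or how the templates are defined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractionTemplate:
    """Hypothetical template: which entity/relation types to extract for a domain."""
    domain: str
    entity_types: tuple
    relation_types: tuple


# Assumed example templates; the real templates are not given in the abstract.
TEMPLATES = {
    "medical": ExtractionTemplate("medical", ("Disease", "Drug", "Symptom"), ("treats", "causes")),
    "legal": ExtractionTemplate("legal", ("Statute", "Party", "Court"), ("cites", "rules_on")),
    "general": ExtractionTemplate("general", ("Person", "Organization", "Event"), ("related_to",)),
}

# Assumption: simple keyword matching stands in for the intent detector.
INTENT_KEYWORDS = {
    "medical": ("symptom", "drug", "disease", "dose"),
    "legal": ("statute", "court", "contract", "liability"),
}


def route_intent(query: str) -> ExtractionTemplate:
    """Pick the template whose keywords match the query most often; fall back to 'general'."""
    text = query.lower()
    scores = {
        domain: sum(keyword in text for keyword in keywords)
        for domain, keywords in INTENT_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return TEMPLATES[best] if scores[best] > 0 else TEMPLATES["general"]


def build_extraction_prompt(query: str, chunks: list) -> str:
    """Assemble a prompt asking a model to extract a task-specific graph from retrieved chunks."""
    template = route_intent(query)
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        f"Domain: {template.domain}\n"
        f"Entity types: {', '.join(template.entity_types)}\n"
        f"Relation types: {', '.join(template.relation_types)}\n"
        f"Question: {query}\n"
        f"Context:\n{context}\n"
        "Extract (head, relation, tail) triples relevant to the question."
    )
```

In this sketch the template only constrains which entity and relation types the extractor is asked for; a learned or LLM-based intent classifier could replace the keyword matcher without changing the rest of the flow.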
