Grounding Language Model with Chunking-Free In-Context Retrieval

Hongjin Qian, Zheng Liu, Kelong Mao, Yujia Zhou, Zhicheng Dou

TL;DR

This work addresses grounding in Retrieval-Augmented Generation by removing the need for document chunking. It introduces CFIC, a chunking-free in-context retrieval framework that uses transformer hidden states to directly generate precise grounding text, guided by Constrained Sentence Prefix Decoding and Skip Decoding to improve efficiency and fidelity. CFIC is trained via Supervised Fine-Tuning on self-constructed long-context data and evaluated on LongBench open QA tasks, where it significantly outperforms chunking-based baselines and other chunking-free methods, demonstrating strong grounding and QA performance with up to 32k context. The approach offers a practical, scalable solution for grounding in RAG systems and highlights the importance of targeted evidence retrieval for accurate, faithful responses in knowledge-intensive tasks.

Abstract

This paper presents a novel Chunking-Free In-Context (CFIC) retrieval approach, specifically tailored for Retrieval-Augmented Generation (RAG) systems. Traditional RAG systems often struggle with grounding responses using precise evidence text due to the challenges of processing lengthy documents and filtering out irrelevant content. Commonly employed solutions, such as document chunking and adapting language models to handle longer contexts, have their limitations. These methods either disrupt the semantic coherence of the text or fail to effectively address the issues of noise and inaccuracy in evidence retrieval. CFIC addresses these challenges by circumventing the conventional chunking process. It utilizes the encoded hidden states of documents for in-context retrieval, employing auto-regressive decoding to accurately identify the specific evidence text required for user queries, eliminating the need for chunking. CFIC is further enhanced by incorporating two decoding strategies, namely Constrained Sentence Prefix Decoding and Skip Decoding. These strategies not only improve the efficiency of the retrieval process but also ensure that the fidelity of the generated grounding text evidence is maintained. Our evaluations of CFIC on a range of open QA datasets demonstrate its superiority in retrieving relevant and accurate evidence, offering a significant improvement over traditional methods. By doing away with the need for document chunking, CFIC presents a more streamlined, effective, and efficient retrieval solution, making it a valuable advancement in the field of RAG systems.

Paper Structure

This paper contains 20 sections, 8 equations, 3 figures, and 5 tables.

Figures (3)

  • Figure 1: Comparison of Chunking-Based and Chunking-Free Methods. The left panel illustrates the chunking-based method, involving chunking a lengthy document into smaller passages followed by refinement through passage ranking. The right panel depicts the chunking-free method proposed in this paper, where grounding text is directly decoded by LLMs without the need for document chunking.
  • Figure 2: Overview of the proposed method: CFIC. The middle part shows the Constrained Sentence Prefix Decoding strategy which ensures the generated text prefixes originate from the input article. The right part shows the Skip Decoding strategy which bypasses decoding the intermediate tokens while terminating generation at the position with the best likelihood of [eos] token. Gray tokens in the figure are bypassed during generation.
  • Figure 3: The choice of Maximum Decoding Length.
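
The two decoding strategies described in the Figure 2 caption can be illustrated with a toy sketch. This is not the authors' implementation: it uses whitespace tokens instead of a tokenizer, and the scorers `score_next` and `eos_score` are hypothetical keyword-overlap stand-ins for the probabilities a fine-tuned LLM would produce from its hidden states. It also assumes that candidate stopping positions are sentence boundaries within a maximum decoding length. All function names here are illustrative only.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def split_sentences(tokens: List[str], enders=(".", "?", "!")) -> List[Span]:
    """Return one token span per sentence of the article."""
    spans, start = [], 0
    for i, tok in enumerate(tokens):
        if tok in enders:
            spans.append((start, i + 1))
            start = i + 1
    if start < len(tokens):
        spans.append((start, len(tokens)))
    return spans


def constrained_prefix(tokens: List[str], spans: List[Span],
                       score_next: Callable[[List[str], str], float],
                       max_prefix_len: int = 3) -> Span:
    """Greedy constrained decoding: each step may only emit a token that
    continues the prefix of some sentence in the article."""
    candidates = list(spans)  # sentences still consistent with the prefix
    prefix: List[str] = []
    while len(candidates) > 1 and len(prefix) < max_prefix_len:
        k = len(prefix)
        allowed = {tokens[s + k] for s, e in candidates if s + k < e}
        if not allowed:
            break
        best = max(sorted(allowed), key=lambda t: score_next(prefix, t))
        prefix.append(best)
        candidates = [(s, e) for s, e in candidates
                      if s + k < e and tokens[s + k] == best]
    return candidates[0]


def skip_decode(tokens: List[str], spans: List[Span], start_span: Span,
                eos_score: Callable[[int, int], float],
                max_len: int = 64) -> List[str]:
    """Skip the intermediate tokens: score only candidate end positions
    (sentence boundaries within max_len) and stop at the best one."""
    start = start_span[0]
    ends = [e for _, e in spans if e > start and e - start <= max_len]
    if not ends:
        ends = [start_span[1]]
    best_end = max(ends, key=lambda e: eos_score(start, e))
    return tokens[start:best_end]


def retrieve_grounding(article: str, query: str, max_len: int = 64) -> str:
    """Toy end-to-end run with keyword-overlap scorers (an assumption)."""
    tokens = article.split()
    q = {w.lower() for w in query.split()}

    def score_next(prefix: List[str], tok: str) -> float:
        return 1.0 if tok.lower() in q else 0.0

    def eos_score(start: int, end: int) -> float:
        span = tokens[start:end]
        return sum(t.lower() in q for t in span) / len(span)

    spans = split_sentences(tokens)
    first = constrained_prefix(tokens, spans, score_next)
    return " ".join(skip_decode(tokens, spans, first, eos_score, max_len))


article = ("Paris is the capital of France . It hosts the Louvre museum . "
           "Berlin is the capital of Germany .")
grounding = retrieve_grounding(article, "Which museum does Paris host ?")
```

Because the prefix is constrained to sentence starts found in the article and the remaining tokens are copied rather than generated, the returned grounding text is always a verbatim, contiguous span of the input. In this sketch only the prefix and the end position are ever scored, which mirrors the efficiency argument made for Skip Decoding.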