CompAct: Compressing Retrieved Documents Actively for Question Answering

Chanwoong Yoon, Taewhoo Lee, Hyeon Hwang, Minbyul Jeong, Jaewoo Kang

TL;DR

CompAct is a novel framework that actively condenses extensive documents without losing key information. It operates flexibly as a cost-efficient plug-in module with various off-the-shelf retrievers or readers and achieves exceptionally high compression rates.

Abstract

Retrieval-augmented generation helps language models strengthen their factual grounding by providing external context. However, language models often struggle when given extensive information, which diminishes their effectiveness in answering questions. Context compression tackles this issue by filtering out irrelevant information, but current methods still struggle in realistic scenarios where crucial information cannot be captured with a single-step approach. To overcome this limitation, we introduce CompAct, a novel framework that employs an active strategy to condense extensive documents without losing key information. Our experiments demonstrate that CompAct brings significant improvements in both performance and compression rate on multi-hop question-answering benchmarks. CompAct operates flexibly as a cost-efficient plug-in module with various off-the-shelf retrievers or readers, achieving exceptionally high compression rates (47x).

Paper Structure (38 sections, 4 equations, 6 figures, 12 tables)

Figures (6)

  • Figure 1: Performance of HotpotQA with different top-$k$ documents, using LLaMA3-8B as the reader. CompAct shows solid performance improvements that align with those of gold documents. This highlights CompAct's ability to effectively leverage the benefits of increased top-$k$, unlike other methods that struggle with noisy context.
  • Figure 2: Overall CompAct framework as a plug-in module between the retriever and the reader LLM. After splitting retrieved documents into segments, CompAct sequentially compresses these segments into a compacted context. By jointly analyzing the previous context with newly provided segments, we actively compress input documents while preserving essential information in the compressed context. If the segments do not offer complete information to answer the question (1st and 2nd segments), CompAct continues to the next step to acquire new information. Once all supporting clues are fully captured ($N$-th segment), the iteration ends.
  • Figure 3: Distribution of iteration points where models determine the compressed contexts to be complete. The frequencies of completeness are accumulated over iterations. We compare the distribution between GPT-4o (yellow) and CompAct (green). We also measure the percentage of correctness at each iteration, using an F1 score of 0.4 as the threshold for correctness.
  • Figure 4: Performance of HotpotQA with different top-$k$ documents, using Contriever (upper) as the retriever and GPT-3.5-Turbo (lower) as the reader.
  • Figure 5: Performance of HotpotQA with different top-$k$ documents, using BM25 as the retriever.
  • ...and 1 more figure
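
The iterative procedure described in the Figure 2 caption (split the retrieved documents into segments, compress each segment jointly with the previous compressed context, and stop once the context is judged complete) can be sketched as below. This is only a minimal illustration under stated assumptions, not the authors' implementation: the names `split_into_segments`, `compress_actively`, and `make_keyword_compressor`, the `docs_per_segment` parameter, and the keyword-matching stand-in for the compressor model are all hypothetical. In the paper the compression and the completeness judgment come from a trained language model.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interface: (question, previous_context, segment_text)
# -> (new_compressed_context, is_complete)
Compressor = Callable[[str, str, str], Tuple[str, bool]]


def split_into_segments(docs: List[str], docs_per_segment: int) -> List[List[str]]:
    """Group retrieved documents into fixed-size segments, keeping retrieval order."""
    return [docs[i:i + docs_per_segment] for i in range(0, len(docs), docs_per_segment)]


def compress_actively(
    question: str,
    docs: List[str],
    compressor: Compressor,
    docs_per_segment: int = 5,
    max_iterations: Optional[int] = None,
) -> Tuple[str, int]:
    """Compress segments one by one, carrying the compressed context forward.

    Stops early when the compressor reports that the context is complete.
    Returns the final compressed context and the number of iterations used.
    """
    context = ""
    steps = 0
    for segment in split_into_segments(docs, docs_per_segment):
        if max_iterations is not None and steps >= max_iterations:
            break
        context, complete = compressor(question, context, "\n".join(segment))
        steps += 1
        if complete:
            break
    return context, steps


def make_keyword_compressor(keywords: List[str]) -> Compressor:
    """Toy stand-in for the compressor model (assumption, for illustration only).

    Keeps lines that mention any keyword and reports completeness once every
    keyword appears in the accumulated context.
    """
    def compressor(question: str, previous: str, segment: str) -> Tuple[str, bool]:
        kept = [line for line in segment.split("\n")
                if any(k in line.lower() for k in keywords)]
        context = " ".join(filter(None, [previous] + kept))
        complete = all(k in context.lower() for k in keywords)
        return context, complete
    return compressor


docs = [
    "Paris is in France.",
    "Bananas are yellow.",
    "The Eiffel Tower is in Paris.",
    "Cats sleep a lot.",
    "Unused document.",
]
context, steps = compress_actively(
    "In which country is the Eiffel Tower?",
    docs,
    make_keyword_compressor(["france", "eiffel"]),
    docs_per_segment=2,
)
print(steps, context)
```

In this toy run the loop stops after the second segment, so the third segment is never read. That early termination is what lets the approach avoid processing every retrieved document when the supporting clues appear early.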