Entropy-Based Decoding for Retrieval-Augmented Large Language Models

Zexuan Qiu, Zijing Ou, Bin Wu, Jingjing Li, Aiwei Liu, Irwin King

TL;DR

This work tackles distractibility in retrieval-augmented LLMs by introducing training-free entropy-guided decoding. It first ensembles retrieved documents in parallel using an entropy-based weighting (LeEns) to produce a low-entropy, information-rich contextual distribution, then applies a contrastive step (CLeHe) against a high-entropy parametric distribution drawn from the model's layers, with an additional PMI-based variant. The approach yields consistent improvements on open-domain QA benchmarks across multiple LLMs with modest latency increases, and offers insights into layer-wise entropy as a meaningful reference for contrast. These methods provide a practical, training-free path to harnessing external knowledge more accurately in QA tasks, with potential applicability to broader knowledge-intensive tasks.
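To make the first step concrete, below is a minimal sketch of an entropy-weighted document-parallel ensemble in the spirit of LeEns. It assumes we already have one next-token distribution per retrieved document and combines them with weights given by a softmax over negative entropies, so that lower-entropy (more confident) documents dominate. The softmax weighting and the `tau` temperature are our assumptions for illustration, not necessarily the paper's exact scheme.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy (in nats) of a probability distribution."""
    return -np.sum(p * np.log(p + eps))

def entropy_weighted_ensemble(doc_dists, tau=1.0):
    """Combine per-document next-token distributions, favoring
    low-entropy documents.

    doc_dists: array of shape (num_docs, vocab_size); each row is the
        next-token distribution conditioned on one retrieved document.
    tau: temperature controlling how sharply low-entropy documents
        dominate (hypothetical knob, not from the paper).
    """
    entropies = np.array([entropy(p) for p in doc_dists])
    # Softmax over negative entropies: lower entropy -> larger weight.
    weights = np.exp(-entropies / tau)
    weights /= weights.sum()
    # Weighted mixture of the per-document distributions.
    return weights @ doc_dists  # shape (vocab_size,)
```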

Abstract

Augmenting Large Language Models (LLMs) with retrieved external knowledge has proven effective for improving the factual accuracy of generated responses. Despite their success, retrieval-augmented LLMs still face the distractibility issue, where the generated responses are negatively influenced by noise from both external and internal knowledge sources. In this paper, we introduce a novel, training-free decoding method guided by entropy considerations to mitigate this issue. Our approach utilizes entropy-based document-parallel ensemble decoding to prioritize low-entropy distributions from retrieved documents, thereby enhancing the extraction of relevant information from the context. Additionally, it incorporates a contrastive decoding mechanism that contrasts the obtained low-entropy ensemble distribution with the high-entropy distribution derived from the model's internal knowledge across layers, which ensures a greater emphasis on reliable external information. Extensive experiments on open-domain question answering datasets demonstrate the superiority of our method.
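The contrastive step described above can be sketched as follows, under explicit assumptions: intermediate layers are projected through the LM head to obtain per-layer distributions (early-exit style), the highest-entropy layer serves as the internal reference to contrast against, and an adaptive plausibility constraint restricts candidates to tokens the ensemble itself supports. The `alpha` and `beta` hyper-parameters and the layer-selection rule are illustrative, not the paper's confirmed implementation.

```python
import numpy as np

def contrastive_decode(p_ensemble, layer_dists, alpha=0.5, beta=0.1, eps=1e-12):
    """Contrast the low-entropy contextual ensemble distribution with a
    high-entropy internal (parametric) distribution across layers.

    p_ensemble: (vocab_size,) distribution from the document ensemble.
    layer_dists: (num_layers, vocab_size) distributions from projecting
        intermediate hidden states through the LM head (assumption).
    alpha: contrast strength; beta: plausibility threshold (both
        hypothetical hyper-parameters for this sketch).
    """
    # Select the internal distribution with maximal entropy as the
    # "unreliable" reference to penalize.
    ent = -np.sum(layer_dists * np.log(layer_dists + eps), axis=1)
    p_internal = layer_dists[np.argmax(ent)]

    # Contrastive score, restricted to tokens the ensemble itself finds
    # plausible (adaptive plausibility constraint).
    scores = np.log(p_ensemble + eps) - alpha * np.log(p_internal + eps)
    plausible = p_ensemble >= beta * p_ensemble.max()
    scores[~plausible] = -np.inf
    return int(np.argmax(scores))  # greedy next-token id
```

Down-weighting tokens favored by the high-entropy internal distribution pushes decoding toward tokens supported by the retrieved evidence rather than by the model's noisier parametric prior.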

Paper Structure

This paper contains 26 sections, 7 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Overview of the decoding process of CLeHe.
  • Figure 2: Impact of positioning the oracle document on multi-document question answering performance. A 10-document context typically uses less than 2K tokens; a 20-document context usually uses less than 4K tokens.
  • Figure 3:
  • Figure 4:
  • Figure 6: Hyper-parameter analysis using 1K evaluation samples of NQ under the top-5 document setting.
  • ...and 2 more figures