
Reducing Distraction in Long-Context Language Models by Focused Learning

Zijun Wu, Bingyuan Liu, Ran Yan, Lei Chen, Thomas Delteil

TL;DR

A training method is proposed that improves LLMs' ability to discern relevant information in long contexts by combining retrieval-based data augmentation with contrastive learning.

Abstract

Recent advancements in Large Language Models (LLMs) have significantly enhanced their capacity to process long contexts. However, effectively utilizing such long contexts remains a challenge due to distraction: irrelevant information dominates lengthy contexts, causing LLMs to lose focus on the most relevant segments. To address this, we propose a novel training method that enhances LLMs' ability to discern relevant information by combining retrieval-based data augmentation with contrastive learning. Specifically, during fine-tuning with long contexts, we employ a retriever to extract the most relevant segments, which serve as augmented inputs. We then introduce an auxiliary contrastive learning objective that explicitly encourages the outputs produced from the original context and from the retrieved sub-context to be closely aligned. Extensive experiments on long single-document and multi-document QA benchmarks demonstrate the effectiveness of the proposed method.
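The retrieval-based augmentation step can be illustrated with a minimal Python sketch. It assumes, for illustration only, that a document is split into fixed-size word chunks, that a toy lexical-overlap scorer (`overlap_score`, hypothetical) stands in for the retriever, and that each discarded chunk is replaced by a single `<mask>` token. The actual chunking scheme, retriever, and masking granularity are not specified in this summary.

```python
import re
from typing import List, Set

MASK = "<mask>"  # assumed: one mask token per discarded chunk


def chunk_words(document: str, chunk_size: int) -> List[str]:
    """Split a document into consecutive chunks of at most `chunk_size` words."""
    words = document.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def _terms(text: str) -> Set[str]:
    """Lower-cased word tokens with punctuation stripped."""
    return set(re.findall(r"\w+", text.lower()))


def overlap_score(query: str, chunk: str) -> int:
    """Hypothetical stand-in retriever: number of distinct query terms found in the chunk."""
    return len(_terms(query) & _terms(chunk))


def augment(document: str, query: str, k: int, chunk_size: int) -> str:
    """Keep the top-k chunks (in their original order) and mask every other chunk."""
    chunks = chunk_words(document, chunk_size)
    # Rank chunk indices by descending score; ties go to the earlier chunk.
    ranked = sorted(range(len(chunks)), key=lambda i: (-overlap_score(query, chunks[i]), i))
    keep = set(ranked[:k])
    return " ".join(chunk if i in keep else MASK for i, chunk in enumerate(chunks))
```

For example, `augment("Paris is capital of France. The Nile flows north quickly. Bananas are long yellow fruit.", "What is the capital of France?", k=1, chunk_size=5)` keeps only the first chunk and masks the other two. Keeping retained chunks in their original order preserves the document's positional structure, which matters when the augmented input is paired with the full context.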

Paper Structure

This paper contains 20 sections, 5 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Our method. Retrieval-based data augmentation: we filter out the distracting content from a document $D$ using a retriever, retaining only the top-k relevant chunks to form the augmented document $D'$; the irrelevant portions are replaced with $\texttt{<mask>}$ tokens. Contrastive training: taking $D_1$ as an example, its augmented version $D'_1$ forms a positive pair with $D_1$ (solid line), whereas the augmented versions of the other documents, $D'_2, \cdots, D'_N$, form negative pairs with $D_1$ (dashed lines).
  • Figure 2: Performance curves when the gold documents are placed at different positions in the context at inference time, for varying total numbers of documents. The shaded area in each plot marks the last context window used by the sliding-window attention mechanism of the Mistral model.
  • Figure 3: Sentence-level attention maps between the question "Which Indian actor has won most national awards?" and a concatenation of 10 documents. (a) shows the attention map of the vanilla method, and (b) shows the re-focused attention map after training with our method. The green rectangle marks the location of the answer.
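The positive/negative pairing described in the Figure 1 caption resembles an InfoNCE-style objective with in-batch negatives. The sketch below assumes, beyond what this summary states, that each context is reduced to one pooled vector, that similarity is cosine similarity scaled by a temperature `tau`, and that the positive pair is included in the softmax denominator. The names `cosine` and `contrastive_loss` are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def contrastive_loss(originals: List[Sequence[float]],
                     augmented: List[Sequence[float]],
                     tau: float = 0.1) -> float:
    """InfoNCE-style loss averaged over a batch of N documents.

    originals[i]: assumed pooled representation of the full context D_i.
    augmented[i]: assumed pooled representation of the retrieved sub-context D'_i.
    For anchor D_i, D'_i is the positive and every D'_j with j != i is a negative.
    """
    n = len(originals)
    total = 0.0
    for i in range(n):
        logits = [cosine(originals[i], augmented[j]) / tau for j in range(n)]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability assigned to the positive pair (D_i, D'_i).
        total += log_denominator - logits[i]
    return total / n
```

A lower value means each $D_i$ is closer to its own $D'_i$ than to the other augmented documents in the batch; for instance, perfectly aligned pairs `[[1, 0], [0, 1]]` versus `[[1, 0], [0, 1]]` give a loss near zero, while swapping the augmented vectors gives a loss near 10 at `tau=0.1`. How this term is weighted against the language-modeling loss is not given in this summary.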