Do Retrieval-Augmented Language Models Adapt to Varying User Needs?

Peilin Wu; Xinlu Zhang; Wenhao Yu; Xingyu Liu; Xinya Du; Zhiyu Zoey Chen

TL;DR

The paper argues that Retrieval-Augmented Language Models must adapt to diverse user needs and retrieval conditions. It introduces a user-centric evaluation framework that combines three user need cases with three context settings, and validates it with experiments on HotpotQA, DisentQA, and URAQ using two model families. Key findings show that restricting memory usage can boost robustness under adversarial retrieval but reduces peak performance, and that model-family and scale effects shape behavior more than instruction type alone. The work highlights the necessity of user-centric benchmarking for real-world RALMs and provides insights into optimizing performance across varied retrieval contexts; the authors plan to release URAQ to support future research.

Abstract

Recent advancements in Retrieval-Augmented Language Models (RALMs) have demonstrated their efficacy in knowledge-intensive tasks. However, existing evaluation benchmarks often assume a single optimal approach to leveraging retrieved information, failing to account for varying user needs. This paper introduces a novel evaluation framework that systematically assesses RALMs under three user need cases (Context-Exclusive, Context-First, and Memory-First) across three distinct context settings: Context Matching, Knowledge Conflict, and Information Irrelevant. By varying both user instructions and the nature of retrieved information, our approach captures the complexities of real-world applications where models must adapt to diverse user requirements. Through extensive experiments on multiple QA datasets, including HotpotQA, DisentQA, and our newly constructed synthetic URAQ dataset, we find that restricting memory usage improves robustness under adversarial retrieval conditions but lowers peak performance when retrieval results are ideal, and that model family dominates behavioral differences. Our findings highlight the necessity of user-centric evaluations in the development of retrieval-augmented systems and provide insights into optimizing model performance across varied retrieval contexts. We will release our code and URAQ dataset upon acceptance of the paper.
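
The framework crosses three user need cases with three context settings, giving nine evaluation conditions. The following is a minimal sketch of that grid, not the authors' code: the case and setting names come from the abstract, while the function name `evaluation_grid` and the tuple layout are assumptions made for illustration.

```python
from itertools import product

# Names taken from the abstract; the data layout below is an assumption.
USER_NEEDS = ("Context-Exclusive", "Context-First", "Memory-First")
CONTEXT_SETTINGS = ("Context Matching", "Knowledge Conflict", "Information Irrelevant")


def evaluation_grid():
    """Return every (user need, context setting) pair as a list of tuples."""
    # Hypothetical helper: the paper does not specify how conditions are enumerated.
    return list(product(USER_NEEDS, CONTEXT_SETTINGS))


if __name__ == "__main__":
    # Print one line per evaluation condition.
    for need, setting in evaluation_grid():
        print(f"{need} x {setting}")
```

Each model would then be scored once per pair, so that results can be compared along either axis.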
