RAGSys: Item-Cold-Start Recommender as RAG System

Emile Contal, Garrin McGoldrick

TL;DR

This work reframes In-Context Learning with Retrieval-Augmented Generation as an item-cold-start recommender problem, where the goal is to maximize the LLM’s generation probability by selecting a diverse, high-quality set of demonstrations. It introduces a principled retrieval framework that jointly optimizes query relevance, diversity, and demonstration quality bias, and uses Direct Preference Optimization (DPO) to evaluate and calibrate these components directly on downstream NLP tasks. Across multiple open LLMs and the TruthfulQA dataset, the study shows that combining relevance with diversity yields superior performance, while the impact of quality bias is model-dependent, underscoring the need for model-aware tuning. The paper also discusses practical deployment considerations, illustrating how recommender-system techniques and vector-store tooling can enable state-of-the-art, scalable RAG systems for real-world applications.

Abstract

Large Language Models (LLMs) hold immense promise for real-world applications, but their generic knowledge often falls short of domain-specific needs. Fine-tuning, a common approach, can suffer from catastrophic forgetting and hinder generalizability. In-Context Learning (ICL) offers an alternative that can leverage Retrieval-Augmented Generation (RAG) to provide LLMs with relevant demonstrations for few-shot learning tasks. This paper explores the desired qualities of a demonstration retrieval system for ICL. We argue that ICL retrieval in this context resembles item-cold-start recommender systems, prioritizing discovery and maximizing information gain over strict relevance. We propose a novel evaluation method that measures the LLM's subsequent performance on NLP tasks, eliminating the need for subjective diversity scores. Our findings demonstrate the critical role of diversity and quality bias in retrieved demonstrations for effective ICL, and highlight the potential of recommender system techniques in this domain.
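
To make the trade-off between relevance, diversity, and quality bias concrete, here is a minimal sketch of a greedy, MMR-style selection of demonstrations. It is an illustration under stated assumptions, not the paper's formulation: the scoring rule, the function name `select_demonstrations`, the quality weight `lambda_b`, and the use of cosine similarity for both relevance and redundancy are hypothetical choices. Only the idea of a diversity weight (written $\lambda_d$ in Figure 1) comes from the source.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_demonstrations(
    query: Sequence[float],
    candidates: Sequence[Sequence[float]],
    quality: Sequence[float],
    k: int,
    lambda_d: float = 0.5,  # diversity weight (assumed scale)
    lambda_b: float = 0.1,  # quality-bias weight (hypothetical parameter)
) -> List[int]:
    """Greedily pick k candidate indices.

    Each step picks the candidate maximizing
    relevance - lambda_d * redundancy + lambda_b * quality,
    where redundancy is the highest similarity to an already selected item.
    """
    selected: List[int] = []
    remaining = list(range(len(candidates)))

    while remaining and len(selected) < k:

        def score(i: int) -> float:
            relevance = cosine(query, candidates[i])
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return relevance - lambda_d * redundancy + lambda_b * quality[i]

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)

    return selected


# Toy data: items 0 and 1 are near-duplicates; item 2 is less relevant but different.
query = [1.0, 0.0]
candidates = [[0.9, 0.436], [0.88, 0.475], [0.8, -0.6]]
no_quality = [0.0, 0.0, 0.0]

pure_relevance = select_demonstrations(query, candidates, no_quality, k=2, lambda_d=0.0)
with_diversity = select_demonstrations(query, candidates, no_quality, k=2, lambda_d=0.5)
```

With the diversity weight set to zero the sketch returns the two near-duplicate items, while a positive weight swaps the second pick for the less relevant but more distinct item. In this toy setup, that swap is the discovery-over-strict-relevance behavior the abstract argues for.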

Paper Structure

This paper contains 39 sections, 9 equations, 1 figure, 4 tables, 1 algorithm.

Figures (1)

  • Figure 1: Diversity Metric and DPO. Non-monotonic relationship between a diversity metric (the average cosine similarity between embedding pairs) and the quality metric DPO. Obtained by varying $\lambda_d$ with Llama-3-8B-chat and $k=6$.
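
The diversity metric named in the Figure 1 caption, the average cosine similarity between embedding pairs, can be sketched as follows. This is a minimal illustration that assumes the average is taken over all unordered pairs of the retrieved demonstrations' embeddings; the function name `mean_pairwise_cosine` is hypothetical, and the paper may compute the metric differently. Lower values indicate a more diverse set.

```python
import math
from itertools import combinations
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_pairwise_cosine(embeddings: Sequence[Sequence[float]]) -> float:
    """Average cosine similarity over all unordered pairs of embeddings."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Two identical vectors plus one orthogonal vector: pair similarities are 1, 0, 0.
example = mean_pairwise_cosine([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
```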