Negative Sampling Techniques in Information Retrieval: A Survey

Laurin Wischounig; Abdelrahman Abdallah; Adam Jatowt

Abstract

Information Retrieval (IR) is fundamental to many modern NLP applications. The rise of dense retrieval (DR), which uses neural networks to learn semantic vector representations, has significantly advanced IR performance. Training effective dense retrievers through contrastive learning depends centrally on the selection of informative negative samples. Synthesizing 35 seminal papers, this survey provides a comprehensive and up-to-date overview of negative sampling techniques in dense IR. Our contribution is a focus on modern NLP applications and the inclusion of recent Large Language Model (LLM)-driven methods, an area absent from prior reviews. We propose a taxonomy that categorizes techniques into random sampling, static and dynamic hard negative mining, and synthetic data generation. We then analyze these approaches with respect to the trade-offs between effectiveness, computational cost, and implementation difficulty. The survey concludes by outlining current challenges and promising future directions for the use of LLM-generated synthetic data.

Paper Structure

This paper contains 59 sections, 1 equation, 1 figure, 11 tables.

Introduction
Related Work
Contrastive Learning for Dense Representations
Taxonomy of Negative Sampling Techniques
Sampling Techniques
Random and In-Batch Negatives
Static Hard Negative Mining
Dynamic Hard Negative Mining
Cluster-Based Mining
Principled Sampling with TriSampler
False Negative Mitigation
Filtering and Denoising Negatives
Robustness through Regularization
Data-Centric Methods
Data Augmentation
...and 44 more sections

Figures (1)

Figure 1: Taxonomy of negative sampling techniques for dense retrieval. The framework divides approaches into two main categories: sampling-based techniques (orange) and data-centric techniques (cyan).
