Information Retrieval: Recent Advances and Beyond

Kailash A. Hambarde; Hugo Proenca

TL;DR

The paper surveys information retrieval (IR) methods across two stages—initial retrieval and subsequent ranking—emphasizing the shift from traditional term-based matching to semantic and neural approaches enabled by large datasets and compute. It categorizes retrieval into sparse, dense, and hybrid methods, and discusses both first-stage retrieval and second-stage ranking, including pre-training objectives and expansion techniques. Key contributions include a taxonomy of retrieval methods, a synthesis of historical and modern techniques, datasets, and identified challenges such as long-tail and multilingual queries, with guidance for researchers and practitioners. The work highlights practical trade-offs and future directions to build scalable, accurate IR systems across diverse tasks and domains.
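The two-stage pipeline summarized above can be illustrated with a minimal sketch. This is not the survey's implementation: the function names (`sparse_scores`, `first_stage`, `rerank`, `jaccard`) are hypothetical, the first stage uses a simplified IDF-weighted term match as a stand-in for a sparse retriever such as BM25, and the second stage uses a toy Jaccard scorer where a real system would typically plug in a neural ranker.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption made for brevity)."""
    return text.lower().split()


def sparse_scores(query, docs):
    """Score each document by IDF-weighted term frequency of the query terms.

    This is a simplified stand-in for a sparse retriever, not full BM25.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    query_terms = tokenize(query)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = sum(tf[t] * math.log(1 + n / df[t]) for t in query_terms if t in tf)
        scores.append(score)
    return scores


def first_stage(query, docs, k):
    """First stage: cheaply select the indices of the top-k candidates."""
    scores = sparse_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def jaccard(query, doc):
    """Toy second-stage scorer; a placeholder for a learned ranking model."""
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / len(q | d) if q | d else 0.0


def rerank(query, docs, candidates, scorer):
    """Second stage: reorder only the candidates with a costlier scorer."""
    return sorted(candidates, key=lambda i: scorer(query, docs[i]), reverse=True)


docs = [
    "neural ranking models for search",
    "cooking pasta at home",
    "sparse retrieval with term matching for search",
]
candidates = first_stage("neural search", docs, k=2)
final_order = rerank("neural search", docs, candidates, jaccard)
```

The design point is that the expensive scorer only ever sees the `k` candidates returned by the first stage, which is what lets the second stage use a heavier model without scoring the whole collection.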

Abstract

In this paper, we provide a detailed overview of the models used for information retrieval in the first and second stages of the typical processing chain. We discuss the current state-of-the-art models, including term-based, semantic retrieval, and neural methods. Additionally, we delve into the key topics related to the learning process of these models. In this way, the survey offers a comprehensive understanding of the field and is of interest to researchers and practitioners entering or working in the information retrieval domain.

Paper Structure (18 sections, 2 figures, 1 table)

Introduction
Information Retrieval: Overview
Conventional Term-based Retrieval
Pioneering Methods for Semantic Retrieval
Query Augmentation
Document Augmentation
Lexical Dependency Model
Topic Model
Multilingual Retrieval Model
First Stage: Retrieval
Deep Learning Methods for Semantic Retrieval
Discrete Retrieval Methods
Dense Retrieval Methods
Hybrid Retrieval Methods
Second Stage - Ranker
...and 3 more sections

Figures (2)

Figure S1: Term map of information retrieval. Colors indicate the density of recent terms, extracted from survey papers.
Figure S2: Overview of a modern information retrieval system.
