Information Retrieval: Recent Advances and Beyond

Kailash A. Hambarde, Hugo Proenca

TL;DR

The paper surveys information retrieval (IR) methods across two stages—initial retrieval and subsequent ranking—emphasizing the shift from traditional term-based matching to semantic and neural approaches enabled by large datasets and compute. It categorizes retrieval into sparse, dense, and hybrid methods, and discusses both first-stage retrieval and second-stage ranking, including pre-training objectives and expansion techniques. Key contributions include a taxonomy of retrieval methods, a synthesis of historical and modern techniques, datasets, and identified challenges such as long-tail and multilingual queries, with guidance for researchers and practitioners. The work highlights practical trade-offs and future directions to build scalable, accurate IR systems across diverse tasks and domains.

Abstract

In this paper, we provide a detailed overview of the models used for information retrieval in the first and second stages of the typical processing chain. We discuss the current state-of-the-art models, including term-based, semantic retrieval, and neural methods. Additionally, we delve into the key topics related to the learning process of these models. This survey thus offers a comprehensive understanding of the field and is of interest to researchers and practitioners entering or working in the information retrieval domain.

Paper Structure (18 sections, 2 figures, 1 table)

Figures (2)

  • Figure S1: Term map of information retrieval. Colors indicate the density of recent terms, extracted from survey papers.
  • Figure S2: Overview of a modern information retrieval system.