Neural Models for Information Retrieval
Bhaskar Mitra, Nick Craswell
TL;DR
The paper surveys neural information retrieval methods, framing them against traditional learning to rank (L2R) models and highlighting their data-hungry nature. It systematically covers text representations (local vs. distributed), embedding-based matching, and deep architectures (Siamese, interaction-based, and lexical/semantic hybrids), with concrete methods such as DESM, CDSSM, Duet, and WMD. It discusses both long-document ranking and short-text matching, analyzes training data regimes (supervised, unsupervised, semi-supervised), and reviews neural toolkits and evaluation considerations. The work emphasizes balancing lexical precision with semantic coverage, and it outlines future directions including robustness, benchmarks, interpretability, and cross-pollination with NLP advances to drive practical IR impact.
Abstract
Neural ranking models for information retrieval (IR) use shallow or deep neural networks to rank search results in response to a query. Traditional learning-to-rank models employ machine learning techniques over hand-crafted IR features. By contrast, neural models learn representations of language from raw text that can bridge the gap between query and document vocabulary. Unlike classical IR models, these new machine-learning-based approaches are data-hungry, requiring large-scale training data before they can be deployed. This tutorial introduces basic concepts and intuitions behind neural IR models, and places them in the context of traditional retrieval models. We begin by introducing fundamental concepts of IR and different neural and non-neural approaches to learning vector representations of text. We then review shallow neural IR methods that employ pre-trained neural term embeddings without learning the IR task end-to-end. Next, we introduce deep neural networks (DNNs), discussing popular deep architectures. Finally, we review current DNN models for information retrieval. We conclude with a discussion of potential future directions for neural IR.
