
State Space Models are Strong Text Rerankers

Zhichao Xu, Jinghua Yan, Ashim Gupta, Vivek Srikumar

TL;DR

The paper evaluates state space models, particularly Mamba-1 and Mamba-2, as alternatives to transformers for text reranking in IR. It benchmarks these SSM-based rerankers against transformer baselines on passage and long-document tasks (MS MARCO and BEIR), examining both ranking performance and computational efficiency. Mamba-2 achieves ranking accuracy competitive with similar-sized transformers and is more memory-efficient than Mamba-1, although its training and inference efficiency still lag behind transformers with FlashAttention. The work suggests that SSMs are viable alternatives for IR and identifies concrete avenues for optimization, including architectural refinements and hybrid models, to close the remaining efficiency gap. Overall, the study demonstrates the potential of state space models for long-context text ranking while outlining the practical barriers to deployment.
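
To make the reranking setting concrete, here is a minimal sketch of pointwise reranking, assuming a cross-encoder style model that reads the query and a candidate document as one sequence and emits a scalar relevance score. The `[SEP]`-joined input format, `rerank`, and `toy_score` are hypothetical illustrations, not the paper's code; `toy_score` is a trivial term-overlap function standing in for a trained Mamba or transformer reranker.

```python
from typing import Callable, List, Tuple


def toy_score(text: str) -> float:
    """Hypothetical stand-in for a trained reranker (NOT the paper's model).

    Splits the joint input back into query and document and returns the
    fraction of document terms that also appear in the query.
    """
    query, _, doc = text.partition(" [SEP] ")
    q_terms = set(query.lower().split())
    d_terms = doc.lower().split()
    if not d_terms:
        return 0.0
    return sum(t in q_terms for t in d_terms) / len(d_terms)


def rerank(query: str, candidates: List[str],
           score_fn: Callable[[str], float]) -> List[Tuple[str, float]]:
    """Pointwise reranking: score each (query, document) pair jointly, then sort."""
    # Query and document are concatenated into a single sequence so the model
    # can process both together (assumed input format).
    scored = [(doc, score_fn(f"{query} [SEP] {doc}")) for doc in candidates]
    # Highest relevance first; Python's sort is stable, so ties keep input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    ranked = rerank(
        "state space models",
        ["cooking pasta at home",
         "state space models for sequence modeling",
         "models of the solar system"],
        toy_score,
    )
    for doc, score in ranked:
        print(f"{score:.2f}  {doc}")
```

In a real pipeline the candidates would come from a first-stage retriever, and `score_fn` would wrap the neural reranker; the joint encoding is what enables the fine-grained query-document interaction the abstract refers to.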

Abstract

Transformers dominate NLP and IR, but their inference inefficiencies and difficulty extrapolating to longer contexts have sparked interest in alternative model architectures. Among these, state space models (SSMs) such as Mamba offer promising advantages, particularly constant ($O(1)$) per-token time complexity during inference. Despite this potential, the effectiveness of SSMs at text reranking -- a task requiring fine-grained query-document interaction and long-context understanding -- remains underexplored. This study benchmarks SSM-based architectures (specifically, Mamba-1 and Mamba-2) against transformer-based models across various scales, architectures, and pre-training objectives, focusing on performance and efficiency in text reranking tasks. We find that (1) Mamba architectures achieve competitive text ranking performance, comparable to transformer-based models of similar size; (2) they are less efficient in training and inference than transformers with FlashAttention; and (3) Mamba-2 outperforms Mamba-1 in both performance and efficiency. These results underscore the potential of state space models as a transformer alternative and highlight areas for improvement in future IR applications.
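
The $O(1)$ inference claim refers to per-token cost: an SSM carries a fixed-size hidden state, so processing each new token takes the same amount of work regardless of how many tokens preceded it, whereas attention must read a key-value cache that grows with the sequence. The sketch below illustrates this with a single-channel, time-invariant diagonal SSM, $h_t = \bar{A} h_{t-1} + \bar{B} x_t$, $y_t = C h_t$. It is a simplification for illustration only: Mamba's selective SSM additionally makes the discretization step and $B$, $C$ depend on the input, and uses a hardware-aware parallel scan during training. The class `DiagonalSSM` and its parameters are hypothetical.

```python
import math
from typing import List


class DiagonalSSM:
    """Single-channel linear SSM with a diagonal state matrix (simplified sketch).

    h_t = a_bar * h_{t-1} + b_bar * x_t   (elementwise over N state dimensions)
    y_t = sum(c * h_t)
    """

    def __init__(self, a_log: List[float], b: List[float],
                 c: List[float], dt: float) -> None:
        # Continuous A = -exp(a_log) is kept negative for stability;
        # zero-order-hold discretization gives a_bar = exp(dt * A).
        self.a_bar = [math.exp(-math.exp(al) * dt) for al in a_log]
        # Simplified (Euler) discretization for B: b_bar = dt * B.
        self.b_bar = [dt * bi for bi in b]
        self.c = list(c)
        # Fixed-size recurrent state: N floats, independent of sequence length.
        self.h = [0.0] * len(a_log)

    def step(self, x: float) -> float:
        # O(N) work per token, no matter how many tokens came before.
        self.h = [ab * hi + bb * x
                  for ab, hi, bb in zip(self.a_bar, self.h, self.b_bar)]
        return sum(ci * hi for ci, hi in zip(self.c, self.h))


if __name__ == "__main__":
    ssm = DiagonalSSM(a_log=[0.0, 0.5], b=[1.0, 2.0], c=[1.0, -1.0], dt=0.1)
    for x in [1.0, 0.0, 2.0, -1.0]:
        print(f"y = {ssm.step(x):+.4f}, state size = {len(ssm.h)}")
```

Unrolling the recurrence gives $y_t = \sum_{k \le t} C \bar{A}^{t-k} \bar{B} x_k$, the same output a convolution over the whole sequence would produce; the recurrent form simply never stores more than the $N$-dimensional state.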

Paper Structure

This paper contains 33 sections, 6 equations, 2 figures, 11 tables.

Figures (2)

  • Figure 1: Training throughput comparison between models with $\approx$330M parameters. With batch_size=8, all models except OPT-FlashAttn and Mamba-2 run out of memory on a GPU with 48 GB of VRAM.
  • Figure 2: Inference profiling results for Mamba models versus OPT models of similar size.