State Space Models are Strong Text Rerankers
Zhichao Xu, Jinghua Yan, Ashim Gupta, Vivek Srikumar
TL;DR
The paper evaluates state space models (SSMs), particularly Mamba-1 and Mamba-2, as alternatives to transformers for text reranking in information retrieval (IR). It benchmarks these SSM-based rerankers against transformer baselines on passage and long-document tasks (MS MARCO and BEIR), examining both ranking performance and computational efficiency. Findings show that Mamba-2 achieves ranking accuracy competitive with similarly sized transformers and is more memory-efficient than Mamba-1, though the training and inference efficiency of both Mamba variants still lags behind transformers with Flash Attention. The work suggests SSMs are viable alternatives for IR and identifies avenues for optimization, including architectural refinements and potential hybrid models, to close the remaining efficiency gap. Overall, the study demonstrates the potential of state space models for long-context text ranking while outlining practical barriers to deployment.
Abstract
Transformers dominate NLP and IR, but their inference inefficiencies and challenges in extrapolating to longer contexts have sparked interest in alternative model architectures. Among these, state space models (SSMs) like Mamba offer promising advantages, particularly $O(1)$ time complexity in inference. Despite their potential, SSMs' effectiveness at text reranking -- a task requiring fine-grained query-document interaction and long-context understanding -- remains underexplored. This study benchmarks SSM-based architectures (specifically, Mamba-1 and Mamba-2) against transformer-based models across various scales, architectures, and pre-training objectives, focusing on performance and efficiency in text reranking tasks. We find that (1) Mamba architectures achieve competitive text ranking performance, comparable to transformer-based models of similar size; (2) they are less efficient in training and inference than transformers with flash attention; and (3) Mamba-2 outperforms Mamba-1 in both performance and efficiency. These results underscore the potential of state space models as a transformer alternative and highlight areas for improvement in future IR applications.
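
To make the $O(1)$ inference claim concrete, the following is a minimal sketch of a toy diagonal linear state space recurrence. It is an illustration under stated assumptions, not the selective scan used by Mamba-1 or Mamba-2: the function names `ssm_step` and `run` and the fixed parameters `a`, `b`, `c` are hypothetical. The point it shows is that each new token updates a fixed-size state, so the per-token cost depends on the state size rather than on how many tokens have already been processed.

```python
# Illustrative sketch only: a toy diagonal linear state space recurrence.
# This is NOT the Mamba-1/Mamba-2 selective scan; names and parameters are hypothetical.

def ssm_step(state, x, a, b, c):
    """Advance the recurrence by one token.

    state   : list of floats, the hidden state h_{t-1} (length N)
    x       : scalar input for the current token
    a, b, c : per-channel parameters, each of length N
    Returns (new_state, y). The work done here is proportional to N only,
    independent of how many tokens came before.
    """
    # h_t = a * h_{t-1} + b * x_t  (elementwise, since the transition is diagonal)
    new_state = [a_i * h_i + b_i * x for a_i, h_i, b_i in zip(a, state, b)]
    # y_t = c . h_t
    y = sum(c_i * h_i for c_i, h_i in zip(c, new_state))
    return new_state, y


def run(inputs, a, b, c):
    """Process a sequence token by token, carrying only the fixed-size state."""
    state = [0.0] * len(a)
    outputs = []
    for x in inputs:
        state, y = ssm_step(state, x, a, b, c)
        outputs.append(y)
    return outputs, state
```

By contrast, a transformer decoding step attends over all previously cached tokens, so its per-token cost grows with the context length. The sketch says nothing about wall-clock speed, which, as the abstract notes, currently favors transformers with flash attention.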
