CSPLADE: Learned Sparse Retrieval with Causal Language Models

Zhichao Xu, Aosong Feng, Yijun Tian, Haibo Ding, Lin Lee Cheong

TL;DR

CSPLADE extends learned sparse retrieval (LSR) to decoder-only large language models by introducing a lightweight adaptation phase and two bidirectional information variants, enabling training with up to an 8B-scale backbone. The method yields competitive MS MARCO and BEIR performance while maintaining a compact vocabulary-sized representation suitable for inverted-index retrieval, and it provides a careful analysis of quantization effects on retrieval efficiency. Key contributions include the adaptation-based stabilization, echo and bidirectional variants, and a quantization-aware perspective on latency and memory tradeoffs. These results offer practical guidance for scaling LSR with modern LLM backbones and for deploying efficient retrieval systems.

Abstract

In recent years, dense retrieval has been the focus of information retrieval (IR) research. While effective, dense retrieval produces uninterpretable dense vectors and suffers from large index size. Learned sparse retrieval (LSR) has emerged as a promising alternative, achieving competitive retrieval performance while also leveraging the classical inverted index data structure for efficient retrieval. However, few works have explored scaling LSR beyond BERT scale. In this work, we identify two challenges in training large language models (LLMs) for LSR: (1) training instability during the early stage of contrastive training; (2) suboptimal performance due to the unidirectional attention of pre-trained LLMs. To address these challenges, we propose two corresponding techniques: (1) a lightweight adaptation training phase to eliminate training instability; (2) two model variants to enable bidirectional information. With these techniques, we are able to train LSR models with an 8B-scale LLM and achieve competitive retrieval performance with reduced index size. Furthermore, we are among the first to analyze the performance-efficiency tradeoff of LLM-based LSR models through the lens of model quantization. Our findings provide insights into adapting LLMs for efficient retrieval modeling.
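To make the idea of a vocabulary-sized sparse representation served from an inverted index concrete, the following is a minimal sketch, not the authors' implementation. It assumes SPLADE-style pooling (a log-saturated ReLU over per-token vocabulary logits, max-pooled across tokens), which this section does not spell out. The function names `sparse_rep`, `build_inverted_index`, and `search`, and the four-term toy vocabulary, are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def sparse_rep(token_logits):
    """Pool per-token vocabulary logits into a sparse {term_id: weight} dict.

    Assumption: SPLADE-style pooling, w_v = max over tokens of log(1 + ReLU(logit)).
    Terms whose pooled weight is zero are dropped, which is what makes the
    representation sparse.
    """
    vocab_size = len(token_logits[0])
    weights = {}
    for v in range(vocab_size):
        w = max(math.log1p(max(row[v], 0.0)) for row in token_logits)
        if w > 0.0:
            weights[v] = w
    return weights


def build_inverted_index(doc_reps):
    """Map each term id to a posting list of (doc_id, weight) pairs."""
    index = defaultdict(list)
    for doc_id, rep in doc_reps.items():
        for term, w in rep.items():
            index[term].append((doc_id, w))
    return index


def search(index, query_rep):
    """Score documents by the dot product over terms shared with the query."""
    scores = defaultdict(float)
    for term, q_weight in query_rep.items():
        for doc_id, d_weight in index.get(term, []):
            scores[doc_id] += q_weight * d_weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy logits over a 4-term vocabulary; real models emit one row per input token
# with one column per vocabulary entry.
doc_reps = {
    "a": sparse_rep([[2.0, -1.0, 0.0, 0.5], [0.0, -3.0, 0.0, 1.0]]),
    "b": sparse_rep([[-1.0, 3.0, 0.0, 0.0]]),
}
index = build_inverted_index(doc_reps)
query_rep = sparse_rep([[1.0, 0.0, 0.0, 0.0]])
results = search(index, query_rep)
```

In this toy setup only document `a` shares a non-zero term with the query, so document `b` is never touched during scoring. That is the efficiency argument for inverted indexes: cost scales with the posting lists of the query's active terms rather than with the full collection.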

Paper Structure

This paper contains 29 sections, 6 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Quantization evaluation results for CSPLADE-Bi-1B. Left figure shows performance while right figure shows inference speed. See the quantization details appendix for CSPLADE-Bi-8B results.
  • Figure 2: Pseudocode for the adaptation-phase training loss computation.
  • Figure 3: Quantization evaluation results for CSPLADE-Bi-8B. Left figure shows performance while right figure shows inference speed.