Condenser: a Pre-training Architecture for Dense Retrieval

Luyu Gao, Jamie Callan

TL;DR

This paper identifies why standard pre-trained LMs struggle as dense bi-encoders and proposes Condenser, a Transformer-based pre-training architecture that actively conditions LM predictions on dense representations to establish structural readiness for dense retrieval. By separating early and late backbone processing and introducing a Condenser head, the method guides the model to produce informative dense representations that can be fine-tuned as a standard encoder. Across sentence similarity, open-domain QA retrieval, and web-search retrieval, Condenser yields strong gains, particularly in low-data settings, and approaches or surpasses more complex pipelines in many full-data scenarios. Attention analyses corroborate that Condenser maintains a more task-friendly internal structure, suggesting structural readiness is a key factor in efficient dense retrieval pre-training and deployment.

Abstract

Pre-trained Transformer language models (LMs) have become go-to text representation encoders. Prior research fine-tunes deep LMs to encode text sequences such as sentences and passages into single dense vector representations for efficient text comparison and retrieval. However, dense encoders require large amounts of data and sophisticated techniques to train effectively, and they suffer in low-data situations. This paper finds that a key reason is that standard LMs' internal attention structure is not ready-to-use for dense encoders, which need to aggregate text information into the dense representation. We propose to pre-train toward dense encoders with a novel Transformer architecture, Condenser, where LM prediction CONditions on DENSE Representation. Our experiments show Condenser improves over standard LMs by large margins on various text retrieval and similarity tasks.

Paper Structure

This paper contains 50 sections, 9 equations, 2 figures, 8 tables.

Figures (2)

  • Figure 1: Condenser. We show 2 early and 2 late backbone layers here; in our experiments each group has 6 layers. The Condenser head is dropped during fine-tuning.
  • Figure 2: Attention patterns in pre-trained vs. fine-tuned BERT, ICT, and Condenser.
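
The data flow that Figure 1 describes can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation. It assumes that position 0 holds the CLS token, that the head receives the late-layer CLS vector together with the early-layer token vectors (our reading of the figure, not spelled out in this section), and that the head has 2 layers. The names `toy_layer`, `run_layers`, and `condenser_forward` are hypothetical, and `toy_layer` is a simple averaging stand-in for a real Transformer layer.

```python
from typing import Callable, List, Optional, Tuple

Vector = List[float]
Sequence = List[Vector]          # position 0 is assumed to be the CLS token
Layer = Callable[[Sequence], Sequence]


def toy_layer(hidden: Sequence) -> Sequence:
    """Hypothetical stand-in for a Transformer layer: mix each position with the mean."""
    dim = len(hidden[0])
    mean = [sum(v[d] for v in hidden) / len(hidden) for d in range(dim)]
    return [[0.5 * v[d] + 0.5 * mean[d] for d in range(dim)] for v in hidden]


def run_layers(layers: List[Layer], hidden: Sequence) -> Sequence:
    """Apply a stack of layers in order."""
    for layer in layers:
        hidden = layer(hidden)
    return hidden


def condenser_forward(
    tokens: Sequence,
    early: List[Layer],
    late: List[Layer],
    head: List[Layer],
    use_head: bool = True,
) -> Tuple[Vector, Optional[Sequence]]:
    """Return the dense (CLS) vector and, during pre-training, the head output."""
    h_early = run_layers(early, tokens)
    h_late = run_layers(late, h_early)
    cls_late = h_late[0]                      # the dense representation
    if not use_head:
        return cls_late, None                 # fine-tuning: head is dropped
    # Assumption: the head sees only the late CLS plus early token vectors,
    # so LM predictions made from its output must rely on the CLS vector
    # for any information computed in the late layers.
    head_input = [cls_late] + h_early[1:]
    return cls_late, run_layers(head, head_input)


# Usage: 6 early and 6 late layers as in the caption; 2 head layers is an assumption.
tokens = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
early, late, head = [toy_layer] * 6, [toy_layer] * 6, [toy_layer] * 2
cls_pretrain, head_out = condenser_forward(tokens, early, late, head)
cls_finetune, _ = condenser_forward(tokens, early, late, head, use_head=False)
```

Under these assumptions, dropping the head leaves the backbone's CLS output unchanged, which is why the fine-tuned model can be used as a standard encoder.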