Can Cross Encoders Produce Useful Sentence Embeddings?

Haritha Ananthakrishnan, Julian Dolby, Harsha Kokel, Horst Samulowitz, Kavitha Srinivas

TL;DR

Cross encoders excel at comparing sentence pairs but are costly for initial retrieval; dual encoders enable fast embedding-based retrieval but often lag in accuracy. The paper introduces a three-part pipeline that extracts early-layer CE embeddings, infuses them into a two-layer dual encoder (DE-2 CE), and uses CE-based reranking to boost top results. Empirical results show that early CE layers encode strong IR signals and that DE-2 CE can approach the baseline DE within about 1% across multiple datasets while delivering a ~5.15x inference speedup, with diminishing gains from deeper CE infusion. This approach enables more efficient retrieval-augmented generation by leveraging CE-quality representations in fast dense retrieval pipelines.
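The retrieve-then-rerank flow described above can be illustrated with a minimal sketch. It is not the paper's implementation: the embeddings are hand-written toy vectors, and `toy_pair_score` is a hypothetical stand-in for a cross encoder's pairwise relevance score. It only shows how a fast embedding search shortlists candidates that a slower pairwise scorer then reorders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(query_vec, doc_vecs, k):
    """Stage 1: rank all documents by embedding similarity, keep the top-k ids."""
    ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)
    return ranked[:k]

def rerank(query, candidates, pair_score):
    """Stage 2: rescore only the shortlisted candidates with a pairwise scorer."""
    return sorted(candidates, key=lambda d: pair_score(query, d), reverse=True)

# Toy, hand-written embeddings (assumption: not produced by any model).
doc_vecs = {
    "d1": [0.9, 0.1, 0.0],
    "d2": [0.7, 0.6, 0.1],
    "d3": [0.0, 0.2, 0.9],
}
query_vec = [1.0, 0.2, 0.0]

# Hypothetical pairwise scores standing in for a cross encoder.
toy_pair_scores = {("q", "d1"): 0.4, ("q", "d2"): 0.8, ("q", "d3"): 0.1}

def toy_pair_score(query, doc_id):
    return toy_pair_scores[(query, doc_id)]

shortlist = retrieve(query_vec, doc_vecs, k=2)    # embedding search over all docs
final = rerank("q", shortlist, toy_pair_score)    # pairwise scoring of 2 docs only
print(shortlist, final)
```

The pairwise scorer runs only on the `k` shortlisted documents, which is why the cost of the first stage dominates at corpus scale.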

Abstract

Cross encoders (CEs) are trained with sentence pairs to detect relatedness. As CEs require sentence pairs at inference, the prevailing view is that they can only be used as re-rankers in information retrieval pipelines. Dual encoders (DEs) are instead used to embed sentences, where sentence pairs are encoded by two separate encoders with shared weights at training, and a loss function that ensures the pair's embeddings lie close in vector space if the sentences are related. DEs, however, require much larger datasets to train, and are less accurate than CEs. We report a curious finding that embeddings from earlier layers of CEs can in fact be used within an information retrieval pipeline. We show how to exploit CEs to distill a lighter-weight DE, with a 5.15x speedup in inference time.
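The abstract does not name the loss used to train DEs. As one common possibility, the sketch below uses an in-batch contrastive (softmax cross-entropy) loss over cosine similarities; the loss choice, the `temperature` value, and the toy vectors are all assumptions made for illustration. It shows the stated property: the loss is small when each related pair lies close in vector space and large when it does not.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def in_batch_contrastive_loss(query_vecs, doc_vecs, temperature=0.05):
    """Mean cross-entropy where doc i is the positive for query i and
    every other doc in the batch acts as a negative (assumed loss)."""
    total = 0.0
    for i, q in enumerate(query_vecs):
        logits = [cosine(q, d) / temperature for d in doc_vecs]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(query_vecs)

# Toy embeddings (assumption: hand-written, not model outputs).
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each doc lies near its own query
swapped = [[0.1, 0.9], [0.9, 0.1]]   # each doc lies near the other query

loss_aligned = in_batch_contrastive_loss(queries, aligned)
loss_swapped = in_batch_contrastive_loss(queries, swapped)
print(loss_aligned, loss_swapped)
```

In a real DE, both sides would be produced by the same shared-weight encoder and the loss would be minimized by gradient descent; here the embeddings are fixed so that only the loss behavior is visible.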

Paper Structure

This paper contains 15 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: Extraction of layer-wise embeddings
  • Figure 2: Knowledge infusion from cross encoder
  • Figure 3: Comparison of layer-wise performance of DE and CE on the msmarco pair of models. While performance improves monotonically with layers for DE, the CE embeddings from lower layers show surprisingly good performance on many datasets.
  • Figure 4: Performance of the DE final output layer (DE), CE embed, CE encoding layer 0 for the ms-marco pair of models.
  • Figure 5: Performance of the DE final output layer (DE), CE embed, CE encoding layer 0 for the mixed bread pair of models.