Comparing Neighbors Together Makes it Easy: Jointly Comparing Multiple Candidates for Efficient and Effective Retrieval

Jonghyun Song; Cheyon Jin; Wenlong Zhao; Andrew McCallum; Jay-Yoon Lee

TL;DR

Experimental results on the ZeSHEL dataset demonstrate that CMC, when plugged in between bi-encoders and cross-encoders as a seamless intermediate reranker (BE-CMC-CE), effectively improves recall@k. Experiments on downstream ranking tasks further verify its effectiveness as a final-stage reranker for improving top-1 accuracy.

Abstract

A common retrieve-and-rerank paradigm involves retrieving relevant candidates from a broad set using a fast bi-encoder (BE), followed by applying expensive but accurate cross-encoders (CE) to a limited candidate set. However, relying on this small subset is often susceptible to error propagation from the bi-encoders, which limits the overall performance. To address these issues, we propose the Comparing Multiple Candidates (CMC) framework. CMC compares a query and multiple embeddings of similar candidates (i.e., neighbors) through shallow self-attention layers, delivering rich representations contextualized to each other. Furthermore, CMC is scalable enough to handle multiple comparisons simultaneously. For example, comparing ~10K candidates with CMC takes a similar amount of time as comparing 16 candidates with CE. Experimental results on the ZeSHEL dataset demonstrate that CMC, when plugged in between bi-encoders and cross-encoders as a seamless intermediate reranker (BE-CMC-CE), can effectively improve recall@k (+4.8%-p, +3.5%-p for R@16, R@64) compared to using only bi-encoders (BE-CE), with negligible slowdown (<7%). Additionally, to verify CMC's effectiveness as the final-stage reranker in improving top-1 accuracy, we conduct experiments on downstream tasks such as entity, passage, and dialogue ranking. The results indicate that CMC is not only faster (11x) but also often more effective than CE, with improved prediction accuracy in Wikipedia entity linking (+0.7%-p) and DSTC7 dialogue ranking (+3.3%-p).
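The joint comparison described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the single-head attention without layer normalization or feed-forward blocks, the random weights, the dot-product scoring, and the names `self_attention_layer` and `cmc_scores` are all assumptions made for illustration. The only point it shows is that one query embedding and K precomputed candidate embeddings can be contextualized together and scored in a single pass.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_layer(x, w_q, w_k, w_v):
    # Simplified single-head self-attention with a residual connection.
    # x has shape (K + 1, d): row 0 is the query, rows 1..K are candidates.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    attn = softmax(q @ k.T / np.sqrt(x.shape[-1]), axis=-1)
    return x + attn @ v

def cmc_scores(query_emb, cand_embs, layers):
    # Contextualize the query and all K candidates jointly, then score each
    # candidate by its dot product with the contextualized query
    # (the scoring function is an assumption of this sketch).
    x = np.vstack([query_emb[None, :], cand_embs])
    for w_q, w_k, w_v in layers:
        x = self_attention_layer(x, w_q, w_k, w_v)
    return x[1:] @ x[0]  # shape (K,)

rng = np.random.default_rng(0)
d, K = 16, 64  # hypothetical embedding size and candidate count
# Two shallow layers with random (untrained) projection matrices.
layers = [tuple(rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
          for _ in range(2)]
query_emb = rng.normal(size=d)
cand_embs = rng.normal(size=(K, d))  # stands in for precomputed embeddings

scores = cmc_scores(query_emb, cand_embs, layers)
top_k_prime = np.argsort(-scores)[:8]  # indices of the 8 best candidates
```

Because the sketch uses no positional information, reordering the candidates only reorders the scores, which matches the intuition that the candidates form a set rather than a sequence.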

Paper Structure (50 sections, 7 equations, 4 figures, 13 tables)

Introduction
Background and Related Works
Retrieve and Rerank
Related Work
Bi-encoders and Cross-encoders
Late Interaction
Listwise Ranking
Proposed Method
Model Architecture
Query and Candidate Encoders
Self-attention Layer
Training
Optimization
Negative Sampling
Inference
...and 35 more sections

Figures (4)

Figure 1: Model architectures for retrieval tasks. (a), (b), and (c) are existing architectures. (d) is our proposed 'Comparing Multiple Candidates (CMC)' architecture, which computes compatibility scores by comparing the embeddings of a query and K candidates via self-attention layers. In contrast to (a)-(c), CMC processes multiple candidates at once rather than running a separate forward pass for each (query, candidate) pair.
Figure 2: Overview of the proposed CMC framework, which compares multiple candidates at once. CMC can either seamlessly enhance the retriever by finding the top-K' candidates or function as a direct reranker that outputs the top-1 candidate. Candidate embeddings for both bi-encoders and CMC are precomputed, while query embeddings for bi-encoders and CMC are computed in parallel on the fly. After bi-encoders retrieve the top-K candidates, CMC looks up the corresponding candidate embeddings and passes them through a two-layer transformer encoder. The additional latency is therefore limited to the execution of the self-attention layers.
Figure 3: Illustration of candidate retrieval for cross-encoders (CE). Suppose cross-encoders can process at most M candidates due to limited scalability. (a) In bi-encoder (BE) retrieval, the BE-CE framework takes M candidates and risks missing the gold candidate because of inaccurate bi-encoders, so the entire system suffers from error propagation from the retriever and fails to find the correct candidate. (b) When CMC is introduced as the seamless intermediate reranker (BE-CMC-CE), it can consider a significantly larger pool (K) of BE candidates. This allows CMC to pass fewer (K', with K > M > K') but higher-quality candidates to the CE while increasing the chance of including the positive candidate.
Figure 4: The relationship between the number of candidates and the corresponding time measurements in milliseconds for two different models: Cross-encoder (CE) and Comparing Multiple Candidates (CMC).
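The BE-CMC-CE cascade of Figure 3 can be sketched as a three-stage top-n filter. This is a toy illustration under stated assumptions: the function names `top_indices` and `cascade`, the synthetic scores, and the sizes M = 16, K = 64, K' = 8 are hypothetical and are chosen only so that the gold candidate falls outside the bi-encoder top-M but inside its top-K.

```python
import numpy as np

def top_indices(scores, n):
    # Indices of the n highest scores, best first.
    return np.argsort(-scores)[:n]

def cascade(be_scores, cmc_score_fn, ce_score_fn, K, K_prime):
    # Stage 1: the bi-encoder keeps a large pool of K candidates.
    be_top = top_indices(be_scores, K)
    # Stage 2: CMC rescores the pool jointly and keeps K' candidates.
    cmc_top = be_top[top_indices(cmc_score_fn(be_top), K_prime)]
    # Stage 3: the cross-encoder scores only K' candidates and picks one.
    prediction = cmc_top[int(np.argmax(ce_score_fn(cmc_top)))]
    return prediction, cmc_top

# Synthetic setup (all values are made up for illustration).
N, M, K, K_prime = 1000, 16, 64, 8   # K > M > K'
gold = 40
be_scores = -np.arange(N, dtype=float)  # candidate i has bi-encoder rank i
rng = np.random.default_rng(0)
relevance = rng.uniform(0.0, 0.5, size=N)
relevance[gold] = 1.0

cmc_score_fn = lambda ids: np.round(relevance[ids], 1)  # coarse reranker
ce_score_fn = lambda ids: relevance[ids]                # precise reranker

be_only_pool = top_indices(be_scores, M)  # what BE-CE would hand to the CE
prediction, cmc_pool = cascade(be_scores, cmc_score_fn, ce_score_fn, K, K_prime)
```

In this toy setup the gold candidate is ranked 41st by the bi-encoder, so a BE-CE pipeline limited to M = 16 candidates never sees it, whereas the cascade recovers it from the larger pool of K = 64 and still hands the cross-encoder only K' = 8 candidates.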
