MICE: Minimal Interaction Cross-Encoders for efficient Re-ranking

Mathias Vast, Victor Morand, Basile van Cooten, Laure Soulier, Josiane Mothe, Benjamin Piwowarski

TL;DR

MICE reduces inference latency fourfold compared to standard cross-encoders, matching the latency of late-interaction models like ColBERT, while retaining most of the cross-encoder's in-domain (ID) effectiveness and generalizing better out-of-domain (OOD).

Abstract

Cross-encoders deliver state-of-the-art ranking effectiveness in information retrieval, but have a high inference cost. This cost prevents their use as first-stage rankers and also makes re-ranking documents expensive. Prior work has addressed this bottleneck from two largely separate directions: accelerating cross-encoder inference by sparsifying the attention process, or improving first-stage retrieval effectiveness using more complex models, e.g. late-interaction ones. In this work, we propose to bridge these two approaches, based on an in-depth understanding of the internal mechanisms of cross-encoders. Starting from cross-encoders, we show that it is possible to derive a new late-interaction-like architecture by carefully removing detrimental or unnecessary interactions. We name this architecture MICE (Minimal Interaction Cross-Encoders). We extensively evaluate MICE across both in-domain (ID) and out-of-domain (OOD) datasets. MICE reduces inference latency fourfold compared to standard cross-encoders, matching the latency of late-interaction models like ColBERT, while retaining most of the cross-encoder's ID effectiveness and generalizing better OOD.

Paper Structure (22 sections, 5 figures, 5 tables)

Figures (5)

  • Figure 1: MICE Architecture: stripping cross-encoders to keep the strict minimum interactions that maintain effectiveness.
  • Figure 2: Masking approach. Interactions between input parts ([CLS], $Q$, $D$, [SEP]) are blocked using cumulative masking. Colors indicate the step where masking begins, ending with masking step 3 in a complete $Q \not\leftrightarrow D$ separation (block-diagonal structure). Green blocks denote permanently preserved interactions and attention sinks.
  • Figure 3: In-domain nDCG@10 when masking all interactions between $Q$ and $D$ up to a given layer in the transformer (masking step 3).
  • Figure 4: Impact of dropping the backbone's late layers in MICE. Three interaction layers consistently recover full performance.
  • Figure 5: Scaling behavior of MICE against a standard cross-encoder, using backbones from the Ettin suite.
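
The $Q \not\leftrightarrow D$ separation described in the Figure 2 caption can be illustrated with a small attention mask. The sketch below is only an illustration, not the authors' implementation: the function names `qd_separation_mask` and `render`, the token layout `[CLS] Q D [SEP]`, and the choice to leave the special tokens fully unmasked are all assumptions made here. The caption does not specify which interactions are preserved or how the cumulative masking steps are scheduled across layers.

```python
from typing import List


def qd_separation_mask(n_query: int, n_doc: int) -> List[List[bool]]:
    """Build a mask where mask[i][j] is True if token i may attend to token j.

    Assumed (hypothetical) layout: [CLS] q_1..q_n d_1..d_m [SEP].
    Only query<->document attention is blocked; special tokens stay visible
    to every position, which is an assumption of this sketch.
    """
    # One segment label per position: "S" = special, "Q" = query, "D" = document.
    segments = ["S"] + ["Q"] * n_query + ["D"] * n_doc + ["S"]
    blocked = {("Q", "D"), ("D", "Q")}
    return [[(src, dst) not in blocked for dst in segments] for src in segments]


def render(mask: List[List[bool]]) -> List[str]:
    """Render each row as text: '#' for allowed attention, '.' for blocked."""
    return ["".join("#" if allowed else "." for allowed in row) for row in mask]


if __name__ == "__main__":
    for row in render(qd_separation_mask(n_query=2, n_doc=3)):
        print(row)
```

Ignoring the special-token row and column, the allowed entries form two square blocks on the diagonal (query-to-query and document-to-document), which is the block-diagonal structure the caption refers to.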