A Voronoi Cell Formulation for Principled Token Pruning in Late-Interaction Retrieval Models

Yash Kankanampati, Yuxuan Zong, Nadi Tomeh, Benjamin Piwowarski, Joseph Le Roux

TL;DR

This work introduces a framework grounded in hyperspace geometry that casts token pruning as a Voronoi cell estimation problem in the embedding space. It demonstrates that this approach serves not only as a competitive pruning strategy but also as a valuable tool for improving and interpreting token-level behavior within dense retrieval systems.

Abstract

Late-interaction models like ColBERT offer competitive performance across various retrieval tasks, but they require storing a dense embedding for each document token, which leads to substantial index storage overhead. Prior work addresses this by pruning low-importance token embeddings based on statistical and empirical measures, but these methods often either lack formal grounding or are ineffective. To address these shortcomings, we introduce a framework grounded in hyperspace geometry and cast token pruning as a Voronoi cell estimation problem in the embedding space. By interpreting each token's influence as a measure of its Voronoi region, our approach enables principled pruning that retains retrieval quality while reducing index size. Through our experiments, we demonstrate that this approach serves not only as a competitive pruning strategy but also as a valuable tool for improving and interpreting token-level behavior within dense retrieval systems.
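To make the idea of a token's Voronoi region concrete, the following is a minimal sketch, not the authors' estimator. It assumes queries are drawn uniformly from the unit sphere (the paper may well use a different query distribution) and approximates each document vector's maximum dot-product Voronoi cell by the fraction of sampled queries for which that vector wins the max. The function names `voronoi_cell_mass` and `prune_smallest_cells` are hypothetical and introduced only for illustration.

```python
import numpy as np


def voronoi_cell_mass(doc_vecs, n_samples=20000, seed=0):
    """Monte Carlo estimate of each document vector's Voronoi cell measure.

    Assumption (not from the paper): queries are uniform on the unit sphere.
    Returns, per document vector, the fraction of sampled queries for which
    that vector attains the maximum dot product.
    """
    rng = np.random.default_rng(seed)
    dim = doc_vecs.shape[1]
    # Normalised Gaussian samples are uniformly distributed on the sphere.
    queries = rng.standard_normal((n_samples, dim))
    queries /= np.linalg.norm(queries, axis=1, keepdims=True)
    # Index of the winning document vector for every sampled query.
    winners = np.argmax(queries @ doc_vecs.T, axis=1)
    counts = np.bincount(winners, minlength=doc_vecs.shape[0])
    return counts / n_samples


def prune_smallest_cells(doc_vecs, keep):
    """Keep the `keep` vectors with the largest estimated cells (one pass)."""
    if not 1 <= keep <= len(doc_vecs):
        raise ValueError("keep must be between 1 and the number of vectors")
    mass = voronoi_cell_mass(doc_vecs)
    return np.sort(np.argsort(mass)[-keep:])


# Toy 2D document: four axis-aligned unit vectors plus one short vector
# that can never be the max-dot-product winner for any query direction.
doc = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0], [0.3, 0.3]])
mass = voronoi_cell_mass(doc)
kept = prune_smallest_cells(doc, keep=4)
```

In this toy example the short vector has an empty cell, so it is the natural first candidate for pruning. This sketch estimates all cells once and prunes in a single pass; re-estimating the cells after each removal would be a different, iterative variant.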

Paper Structure (36 sections, 10 equations, 6 figures, 3 tables, 1 algorithm)

Figures (6)

  • Figure 1: Analysis of query embeddings from MS MARCO using ColBERT.
  • Figure 2: An example illustrating the difference in iterative and non-iterative Voronoi pruning of 2D document vectors. Each subfigure shows the maximum dot-product Voronoi regions and the retained document vectors.
  • Figure 3: Performance of LP Pruning and Voronoi Pruning on document vectors produced by ColBERT finetuned with the docsim regularizer (with $\alpha=0.8$).
  • Figure 4: Distributions showing how often document tokens contribute to max-dot product scores (in blue) and the aggregated mean errors over the position of tokens (in orange).
  • Figure 5: Distribution showing when a token at a particular position is pruned, relative to other tokens in the document. Lower values signify earlier pruning.
  • ...and 1 more figure