Static Pruning in Dense Retrieval using Matrix Decomposition

Federico Siciliano, Francesca Pezzuti, Nicola Tonellotto, Fabrizio Silvestri

TL;DR

The paper addresses the efficiency bottleneck of dense retrieval, where embedding dimensionality $d$ and corpus size $n$ drive index size and query latency. It proposes a PCA-based static pruning method that computes the decomposition $D^T D = W \Lambda W^T$, keeps the first $m$ components to form the pruned document matrix $\hat{D} = T_m$, and projects each query as $\hat{q} = W_m^T q$, so scores are computed as $s(q) = \hat{D} \hat{q}$. Empirically, it achieves over $50\%$ dimensionality reduction with at most a $5\%$ drop in $nDCG@10$ across models, and it remains effective in both in-domain and out-of-domain settings, including when the pruning is learned on a different corpus. Because the method runs offline and adds no per-query overhead, PCA-based pruning is a practical tool for reducing storage and latency in dense-retrieval pipelines.
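The pipeline summarized above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. It assumes that the rows of $D$ are document embeddings, that $T_m$ denotes the standard PCA projection $D W_m$, and that no mean-centering is applied (the summary only gives $D^T D$). The function names and the synthetic data are hypothetical.

```python
import numpy as np


def fit_pca_projection(D: np.ndarray, m: int) -> np.ndarray:
    """Return W_m (d x m): the m eigenvectors of D^T D with the largest eigenvalues."""
    gram = D.T @ D                            # d x d, symmetric
    eigvals, eigvecs = np.linalg.eigh(gram)   # eigh returns eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:m]       # indices of the m largest eigenvalues
    return eigvecs[:, top]


def prune_documents(D: np.ndarray, W_m: np.ndarray) -> np.ndarray:
    """Offline step: project documents (assumption: T_m = D W_m)."""
    return D @ W_m


def prune_query(q: np.ndarray, W_m: np.ndarray) -> np.ndarray:
    """Query-time step: q_hat = W_m^T q, a single d x m projection."""
    return W_m.T @ q


def score(D_hat: np.ndarray, q_hat: np.ndarray) -> np.ndarray:
    """Inner-product scores s(q) = D_hat q_hat, one score per document."""
    return D_hat @ q_hat


# Synthetic embeddings (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
n, d, m = 1000, 64, 32
scale = np.linspace(1.0, 0.05, d)             # give dimensions unequal variance
D = rng.standard_normal((n, d)) * scale
q = rng.standard_normal(d) * scale

W_m = fit_pca_projection(D, m)
D_hat = prune_documents(D, W_m)               # stored index is n x m instead of n x d
q_hat = prune_query(q, W_m)
pruned_scores = score(D_hat, q_hat)
full_scores = D @ q                           # unpruned baseline for comparison
```

Only `W_m` and `D_hat` need to be kept after the offline step. At query time the extra work is one $m \times d$ matrix-vector product, independent of corpus size.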

Abstract

In the era of dense retrieval, document indexing and retrieval are largely based on encoding models that transform text documents into embeddings. The cost of retrieval grows with both the number of documents and the size of the embeddings. Recent studies have shown that it is possible to reduce embedding size without sacrificing (and in some cases while improving) retrieval effectiveness. However, the methods introduced by these studies are query-dependent, so they cannot be applied offline and require additional computations during query processing, thus negatively impacting retrieval efficiency. In this paper, we present a novel static pruning method for reducing the dimensionality of embeddings using Principal Component Analysis (PCA). This approach is query-independent and can be executed offline, leading to a significant boost in dense retrieval efficiency with a negligible impact on system effectiveness. Our experiments show that the proposed method reduces the dimensionality of document representations by over 50% with up to a 5% reduction in NDCG@10, for different dense retrieval models.

Paper Structure

This paper contains 5 sections, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Effectiveness on DL 19 when applying PCA computed on $10^5$ in-domain embeddings and pruning dimensions at various cutoffs. Filled shapes denote significant differences w.r.t. the baseline, whereas hollow shapes represent non-significant differences. A significant difference followed by a non-significant one at a higher cutoff may result from increased variability as the metric mean decreases, with p-values often just above 0.05.
  • Figure 2: nDCG@10 on DL 19 when applying in-domain PCA computed on $10^3$, $10^4$ and $10^5$ embeddings, to prune embeddings at various cutoffs. Filled shapes denote significant differences, whereas hollow shapes represent non-significant differences. A significant difference followed by a non-significant one at a higher cutoff may result from increased variability as the metric mean decreases, with p-values often just above 0.05.