ECLIPSE: Contrastive Dimension Importance Estimation with Pseudo-Irrelevance Feedback for Dense Retrieval
Giulio D'Erasmo, Giovanni Trappolini, Nicola Tonellotto, Fabrizio Silvestri
TL;DR
This work tackles the problem of noisy and non-discriminative dimensions in high-dimensional dense embeddings for information retrieval. It introduces ECLIPSE, a contrastive dimension importance estimator that uses both top-ranked (pseudo-relevant) and bottom-ranked (pseudo-irrelevant) retrieved documents to form a "sun" representation $\mathbf{s}$ and a "moon" representation $\mathbf{m}$. For a query embedding $\mathbf{q}$, these yield a residual dimension importance vector $u^{\text{Eclipse}}_q = \alpha (\mathbf{q} \odot \mathbf{s}) - \beta (\mathbf{q} \odot \mathbf{m})$, or equivalently $u^{\text{Eclipse}}_q = \mathbf{q} \odot (\alpha \mathbf{s} - \beta \mathbf{m})$. The method can be plugged into existing dimension importance estimators (DIMEs), whether PRF-based or LLM-based, and consistently improves retrieval performance across four benchmarks and three base models. Average AP gains reach up to $19.50\%$ over the DIME-based baseline ($22.35\%$ over the baseline using all dimensions), and $\text{nDCG@10}$ gains reach up to $11.42\%$ ($13.10\%$, respectively). Key findings show that highly irrelevant documents are valuable contrast signals, and that the semantic content of the irrelevant texts matters less than their low relevance, supporting a broader, pseudo-irrelevance-based approach to robust dense retrieval.
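The residual importance vector above can be illustrated with a small sketch. It rests on several assumptions that are not stated in this summary: the sun and moon representations are taken to be plain centroids (per-dimension means) of the top-ranked and bottom-ranked document embeddings, $\alpha = \beta = 1$ is an arbitrary default, and the resulting scores are used to keep a fixed fraction of the query's dimensions while zeroing the rest. The function names and toy vectors are hypothetical and exist only for illustration.

```python
def centroid(vectors):
    """Per-dimension mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def eclipse_importance(q, top_docs, bottom_docs, alpha=1.0, beta=1.0):
    """Compute u_q = q * (alpha * s - beta * m), element-wise.

    Assumption: s ("sun") and m ("moon") are centroids of the top-ranked and
    bottom-ranked document embeddings; alpha and beta defaults are illustrative.
    """
    s = centroid(top_docs)
    m = centroid(bottom_docs)
    return [qi * (alpha * si - beta * mi) for qi, si, mi in zip(q, s, m)]


def keep_top_dimensions(q, importance, fraction=0.5):
    """Zero out all but the highest-scoring fraction of query dimensions.

    Assumption: this masking step is one plausible way to apply the scores;
    it is a hypothetical helper, not taken from the summary.
    """
    k = max(1, int(len(q) * fraction))
    ranked = sorted(range(len(q)), key=lambda i: importance[i], reverse=True)
    keep = set(ranked[:k])
    return [qi if i in keep else 0.0 for i, qi in enumerate(q)]


# Toy 4-dimensional embeddings (made up for the example).
q = [1.0, 1.0, 1.0, 1.0]
top_docs = [[2.0, 0.0, 1.0, 1.0], [2.0, 2.0, 1.0, 1.0]]     # s = [2, 1, 1, 1]
bottom_docs = [[0.0, 2.0, 1.0, 0.0], [0.0, 4.0, 1.0, 0.0]]  # m = [0, 3, 1, 0]

u = eclipse_importance(q, top_docs, bottom_docs)
q_masked = keep_top_dimensions(q, u, fraction=0.5)

print(u)         # [2.0, -2.0, 0.0, 1.0]
print(q_masked)  # [1.0, 0.0, 0.0, 1.0]
```

In this toy case, dimension 1 is strong in the bottom-ranked documents and receives a negative score, while dimension 2 is equally present in both groups and scores zero. Both are dropped, which is the contrastive effect the moon term is meant to provide.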
Abstract
Recent advances in Information Retrieval have leveraged high-dimensional embedding spaces to improve the retrieval of relevant documents. Moreover, the Manifold Clustering Hypothesis suggests that despite these high-dimensional representations, documents relevant to a query reside on a lower-dimensional, query-dependent manifold. While this hypothesis has inspired new retrieval methods, existing approaches still face challenges in effectively separating non-relevant information from relevant signals. We propose a novel methodology that addresses these limitations by leveraging information from both relevant and non-relevant documents. Our method, ECLIPSE, computes a centroid based on irrelevant documents as a reference to estimate noisy dimensions present in relevant ones, enhancing retrieval performance. Extensive experiments on three in-domain benchmarks and one out-of-domain benchmark demonstrate an average improvement of up to 19.50% (resp. 22.35%) in mean average precision (AP) and 11.42% (resp. 13.10%) in nDCG@10 w.r.t. the DIME-based baseline (resp. the baseline using all dimensions). Our results pave the way for more robust, pseudo-irrelevance-based retrieval systems in future IR research.
