Neural Passage Quality Estimation for Static Pruning

Xuejun Chang, Debabrata Mishra, Craig Macdonald, Sean MacAvaney

TL;DR

This work tackles the problem of reducing neural search costs by pruning passages that are unlikely to satisfy any user query. It formalizes a query-agnostic passage quality signal and compares multiple estimators, finding that a supervised QT5-based approach provides the strongest, most consistent pruning signal. Across several pipelines (lexical, dense, learned sparse, and re-ranking), pruning 25–30% of passages yields statistically equivalent retrieval effectiveness, while also reducing indexing and retrieval costs; smaller QT5 variants further improve efficiency with minimal loss in performance. The study demonstrates transferability to larger corpora and different domains (MSMARCO v2, CORD-19) and discusses practical implications for energy efficiency and cost in AI-powered search, setting the stage for learning-what-to-index strategies and more integrated document/segment pruning.

Abstract

Neural networks -- especially those that use large, pre-trained language models -- have improved search engines in various ways. Most prominently, they can estimate the relevance of a passage or document to a user's query. In this work, we depart from this direction by exploring whether neural networks can effectively predict which of a document's passages are unlikely to be relevant to any query submitted to the search engine. We refer to this query-agnostic estimation of passage relevance as a passage's quality. We find that our novel methods for estimating passage quality allow passage corpora to be pruned considerably while maintaining statistically equivalent effectiveness; our best methods can consistently prune >25% of passages in a corpus, across various retrieval pipelines. Such substantial pruning reduces the operating costs of neural search engines in terms of computing resources, power usage, and carbon footprint -- both when processing queries (thanks to a smaller index size) and when indexing (lightweight models can prune low-quality passages prior to the costly dense or learned sparse encoding step). This work sets the stage for developing more advanced neural "learning-what-to-index" methods.
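To make the idea of static pruning concrete, the following is a minimal sketch of dropping the lowest-quality fraction of a passage corpus before indexing. It is an illustration under stated assumptions, not the authors' implementation: the names `prune_corpus` and `toy_quality` are hypothetical, and `toy_quality` is a trivial stand-in heuristic where the paper would use a learned estimator such as the supervised QT5 model.

```python
from typing import Callable, List, Sequence


def toy_quality(passage: str) -> float:
    """Hypothetical stand-in for a learned quality estimator.

    Scores a passage by its number of distinct alphabetic tokens.
    This is NOT the paper's QT5 model; it only makes the sketch runnable.
    """
    return float(len({tok.lower() for tok in passage.split() if tok.isalpha()}))


def prune_corpus(
    passages: Sequence[str],
    quality_fn: Callable[[str], float],
    prune_fraction: float,
) -> List[str]:
    """Remove the lowest-scoring fraction of passages, independent of any query.

    The surviving passages keep their original order, so the result can be
    handed directly to whatever indexing step follows.
    """
    if not 0.0 <= prune_fraction < 1.0:
        raise ValueError("prune_fraction must be in [0, 1)")
    # Pair each score with its position; ties are broken by position.
    scored = sorted((quality_fn(p), i) for i, p in enumerate(passages))
    n_prune = int(len(passages) * prune_fraction)
    dropped = {i for _, i in scored[:n_prune]}
    return [p for i, p in enumerate(passages) if i not in dropped]


# Illustrative data only; these passages are invented for the example.
passages = [
    "Home | Login | Share",
    "The mitochondria is the organelle that produces most of a cell's ATP.",
    "click here",
    "Static pruning removes index entries before any query is seen.",
]
kept_25 = prune_corpus(passages, toy_quality, 0.25)
kept_50 = prune_corpus(passages, toy_quality, 0.50)
```

Because the scores do not depend on any query, they can be computed once per passage and reused across pruning levels; only the cut-off changes.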

Paper Structure

This paper contains 22 sections, 2 equations, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: Example from msmarco_doc_01_1225433927 showing that not all passages within a document are necessarily valuable.
  • Figure 2: ROC curves for each passage quality estimator, based on a union of all relevant documents in the full MSMARCO dev set, DL 2019, and DL 2020 and excluding all relevant passages from the train set. The figure details the range [0.8,1.0], thereby focussing on the passages most likely to be pruned. The AUC for each estimator is in the legend.
  • Figure 3: Precision-oriented retrieval effectiveness on four pipelines by the percentage of a corpus pruned using each quality estimator. Effectiveness measurements that are statistically equivalent to the unpruned passage corpus are marked with $\medbullet$. Note that the vertical axis of each plot is scaled to emphasise the effect on each individual model.
  • Figure 4: Precision-oriented pruning effectiveness of three supervised QT5 model sizes on four pipelines. Effectiveness measurements that are statistically equivalent to the unpruned passage corpus are marked with $\medbullet$. Note that the vertical axis of each plot is scaled to emphasise the effect on each individual model.
  • Figure 5: Transferability of QT5-Tiny to two other datasets: MSMARCO v2 (TREC DL 21&22) and CORD-19 (TREC COVID). Effectiveness measurements that are statistically equivalent to the unpruned corpus are marked with $\medbullet$. Note that the vertical axis of each plot is scaled to emphasise the effect on each individual model.