Neural Passage Quality Estimation for Static Pruning
Xuejun Chang, Debabrata Mishra, Craig Macdonald, Sean MacAvaney
TL;DR
This work tackles the problem of reducing neural search costs by pruning passages that are unlikely to satisfy any user query. It formalizes a query-agnostic passage quality signal and compares multiple estimators, finding that a supervised QT5-based approach provides the strongest, most consistent pruning signal. Across several pipelines (lexical, dense, learned sparse, and re-ranking), pruning 25–30% of passages yields statistically equivalent retrieval effectiveness, while also reducing indexing and retrieval costs; smaller QT5 variants further improve efficiency with minimal loss in performance. The study demonstrates transferability to larger corpora and different domains (MSMARCO v2, CORD-19) and discusses practical implications for energy efficiency and cost in AI-powered search, setting the stage for learning-what-to-index strategies and more integrated document/segment pruning.
Abstract
Neural networks, especially those that use large pre-trained language models, have improved search engines in various ways. Most prominently, they can estimate the relevance of a passage or document to a user's query. In this work, we depart from this direction by exploring whether neural networks can effectively predict which of a document's passages are unlikely to be relevant to any query submitted to the search engine. We refer to this query-agnostic estimation of passage relevance as a passage's quality. We find that our novel methods for estimating passage quality allow passage corpora to be pruned considerably while maintaining statistically equivalent effectiveness; our best methods can consistently prune >25% of the passages in a corpus across various retrieval pipelines. Such substantial pruning reduces the operating costs of neural search engines in terms of computing resources, power usage, and carbon footprint. The savings arise both when processing queries (thanks to a smaller index) and when indexing (lightweight models can prune low-quality passages before the costly dense or learned sparse encoding step). This work sets the stage for developing more advanced neural "learning-what-to-index" methods.
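The static pruning step described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes every passage already has a query-agnostic quality score (higher meaning more likely to satisfy some query) produced by some estimator, such as the QT5-based model the paper favors, which is not reproduced here. The function names `prune_by_quality` and `encode_pruned_corpus`, and the `encode` callable standing in for a dense or learned sparse encoder, are hypothetical.

```python
from typing import Callable, Dict, List


def prune_by_quality(scores: Dict[str, float], prune_fraction: float) -> List[str]:
    """Return IDs of passages kept after dropping the lowest-quality fraction.

    scores: passage ID -> query-agnostic quality score (higher is better).
    prune_fraction: share of the corpus to remove, e.g. 0.25 for 25%.
    """
    if not 0.0 <= prune_fraction < 1.0:
        raise ValueError("prune_fraction must be in [0, 1)")
    # Rank passage IDs from lowest to highest estimated quality;
    # ties are broken by ID so the result is deterministic.
    ranked = sorted(scores, key=lambda pid: (scores[pid], pid))
    # Number of passages to drop, rounded down.
    n_prune = int(len(ranked) * prune_fraction)
    pruned = set(ranked[:n_prune])
    # Keep the survivors in their original order.
    return [pid for pid in scores if pid not in pruned]


def encode_pruned_corpus(
    passages: Dict[str, str],
    scores: Dict[str, float],
    prune_fraction: float,
    encode: Callable[[str], object],
) -> Dict[str, object]:
    """Prune first, so the (costly) encoder never processes low-quality passages."""
    kept = prune_by_quality(scores, prune_fraction)
    return {pid: encode(passages[pid]) for pid in kept}


if __name__ == "__main__":
    quality = {"p1": 0.91, "p2": 0.12, "p3": 0.55, "p4": 0.03}
    texts = {"p1": "neural ranking", "p2": "click here", "p3": "index size", "p4": "???"}
    print(prune_by_quality(quality, 0.25))  # ['p1', 'p2', 'p3']
    # `len` is a stand-in encoder here; a real pipeline would call a neural model.
    print(encode_pruned_corpus(texts, quality, 0.5, len))  # {'p1': 14, 'p3': 10}
```

Pruning by a fixed fraction of the corpus, rather than by a score threshold, mirrors how the abstract reports its results (share of passages removed); a threshold-based variant would simply keep every passage whose score exceeds a chosen cutoff.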
