Dynamic Superblock Pruning for Fast Learned Sparse Retrieval
Parker Carlson, Wentai Xie, Shanxiu He, Tao Yang
TL;DR
The paper addresses fast CPU retrieval over learned sparse representations by introducing Dynamic Superblock Pruning (SP), which adds a superblock-level pruning layer on top of traditional block-based traversal. Offline, SP precomputes per-block maxima $W_{B,t}$, per-superblock maxima $W_{X,t}$, and per-superblock averages $\overline{W}_{X,t}$. Online, it computes the bounds $SBMax(X)=\sum_{t \in Q} q_t W_{X,t}$ and $\overline{SBMax}(X)=\sum_{t \in Q} q_t \overline{W}_{X,t}$ and prunes superblocks with $SBMax(X) \le \theta/\mu$ and $\overline{SBMax}(X) \le \theta/\eta$, where $\theta$ is the current top-k score threshold and $\mu$, $\eta$ are pruning parameters; the remaining superblocks then go through block-level pruning via $BoundSum(B) \le \theta/\eta$. During inference, unpruned blocks are scored in descending order of $BoundSum$, and a cache-optimized superblock-at-a-time (SaaT) traversal yields major speedups at the cost of additional storage for the superblock statistics. Empirical results on MS MARCO with SPLADE and E-SPLADE show SP substantially faster than state-of-the-art baselines (BMP, ASC, Seismic) under high-relevance budgets, with notable gains from the SaaT traversal and robust performance across parameter settings. The work highlights SP as a viable path to fast, high-relevance sparse retrieval on commodity CPUs and outlines directions for integrating SP with index compression and complementary pruning techniques.
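The two-level pruning summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `build_index`, `search`, `term_max`, `term_mean`, and the dictionary-based index layout are hypothetical. The per-superblock average is taken here as the mean of the child-block maxima, which is one reading of "averages". The threshold $\theta$ is taken as the current k-th best score, all weights are assumed non-negative, and superblocks are visited in descending $SBMax$ order as a simplification of the SaaT traversal.

```python
import heapq
from typing import Dict, List, Tuple

Vec = Dict[str, float]          # sparse vector: term -> non-negative weight
Block = List[Tuple[int, Vec]]   # (doc_id, document vector) pairs


def term_max(vectors: List[Vec]) -> Vec:
    """Per-term maximum over a list of sparse vectors."""
    out: Vec = {}
    for v in vectors:
        for t, w in v.items():
            out[t] = max(out.get(t, 0.0), w)
    return out


def term_mean(vectors: List[Vec]) -> Vec:
    """Per-term mean over a list of sparse vectors (missing terms count as 0)."""
    out: Vec = {}
    for v in vectors:
        for t, w in v.items():
            out[t] = out.get(t, 0.0) + w / len(vectors)
    return out


def dot(q: Vec, w: Vec) -> float:
    return sum(qt * w.get(t, 0.0) for t, qt in q.items())


def build_index(superblocks: List[List[Block]]) -> List[dict]:
    """Offline step: block maxima W_B, superblock maxima W_X, averages W_X_avg."""
    index = []
    for blocks in superblocks:
        w_b = [term_max([d for _, d in b]) for b in blocks]
        index.append({"blocks": blocks, "W_B": w_b,
                      "W_X": term_max(w_b), "W_X_avg": term_mean(w_b)})
    return index


def search(index: List[dict], q: Vec, k: int, mu: float = 1.0, eta: float = 1.0):
    """Online step: returns (top-k as (doc_id, score), number of documents scored)."""
    heap: List[Tuple[float, int]] = []   # min-heap of (score, doc_id)
    scored = 0

    def theta() -> float:
        # Until k results exist, nothing may be pruned.
        return heap[0][0] if len(heap) == k else float("-inf")

    # Simplification: visit superblocks in descending SBMax order.
    for x in sorted(index, key=lambda x: dot(q, x["W_X"]), reverse=True):
        th = theta()
        if dot(q, x["W_X"]) <= th / mu and dot(q, x["W_X_avg"]) <= th / eta:
            continue                      # superblock-level pruning
        bounds = [(dot(q, wb), b) for wb, b in zip(x["W_B"], x["blocks"])]
        bounds.sort(key=lambda p: p[0], reverse=True)   # descending BoundSum
        for bound, block in bounds:
            if bound <= theta() / eta:
                continue                  # block-level pruning
            for doc_id, d in block:
                s = dot(q, d)
                scored += 1
                if len(heap) < k:
                    heapq.heappush(heap, (s, doc_id))
                elif s > heap[0][0]:
                    heapq.heapreplace(heap, (s, doc_id))
    return [(d, s) for s, d in sorted(heap, reverse=True)], scored


# Toy data: two superblocks of two blocks each, two documents per block.
toy = [
    [[(0, {"a": 3.0, "b": 2.0}), (1, {"a": 2.0})],
     [(2, {"b": 4.0}), (3, {"a": 1.0, "b": 1.0})]],
    [[(4, {"a": 0.5}), (5, {"b": 0.5})],
     [(6, {"a": 0.2}), (7, {"b": 0.1})]],
]
top, scored = search(build_index(toy), {"a": 1.0, "b": 1.0}, k=2)
print(top, scored)
```

On this toy input the second superblock is skipped without scoring any of its four documents, because both of its bounds fall below the threshold set by the first superblock. Values of `mu` or `eta` below 1 raise the pruning thresholds and so prune more aggressively, which trades rank safety for speed.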
Abstract
This paper proposes superblock pruning (SP) during top-k online document retrieval for learned sparse representations. SP structures the sparse index as a set of superblocks on a sequence of document blocks and conducts a superblock-level selection to decide if some superblocks can be pruned before visiting their child blocks. SP generalizes the previous flat block or cluster-based pruning, allowing the early detection of groups of documents that cannot or are less likely to appear in the final top-k list. SP can accelerate sparse retrieval in a rank-safe or approximate manner under a high-relevance competitiveness constraint. Our experiments show that the proposed scheme significantly outperforms state-of-the-art baselines on MS MARCO passages on a single-threaded CPU.
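The rank-safe case can be made concrete with a short worked bound. The notation $w_{d,t}$ for the weight of term $t$ in document $d$ is an assumption introduced here, as is the premise that all query and document weights are non-negative. Taking $\mu = 1$ in the TL;DR's pruning condition, every document $d$ in a pruned superblock $X$ satisfies

$$\sum_{t \in Q} q_t\, w_{d,t} \;\le\; \sum_{t \in Q} q_t\, W_{X,t} \;=\; SBMax(X) \;\le\; \theta ,$$

because $w_{d,t} \le W_{X,t}$ for every term. No such document can score above the current top-k threshold $\theta$, so skipping $X$ cannot change the final top-k list (up to ties). The same argument applies to $BoundSum(B) \le \theta$ at the block level when $\eta = 1$. With $\mu < 1$ or $\eta < 1$ the thresholds are raised, more groups are skipped, and the guarantee no longer holds, which corresponds to the approximate mode.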
