Selective Query Processing: a Risk-Sensitive Selection of System Configurations
Josiane Mothe, Md Zia Ullah
TL;DR
The paper tackles per-query adaptation of IR system configurations by pre-selecting a small, risk-aware pool of configurations and then assigning one configuration from that pool to each query via a similarity-based mapping. It introduces two gain formulations, $E_{Gain}$ and $N_{Gain}$, built from the risk and reward terms $E_{Risk}$, $E_{Reward}$, $N_{Risk}$, and $N_{Reward}$ with a trade-off parameter $\beta$, and uses them to prune configurations greedily. The per-query assignment is deliberately simple: it selects a configuration based on similarity to training queries, within a meta-search engine, rather than applying full learning-to-rank over configurations. This approach yields strong improvements over single-configuration baselines and many selective query processing baselines across ad hoc and diversity tasks. Evaluated on six IR collections, including MS MARCO, it shows that a small, well-chosen configuration pool (around 20) can achieve gains of approximately $15\%$ to $20\%$ in key metrics, supporting the feasibility of per-query IR systems and guiding future work on transfer learning and richer feature representations. The work provides a practical framework for configuring web-scale IR systems with robust performance while limiting configuration maintenance costs.
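To make the greedy, risk-aware pre-selection more concrete, the following is a minimal Python sketch. It does not reproduce the paper's $E_{Gain}$ or $N_{Gain}$ definitions, which are not given in this summary. The function name `greedy_preselect`, the `scores` matrix layout, and the stand-in gain (reward minus $\beta$ times risk, both measured against the per-query best of the current pool) are all assumptions made for illustration only.

```python
import numpy as np

def greedy_preselect(scores, k, beta=1.0):
    """Greedily pick k configurations from a (n_queries, n_configs) score matrix.

    ASSUMPTION: the gain below (reward - beta * risk) is an illustrative
    stand-in, not the E_Gain / N_Gain formulation from the paper.
    """
    scores = np.asarray(scores, dtype=float)
    n_queries, n_configs = scores.shape
    # Seed the pool with the configuration that has the best mean effectiveness,
    # i.e. what a grid search for a single configuration would return.
    selected = [int(np.argmax(scores.mean(axis=0)))]
    while len(selected) < min(k, n_configs):
        # Best score per training query achievable with the current pool.
        pool_best = scores[:, selected].max(axis=1)
        best_gain, best_c = -np.inf, None
        for c in range(n_configs):
            if c in selected:
                continue
            delta = scores[:, c] - pool_best
            reward = np.clip(delta, 0, None).sum()   # queries where c helps
            risk = np.clip(-delta, 0, None).sum()    # queries where c would hurt
            gain = (reward - beta * risk) / n_queries
            if gain > best_gain:
                best_gain, best_c = gain, c
        selected.append(best_c)
    return selected

# Toy effectiveness matrix: 3 training queries x 3 candidate configurations.
toy_scores = [[0.9, 0.1, 0.5],
              [0.8, 0.2, 0.5],
              [0.1, 0.9, 0.5]]
print(greedy_preselect(toy_scores, k=2, beta=0.0))
print(greedy_preselect(toy_scores, k=2, beta=1.0))
```

In this toy setup, a larger `beta` makes the selection favour the "safe" third configuration over the high-variance second one, which is the kind of risk-reward trade-off the parameter is meant to control.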
Abstract
In information retrieval systems, search parameters are optimized to ensure high effectiveness based on a set of past searches, and these optimized parameters are then used as the system configuration for all subsequent queries. A better approach, however, would be to adapt the parameters to fit the query at hand. Selective query expansion is one such approach, in which the system decides automatically whether or not to expand the query, resulting in two possible system configurations. This approach was extended recently to include many other parameters, leading to many possible system configurations from which the system automatically selects the best configuration on a per-query basis. To determine the ideal configurations to use on a per-query basis in real-world systems, we developed a method in which a restricted number of possible configurations is pre-selected and then used in a meta-search engine that decides the best search configuration on a per-query basis. We define a risk-sensitive approach for configuration pre-selection that considers the risk-reward trade-off between the number of configurations kept and system effectiveness. For final configuration selection, the decision is based on query feature similarities. We find that a relatively small number of configurations (20) selected by our risk-sensitive model is sufficient to increase effectiveness by about 15% according to P@10 and nDCG@10 when compared to traditional grid search using a single configuration, and by about 20% when compared to learning to rank documents. Our risk-sensitive approach works for both diversity- and ad hoc-oriented searches. Moreover, the similarity-based selection method outperforms the more sophisticated approaches. Thus, we demonstrate the feasibility of developing per-query information retrieval systems, which will guide future research in this direction.
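The similarity-based final selection can be illustrated with a short Python sketch. It assumes cosine similarity over query feature vectors and a single nearest training query; the abstract specifies neither the similarity measure nor the features, so these choices, along with the function name `assign_configuration` and its parameters, are hypothetical.

```python
import numpy as np

def assign_configuration(query_features, train_features, train_scores, pool):
    """Return the pooled configuration that worked best for the training
    query most similar to the new query.

    ASSUMPTIONS: cosine similarity and a 1-nearest-neighbour rule are
    illustrative choices, not details taken from the paper.
    """
    q = np.asarray(query_features, dtype=float)
    t = np.asarray(train_features, dtype=float)
    s = np.asarray(train_scores, dtype=float)
    pool = list(pool)
    # Cosine similarity between the new query and every training query.
    q = q / np.linalg.norm(q)
    t = t / np.linalg.norm(t, axis=1, keepdims=True)
    nearest = int(np.argmax(t @ q))
    # Among the pre-selected configurations, take the one with the highest
    # effectiveness on that nearest training query.
    return pool[int(np.argmax(s[nearest, pool]))]

# Toy data: 2 training queries with 2 features each, 3 configurations.
train_features = [[1.0, 0.0], [0.0, 1.0]]
train_scores = [[0.9, 0.1, 0.5],
                [0.1, 0.9, 0.5]]
pool = [0, 1]  # configurations kept after pre-selection

print(assign_configuration([0.9, 0.1], train_features, train_scores, pool))
print(assign_configuration([0.2, 0.8], train_features, train_scores, pool))
```

A design like this keeps the per-query decision cheap at search time: it needs only one similarity computation against the training queries and a lookup in a precomputed effectiveness table.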
