Selective Query Processing: a Risk-Sensitive Selection of System Configurations

Josiane Mothe, Md Zia Ullah

TL;DR

The paper tackles per-query adaptation of IR system configurations by pre-selecting a small, risk-aware pool of configurations and then assigning the best per query via a similarity-based mapping. It introduces two gain formulations, $E_{Gain}$ and $N_{Gain}$, built from risk and reward terms $E_{Risk},E_{Reward},N_{Risk},N_{Reward}$ with a trade-off parameter $\beta$, to prune configurations greedily. A simple per-query assignment using training-query similarity to select configurations demonstrates strong improvements over single-config baselines and many selective-query baselines across ad hoc and diversity tasks, using a meta-search engine rather than full learning-to-rank over configurations. The approach, evaluated on six IR collections including MS MARCO, shows that a small, well-chosen configuration pool (around 20) can achieve approximately $15\%$ to $20\%$ gains in key metrics, supporting the feasibility of per-query IR systems and guiding future work in transfer learning and richer feature representations. The work provides a practical framework for configuring web-scale IR systems with robust performance while limiting configuration maintenance costs.

Abstract

In information retrieval systems, search parameters are optimized to ensure high effectiveness based on a set of past searches, and these optimized parameters are then used as the system configuration for all subsequent queries. A better approach, however, would be to adapt the parameters to fit the query at hand. Selective query expansion is one such approach, in which the system decides automatically whether or not to expand the query, resulting in two possible system configurations. This approach was recently extended to include many other parameters, leading to many possible system configurations from which the system automatically selects the best one on a per-query basis. To determine the ideal configurations to use on a per-query basis in real-world systems, we developed a method in which a restricted number of possible configurations is pre-selected and then used in a meta-search engine that decides the best search configuration on a per-query basis. We define a risk-sensitive approach for configuration pre-selection that considers the risk-reward trade-off between the number of configurations kept and system effectiveness. For final configuration selection, the decision is based on query feature similarities. We find that a relatively small number of configurations (20) selected by our risk-sensitive model is sufficient to increase effectiveness by about 15% according to P@10 and nDCG@10 when compared to traditional grid search using a single configuration, and by about 20% when compared to learning to rank documents. Our risk-sensitive approach works for both diversity- and ad hoc-oriented searches. Moreover, the similarity-based selection method outperforms the more sophisticated approaches. Thus, we demonstrate the feasibility of developing per-query information retrieval systems, which will guide future research in this direction.
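The final selection step described in the abstract (a decision based on query feature similarities) can be illustrated with a minimal sketch. It assumes that each query is represented by a numeric feature vector, that every training query is labeled with the configuration that worked best for it, and that a new query inherits the label of its most similar training query under cosine similarity. The function names, feature values, and labels below are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_configuration(query_features, training_features, best_config):
    """Return the configuration label of the most similar training query.

    training_features[i] is the feature vector of training query i and
    best_config[i] is the configuration that performed best for it.
    """
    nearest = max(
        range(len(training_features)),
        key=lambda i: cosine(query_features, training_features[i]),
    )
    return best_config[nearest]


# Hypothetical training data: three training queries with three features each.
training_features = [
    [1.0, 0.0, 0.2],
    [0.0, 1.0, 0.1],
    [0.5, 0.5, 0.9],
]
best_config = ["c2", "c1", "c3"]

# An unseen query whose features resemble the first training query.
chosen = select_configuration([0.9, 0.1, 0.1], training_features, best_config)
print(chosen)
```

In this toy setting the unseen query is closest to the first training query, so it is assigned that query's configuration. A nearest-neighbor rule of this kind needs no model training beyond storing the training queries, which is one reason a similarity-based selector is cheap to maintain.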

Paper Structure

This paper contains 25 sections, 14 equations, 8 figures, 10 tables, and 3 algorithms.

Figures (8)

  • Figure 1: Various system configurations are more effective for some queries than for others. The plot illustrates the effectiveness (y-axis) of three configurations ($c_{1}$, $c_{2}$, and $c_{3}$) for seven queries (x-axis). Whereas $c_{2}$ is best for most queries, $c_{1}$ is best for query 4, and $c_{3}$ is best for query 3. If a single configuration were to be used for all queries, then $c_{2}$ should be selected. The best possible effectiveness, however, would be obtained by selecting the best configuration for each query (red circles).
  • Figure 2: The training phase of the model consists of three steps: (1) configurations are generated using different values of the query processing parameters, (2) the set of configurations is restricted using a risk-reward criterion computed on the training queries to obtain the candidate configurations for the next step, and (3) the best configuration is selected on a per-query basis for each training query.
  • Figure 3: Effectiveness grows with $k$, the number of configurations. The x-axis shows $k$ from 1 to 30; the y-axis shows (a) nDCG@10 for the ad hoc collection and (b) ERR-IA@20 for the diversity collections. Results are for the $E_{Risk}$ function.
  • Figure 4: Number of queries for which a given configuration is predicted. Bar plots for the 20 configurations chosen by (a) $E_{Risk}$-Cosine, and (b) $N_{Risk}$-Cosine; GOV2 using AP.
  • Figure 5: Configurations selected by $E_{Risk}$ are more robust across queries. Heatmaps for the 20 configurations selected (a) by $E_{Risk}$ and (b) randomly; GOV2 using AP.
  • ...and 3 more figures
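
The situation in Figure 1 and the trend in Figure 3 can be reproduced on a toy example. The sketch below uses made-up effectiveness scores for three configurations over seven queries, arranged so that $c_{2}$ is best for most queries, $c_{1}$ is best for query 4, and $c_{3}$ is best for query 3. It then grows a pool greedily, at each step adding the configuration that most increases the mean per-query best score. This greedy criterion is a deliberate simplification that looks only at reward: it is not the paper's $E_{Gain}$ or $N_{Gain}$ formulation, and all names and numbers are hypothetical.

```python
# Hypothetical effectiveness scores: scores[config][q] for queries 1..7.
scores = {
    "c1": [0.30, 0.20, 0.40, 0.80, 0.30, 0.20, 0.40],
    "c2": [0.60, 0.50, 0.50, 0.40, 0.70, 0.60, 0.50],
    "c3": [0.20, 0.30, 0.95, 0.30, 0.40, 0.30, 0.20],
}
NUM_QUERIES = 7


def pool_effectiveness(pool):
    """Mean over queries of the best score reached by any configuration in the pool.

    This is the upper bound obtained if the best configuration in the pool
    were always picked for each query (oracle selection).
    """
    best_per_query = [max(scores[c][q] for c in pool) for q in range(NUM_QUERIES)]
    return sum(best_per_query) / NUM_QUERIES


def greedy_pool(k):
    """Greedily build a pool of up to k configurations (reward-only simplification)."""
    pool = []
    remaining = sorted(scores)
    while remaining and len(pool) < k:
        best = max(remaining, key=lambda c: pool_effectiveness(pool + [c]))
        pool.append(best)
        remaining.remove(best)
    return pool


for k in range(1, len(scores) + 1):
    pool = greedy_pool(k)
    print(k, pool, round(pool_effectiveness(pool), 3))
```

With $k = 1$ the pool contains only the single best configuration, which corresponds to the grid-search baseline. Each additional configuration can only keep or raise the oracle upper bound, so the curve is non-decreasing in $k$; how much of that bound is realized in practice depends on the per-query selector.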