Towards Group-aware Search Success

Haolun Wu; Bhaskar Mitra; Nick Craswell

TL;DR

GA-SS redefines search success to ensure that all demographic groups achieve satisfaction from search outcomes, and introduces a comprehensive mathematical framework to calculate it, incorporating both static and stochastic ranking policies and integrating user browsing models for a more accurate assessment.

Abstract

Traditional measures of search success often overlook the varying information needs of different demographic groups. To address this gap, we introduce a novel metric, named Group-aware Search Success (GA-SS). GA-SS redefines search success to ensure that all demographic groups achieve satisfaction from search outcomes. We introduce a comprehensive mathematical framework to calculate GA-SS, incorporating both static and stochastic ranking policies and integrating user browsing models for a more accurate assessment. In addition, we propose the Group-aware Most Popular Completion (gMPC) ranking model, which accounts for demographic variance in user intent and aligns more closely with the diverse needs of all user groups. We empirically validate our metric and approach on two real-world datasets, one focusing on query auto-completion and the other on movie recommendation. The results highlight the impact of stochasticity and the complex interplay among various search success metrics. Our findings advocate for a more inclusive approach to measuring search success and aim to inspire future investigations into the quality of service of search.

Paper Structure (23 sections, 9 equations, 5 figures)

This paper contains 23 sections, 9 equations, 5 figures.

Introduction
Related Work
Diversity in Search
Fairness in Search
Ranking with Stochastic Policy
Group-aware Search Success
Group-aware Search Success within Query
Group-aware Search Success across Queries
Ranking with Static and Stochastic Policies
Static ranking policy.
Stochastic ranking policy.
Metric Comparison
Experiment and Analysis
Task and Dataset
Query Auto-completion.
...and 8 more sections

Figures (5)

Figure 1: Two motivating examples showing that previous search success measures cannot distinguish certain nuances. Each edge in the figure between the query ($q$) and intent ($t$) carries equal weight, signifying that the query is uniformly relevant to the connected intents. Similarly, the edges linking the user group ($g$) to the intent ($t$) have equal weight within each group, indicating that members of the group have a uniform level of interest in the associated intent.
Figure 2: A toy example for the GA-SS metric comparison. Two queries $q_1$ and $q_2$ have equal sampling probability, and each query is equally relevant to two intents $t_1$ and $t_2$. The two searcher groups are of equal size, where group $g_A$ is always interested in $t_1$ and group $g_B$ is always interested in $t_2$. In practice, a small positive value should be added to the success as smoothing to avoid zeros; we ignore it in this toy example for simplicity. We observe that the patterns of change across the metric variants do not consistently align, which suggests that each metric variant captures different aspects of search success.
Figure 3: Behavior of different metrics for a stochastic ranking policy, generated by randomizing the MPC/MPV and gMPC/gMPV models using Plackett-Luce. The first row shows the impact of varying stochasticity on query auto-completion, while the second row shows the results on movie recommendation. For consistency, we normalize the metric values to between 0 and 1 using min-max normalization in each subfigure. The x-axis shows the values of $\beta$, where a larger value indicates more randomization.
Figure 4: A case study on the impact of stochasticity on different ranking models. We use movie recommendation as an example and report the top-10 ranked movies with respect to the query (director): "John Carpenter". As shown above, when a moderate amount of stochasticity is added to the ranking model, success and fairness both improve, leading to increased metric values (e.g., GA-SS). However, when a large amount of stochasticity is added, rankings from both models converge to a random ranking, leading to decreased metric values.
Figure 5: The Kendall rank correlation between different metrics on the two tasks and datasets we studied.
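
The Figure 3 caption describes randomizing a ranking model with Plackett-Luce, with larger $\beta$ giving more randomization. The following is a minimal Python sketch of that idea, not the paper's implementation. It assumes each item's Plackett-Luce weight is exp(score / beta); this parameterization, the function name plackett_luce_sample, and the example scores are all illustrative assumptions that do not come from the paper.

```python
import math
import random


def plackett_luce_sample(scores, beta, rng=None):
    """Sample one ranking (a list of item indices) from a Plackett-Luce model.

    Assumption (not from the paper): item i has weight exp(scores[i] / beta),
    so a larger beta flattens the weights and makes the ranking more random.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    rng = rng or random.Random()
    remaining = list(range(len(scores)))
    ranking = []
    while remaining:
        # Rescale against the best remaining score so that at least one
        # weight equals 1.0, which keeps the draw numerically stable.
        top = max(scores[i] for i in remaining)
        weights = [math.exp((scores[i] - top) / beta) for i in remaining]
        # Fill the next rank position proportionally to the remaining weights.
        chosen = rng.choices(remaining, weights=weights, k=1)[0]
        ranking.append(chosen)
        remaining.remove(chosen)
    return ranking


if __name__ == "__main__":
    scores = [3.0, 1.0, 2.0]  # hypothetical popularity scores for three items
    for beta in (0.1, 1.0, 10.0):
        print(beta, plackett_luce_sample(scores, beta, random.Random(0)))
```

Under this assumed parameterization, a very small beta reproduces the static ranking by score, while a very large beta approaches a uniformly random permutation, matching the qualitative behavior described in the Figure 4 caption.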
