Mitigating the Participation Bias by Balancing Extreme Ratings

Yongkang Guo, Yuqing Kong, Jialiang Liu

TL;DR

This paper tackles participation bias in rating aggregation by formulating a robust minimax objective for estimating the mean of the underlying ratings when a rater's decision to report depends on the rating itself. It introduces two aggregators, one for each data-availability setting: BEA, for a known sample size $n$, imputes unobserved ratings from the extreme values and blends them with the observed ratings; PAA, for unknown $n$, asymptotically achieves the minimax optimum by averaging two polarized estimates derived from thresholded histograms. Theoretical results provide lower bounds and optimality claims for BEA and PAA, including asymptotic regret forms and finite-sample guarantees, complemented by visualization and experimental validation on real hotel-rating data. Empirically, BEA and PAA outperform simple averaging and spectral methods in worst-case scenarios, which indicates robustness to participation bias in practice. The work advances robust aggregation under missing-not-at-random participation and lays groundwork for adaptive, multi-attribute extensions.

Abstract

Rating aggregation plays a crucial role in various fields, such as product recommendations, hotel rankings, and teaching evaluations. However, traditional averaging methods can be affected by participation bias, where some raters do not participate in the rating process, leading to potential distortions. In this paper, we consider a robust rating aggregation task under participation bias. We assume that each rater may withhold their rating with a probability that depends on the rating itself, resulting in partially observed samples. Our goal is to minimize the expected squared loss between the aggregated rating and the average of all underlying ratings (possibly unobserved) in the worst-case scenario. We focus on two settings based on whether the sample size (i.e., the number of raters) is known. In the first setting, where the sample size is known, we propose an aggregator called the Balanced Extremes Aggregator. It estimates unrevealed ratings with a balanced combination of extreme ratings. When the sample size is unknown, we derive another aggregator, the Polarizing-Averaging Aggregator, which becomes optimal as the sample size grows to infinity. Numerical results demonstrate the superiority of our proposed aggregators in mitigating participation bias, compared to simple averaging and the spectral method. Furthermore, we validate the effectiveness of our aggregators on a real-world dataset.
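
The goal stated in the abstract can be written compactly, as a hedged sketch. The symbols below are assumed notation, not taken from the paper: $f$ is the aggregator, $\theta$ an admissible participation scenario, $x_1,\dots,x_n$ the underlying ratings, and $\hat{x}$ the observed ones. The paper's regret may be defined relative to a benchmark rather than as the raw loss shown here.

$$
\min_{f}\ \max_{\theta}\ \mathbb{E}_{\theta}\!\left[\Big(f(\hat{x}) - \frac{1}{n}\sum_{i=1}^{n} x_i\Big)^{2}\right]
$$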

Paper Structure

This paper contains 31 sections, 5 theorems, 12 equations, 14 figures, 1 table.

Key Result

Lemma 1

Consider a pair of information structures: where $q$ is the lower bound of the participation probability, and $a\in[0,1]$ is a parameter. For any aggregator $f$, the regret has a lower bound

Figures (14)

  • Figure 1: The observed ratings of two hotels.
  • Figure 2: The true ratings of two hotels.
  • Figure 3: Illustration of the Balanced Extremes Aggregator (BEA) when the sample size $n$ is known. BEA estimates the expected rating of the raters who did not report from the difference between $n_1$ and $n_m$, the counts of the two extreme ratings. It then adjusts the observed rating average with this estimated unobserved rating average.
  • Figure 4: Illustration of the Polarizing-Averaging Aggregator (PAA) when the sample size $n$ is unknown. Two modified histograms are built from the observed data. In the first, a threshold $k_1$ is identified and only a $q$ fraction of the counts for ratings above $k_1$ is kept. In the second, a threshold $k_2$ is identified and only a $q$ fraction of the counts for ratings below $k_2$ is kept. PAA then computes the empirical mean of each adjusted histogram and outputs the average of the two means.
  • Figure 5: Simple averaging vs. BEA for different sample sizes $n$ and numbers of rating categories $m$. The x-axis is the lower bound of the participation probability, $q$, and the y-axis is the natural logarithm of the regret. The regret of BEA almost matches the theoretical lower bound for a wide range of $q$.
  • ...and 9 more figures
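
The blending step in the Figure 3 caption can be sketched as follows. This is a minimal illustration, not the paper's BEA: the rule that maps $n_1$ and $n_m$ to an imputed rating is not given in this summary, so it is passed in as a hypothetical callable `impute`, and `midpoint_impute` is only a placeholder that ignores the difference between the extreme counts. Ratings are assumed to be the integers $1,\dots,m$, and the function names are invented for this example.

```python
from typing import Callable, Sequence


def blend_with_imputed(
    counts: Sequence[int],
    n: int,
    impute: Callable[[int, int, int], float],
) -> float:
    """Average over all n raters, filling unobserved ratings with an imputed value.

    counts[r - 1] is the number of observed ratings equal to r (r = 1..m).
    impute(n_1, n_m, m) is a hypothetical rule returning the rating assigned
    to each unobserved rater; the paper's actual rule is not reproduced here.
    """
    m = len(counts)
    n_obs = sum(counts)
    if n <= 0 or n < n_obs:
        raise ValueError("n must be positive and at least the number of observed ratings")
    observed_sum = sum(r * c for r, c in enumerate(counts, start=1))
    n_missing = n - n_obs
    imputed = impute(counts[0], counts[-1], m) if n_missing else 0.0
    return (observed_sum + n_missing * imputed) / n


def midpoint_impute(n_1: int, n_m: int, m: int) -> float:
    """Placeholder rule (assumption): weigh both extremes equally."""
    return (1 + m) / 2
```

With 4 observed ratings out of `n = 8`, for instance, `blend_with_imputed([0, 0, 0, 0, 4], 8, midpoint_impute)` pulls the observed average of 5 toward the imputed midpoint of 3. Swapping in a rule that depends on `n_1 - n_m` only requires a different `impute` callable.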

Theorems & Definitions (8)

  • Example 1.1: Why direct average is not good
  • Definition 3.1: Balanced Extremes Aggregator
  • Lemma 1: Lower Bound of the Regret
  • Corollary 3.2
  • Definition 4.1: Polarizing-Averaging Aggregator
  • Theorem 4.2
  • Proposition 1
  • Theorem 4.3
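
The two-histogram procedure described in the Figure 4 caption for the Polarizing-Averaging Aggregator (Definition 4.1) can be sketched as below. The caption does not say how the thresholds $k_1$ and $k_2$ are identified. This sketch assumes they are the thresholds that make the first adjusted mean as small as possible and the second as large as possible, found by scanning every candidate. Ratings are assumed to be the integers $1,\dots,m$, and the names `paa_sketch` and `_weighted_mean` are invented for this example.

```python
from typing import Sequence


def _weighted_mean(counts: Sequence[float], weights: Sequence[float]) -> float:
    """Empirical mean of ratings 1..m after scaling each count by its weight."""
    mass = sum(c * w for c, w in zip(counts, weights))
    total = sum(r * c * w for r, (c, w) in enumerate(zip(counts, weights), start=1))
    return total / mass


def paa_sketch(counts: Sequence[float], q: float) -> float:
    """Average of two polarized means built from thresholded histograms.

    counts[r - 1] is the number of observed ratings equal to r (r = 1..m).
    q is the lower bound of the participation probability, 0 < q <= 1.
    Threshold choice (min / max over all candidates) is an assumption.
    """
    if not 0 < q <= 1:
        raise ValueError("q must lie in (0, 1]")
    if sum(counts) <= 0:
        raise ValueError("at least one rating must be observed")
    m = len(counts)
    ratings = range(1, m + 1)
    # First histogram: keep a q fraction of the counts above threshold k1.
    low = min(
        _weighted_mean(counts, [q if r > k1 else 1.0 for r in ratings])
        for k1 in range(0, m + 1)
    )
    # Second histogram: keep a q fraction of the counts below threshold k2.
    high = max(
        _weighted_mean(counts, [q if r < k2 else 1.0 for r in ratings])
        for k2 in range(1, m + 2)
    )
    return 0.5 * (low + high)
```

For a histogram concentrated on the top rating, such as `paa_sketch([1, 0, 0, 0, 9], 0.5)`, the result falls below the simple average of 4.6, because shrinking the large count at rating 5 moves the lower mean more than shrinking the single count at rating 1 moves the upper one.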