Towards Uncertainty Unification: A Case Study for Preference Learning

Shaoting Peng, Haonan Chen, Katherine Driggs-Campbell

TL;DR

This work addresses the challenge of learning human preferences in the presence of both human and robot uncertainties. It introduces uncertainty-unified preference learning (UUPL), which jointly models discrete human uncertainty levels and integrates them into GP-based preference learning via a Laplace-approximate posterior mean and an uncertainty-weighted GMM for predictive variance. A user-specific calibration procedure aligns uncertainty representations across users, enabling consistent performance. Across simulations and real-user studies, UUPL achieves higher prediction accuracy, faster convergence, and more stable results than baselines, demonstrating the value of explicit uncertainty unification for safer and more interpretable human–robot interactions.

Abstract

Learning human preferences is essential for human-robot interaction, as it enables robots to adapt their behaviors to align with human expectations and goals. However, the inherent uncertainties in both human behavior and robotic systems make preference learning a challenging task. While probabilistic robotics algorithms offer uncertainty quantification, the integration of human preference uncertainty remains underexplored. To bridge this gap, we introduce uncertainty unification and propose a novel framework, uncertainty-unified preference learning (UUPL), which enhances Gaussian Process (GP)-based preference learning by unifying human and robot uncertainties. Specifically, UUPL includes a human preference uncertainty model that improves GP posterior mean estimation, and an uncertainty-weighted Gaussian Mixture Model (GMM) that improves the accuracy of the GP predictive variance. Additionally, we design a user-specific calibration process to align uncertainty representations across users, ensuring consistent and reliable model performance. Comprehensive experiments and user studies demonstrate that UUPL achieves state-of-the-art performance in both prediction accuracy and user rating. An ablation study further validates the effectiveness of UUPL's human uncertainty model and uncertainty-weighted GMM.
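
As a rough illustration of how a human uncertainty factor $u$ can enter GP posterior-mean estimation, the sketch below assumes the probit likelihood described for Figure 2, $P(O^{(1)} \succ O^{(2)}) = \Phi\big((f(O^{(1)}) - f(O^{(2)}))/u\big)$, with $\Phi$ the standard Gaussian CDF and $u$ the standard deviation. It then finds the Laplace-approximation mode with standard Newton iterations over a discrete input grid. Several choices here are illustrative assumptions rather than the paper's settings: the RBF kernel and its hyperparameters, the temperature grid, the hypothetical user's answers, and the $u$ values. The returned band is the plain Laplace covariance $(K^{-1}+W)^{-1}$, not UUPL's uncertainty-weighted GMM variance.

```python
import math
import numpy as np


def std_normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z: float) -> float:
    # erfc keeps more precision in the lower tail than 1 + erf
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def rbf_kernel(x: np.ndarray, lengthscale: float = 2.0, variance: float = 1.0) -> np.ndarray:
    # Assumed prior covariance; not taken from the paper
    d = x[:, None] - x[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def likelihood_terms(f: np.ndarray, comparisons):
    """Gradient and negative Hessian W of sum_k log Phi((f[i] - f[j]) / u)."""
    n = f.shape[0]
    grad = np.zeros(n)
    W = np.zeros((n, n))
    for i, j, u in comparisons:
        z = (f[i] - f[j]) / u
        r = std_normal_pdf(z) / std_normal_cdf(z)  # d/dz log Phi(z)
        d = np.zeros(n)
        d[i], d[j] = 1.0, -1.0
        grad += (r / u) * d
        # -d^2/dz^2 log Phi(z) = r^2 + z*r > 0, so W is positive semidefinite
        W += ((r * r + z * r) / u**2) * np.outer(d, d)
    return grad, W


def laplace_preference_gp(x: np.ndarray, comparisons, n_iter: int = 100, tol: float = 1e-9):
    """comparisons: (i, j, u) means x[i] was preferred to x[j] with uncertainty factor u."""
    n = x.shape[0]
    K = rbf_kernel(x)
    I = np.eye(n)
    f = np.zeros(n)
    for _ in range(n_iter):
        grad, W = likelihood_terms(f, comparisons)
        # Newton step f <- (K^-1 + W)^-1 (W f + grad), computed as (I + K W)^-1 K (W f + grad)
        f_new = np.linalg.solve(I + K @ W, K @ (W @ f + grad))
        converged = np.max(np.abs(f_new - f)) < tol
        f = f_new
        if converged:
            break
    _, W = likelihood_terms(f, comparisons)
    cov = np.linalg.solve(I + K @ W, K)  # (K^-1 + W)^-1 evaluated at the mode
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return f, std


# Hypothetical data: a user who prefers 21 °C over every other integer temperature.
temps = np.arange(16.0, 27.0)
best = int(np.where(temps == 21.0)[0][0])
# Hypothetical calibrated factors: "uncertain" (u = 2.0) for adjacent temperatures,
# "confident" (u = 0.5) otherwise.
comparisons = [
    (best, j, 2.0 if abs(j - best) <= 1 else 0.5)
    for j in range(len(temps))
    if j != best
]
mean, std = laplace_preference_gp(temps, comparisons)
lower, upper = mean - 1.96 * std, mean + 1.96 * std  # band in the style of Figure 5
print("preferred temperature:", temps[int(np.argmax(mean))])
```

Solving against $I + KW$ avoids explicitly inverting the RBF Gram matrix $K$, which is typically ill-conditioned.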

Paper Structure

This paper contains 34 sections, 19 equations, 12 figures, 1 table, 1 algorithm.

Figures (12)

  • Figure 1: Intuition on uncertainty unification for preference learning. Imagine a robot inferring Alice's (a human user) ideal trajectory for passing a cup of coffee above a table using preference learning. In one sample pair, trajectory $x^{(1)}$ poses a risk of spilling coffee on the keyboard, while trajectory $x^{(2)}$ risks spilling it on the headphones. The keyboard and headphones are both valuable to Alice, so she responds that she weakly prefers $x^{(2)}$ with hesitation, reflecting her uncertainty in the decision. An uncertainty-averse model ignores this nuance, potentially learning a suboptimal and undesirable trajectory (e.g., still passing above the headphones). In contrast, an uncertainty-unified model incorporates Alice’s expressed uncertainty into its uncertainty-aware framework, enabling it to learn an ideal trajectory that aligns with her true preferences.
  • Figure 2: Overview of UUPL. Imagine a robot inferring a user's preferred room temperature. For each query, we collect the user's preference with the associated uncertainty level. To begin, a calibration process (blue box) interprets the user's definitions of "confident" and "uncertain", ensuring these subjective assessments are accurately quantified with uncertainty factors $u$. Then, we construct the human preference uncertainty model as a probit model using the Gaussian CDF, with the calibrated $u$ as the standard deviation (left part of purple box). This model improves the accuracy of the GP mean estimate (right part of purple box). Additionally, we introduce a weighted GMM (left part of red box) to adaptively scale the GP predictive variance (right part of red box) based on the human uncertainty level, enhancing its interpretability. Through this approach, UUPL integrates human uncertainty into both the GP mean and variance, achieving comprehensive uncertainty unification and thus providing a more accurate, interpretable, and user-aligned learning result (rightmost picture).
  • Figure 3: Intuition behind our human preference uncertainty modeling. The x-axis represents the reward residual $R(O^{(1)}) - R(O^{(2)})$, while the y-axis indicates the probability of selecting $O^{(1)}$. For a given query (marked by the red dashed line), confident human choices (low uncertainty level $l$ / small $u$) correspond to high probabilities, represented by the intersection between the red dashed line and the green CDF with $u = 0.1$. Conversely, uncertain human choices (high $l$ / large $u$) lower the modeled probability, as shown by the intersection between the red dashed line and the yellow CDF with $u = 3$.
  • Figure 4: Relationship between the Laplace posterior mean difference $\boldsymbol{\Delta f_{\text{Lap}}}$ and the human uncertainty $\boldsymbol{u}$. The values $u^1, u^2, u^3, u^4$ are determined by varying the posterior mean difference proportionally, so that the model takes human uncertainty into account and produces a more accurate posterior mean estimate.
  • Figure 5: GP variance visualizations. The ground-truth function and the learned GPs (mean $\pm 1.96 \times$ std) of three baseline methods and UUPL are shown. The data come from comparisons of 19°C with every other integer temperature, and the results illustrate that the learned variance is reasonable.
  • ...and 7 more figures
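
The probit model in Figures 2 and 3 can be read as $P(O^{(1)} \succ O^{(2)}) = \Phi\big((R(O^{(1)}) - R(O^{(2)}))/u\big)$, where $\Phi$ is the standard Gaussian CDF and the calibrated uncertainty factor $u$ acts as the standard deviation. The following minimal sketch assumes exactly this parameterization. The residual value 0.5 is an arbitrary stand-in for the red dashed query line. The two settings $u = 0.1$ and $u = 3$ mirror the green and yellow curves in Figure 3.

```python
import math


def preference_probability(reward_1: float, reward_2: float, u: float) -> float:
    """P(choose O1 over O2) under a probit model: the Gaussian CDF of the
    reward residual with standard deviation u (larger u = more human uncertainty)."""
    if u <= 0:
        raise ValueError("uncertainty factor u must be positive")
    residual = reward_1 - reward_2
    return 0.5 * math.erfc(-residual / (u * math.sqrt(2.0)))


residual_query = 0.5  # illustrative value of R(O1) - R(O2)
p_confident = preference_probability(residual_query, 0.0, u=0.1)
p_uncertain = preference_probability(residual_query, 0.0, u=3.0)
print(f"u=0.1: {p_confident:.4f}  u=3: {p_uncertain:.4f}")
```

For the same positive residual, the confident setting yields a probability very close to 1, while the uncertain setting stays only slightly above 0.5. This matches the intuition in Figure 3.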