Value Alignment and Trust in Human-Robot Interaction: Insights from Simulation and User Study

Shreyas Bhat; Joseph B. Lyons; Cong Shi; X. Jessie Yang

TL;DR

The paper investigates how dynamic value alignment between humans and robots affects trust in human–robot interaction, challenging the assumption that full alignment is always beneficial. It couples a simulation framework (a trust-aware MDP with Bayesian IRL for learning human reward weights) with a human-subject study in a high-risk ISR-like task. Together, the two studies indicate that value alignment improves trust primarily under high risk, while adaptive alignment can sustain trust across diverse human preferences. The main contributions are (i) a formal trust-aware planning model that integrates personalized value weighting, (ii) an adaptive learner that matches robot rewards to human preferences in real time, and (iii) empirical validation showing improved trust, higher agreement, and reduced workload when adaptation is used in high-risk scenarios. The findings inform the design of decision-support robots in safety-critical domains, suggesting that robots should balance alignment with trust dynamics rather than pursue full, immediate alignment in all contexts.

Abstract

With the advent of AI technologies, humans and robots are increasingly teaming up to perform collaborative tasks. To enable smooth and effective collaboration, the topic of value alignment (operationalized herein as the degree of dynamic goal alignment within a task) between the robot and the human is gaining increasing research attention. Prior literature on value alignment makes an inherent assumption that aligning the values of the robot with those of the human benefits the team. This assumption, however, has not been empirically verified. Moreover, prior literature does not account for the human's trust in the robot when analyzing human-robot value alignment. Thus, a research gap needs to be bridged by answering two questions: How does alignment of values affect trust? Is it always beneficial to align the robot's values with those of the human? We present a simulation study and a human-subject study to answer these questions. Results from the simulation study show that alignment of values is important for trust when the overall risk level of the task is high. We also present an adaptive strategy for the robot that uses Inverse Reinforcement Learning (IRL) to match the values of the robot with those of the human during interaction. Our simulations suggest that such an adaptive strategy is able to maintain trust across the full spectrum of human values. We also present results from an empirical study that validate these findings from simulation. Results indicate that real-time personalized value alignment is beneficial to trust and perceived performance by the human when the robot does not have a good prior on the human's values.
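
The adaptive strategy described in the abstract can be illustrated with a minimal Bayesian IRL sketch. This is not the paper's implementation: the two-term reward (health versus time), the fixed costs, the Boltzmann-rational choice model with rationality `beta`, the discretized weight grid, and all function names are assumptions made for illustration. The sketch keeps a posterior over the human's health weight and updates it after each observed choice.

```python
import math

def expected_rewards(w_health, threat_prob, health_cost=10.0, time_cost=10.0):
    """Expected reward of each action under an assumed two-term reward.

    Assumption: the time weight is 1 - w_health, and costs are fixed constants.
    """
    w_time = 1.0 - w_health
    return {
        "protect": -w_time * time_cost,                        # spend time, avoid harm
        "no_protect": -w_health * threat_prob * health_cost,   # save time, risk harm
    }

def action_likelihood(action, w_health, threat_prob, beta=1.0):
    """Boltzmann-rational probability of the observed action (assumed model)."""
    rewards = expected_rewards(w_health, threat_prob)
    top = max(rewards.values())  # subtract the max for numerical stability
    exps = {a: math.exp(beta * (r - top)) for a, r in rewards.items()}
    return exps[action] / sum(exps.values())

def update_posterior(grid, posterior, action, threat_prob, beta=1.0):
    """One Bayes step: posterior is proportional to prior times likelihood."""
    unnormalized = [
        p * action_likelihood(action, w, threat_prob, beta)
        for w, p in zip(grid, posterior)
    ]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def posterior_mean(grid, posterior):
    """Point estimate of the human's health weight."""
    return sum(w * p for w, p in zip(grid, posterior))

# Uniform prior over candidate health weights in [0, 1].
grid = [i / 20 for i in range(21)]
posterior = [1.0 / len(grid)] * len(grid)

# Hypothetical observations: (human's choice, threat probability at the site).
observations = [("protect", 0.3), ("protect", 0.5), ("protect", 0.2)]
for action, threat_prob in observations:
    posterior = update_posterior(grid, posterior, action, threat_prob)

estimate = posterior_mean(grid, posterior)
```

In this sketch, a human who repeatedly chooses protection at low threat probabilities pushes the posterior toward larger health weights. An adaptive robot could then plan with `estimate` as its own health weight, whereas a non-adaptive robot would keep a fixed weight.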

Paper Structure (36 sections, 11 equations, 8 figures, 1 table)

Introduction
Related Work
Trust-Aware Decision-Making
Value Alignment
Simulation Study
Human-Robot Teaming Task
Trust-aware Markov Decision Process
States
Actions
Rewards
Transition Model
Human Trust-Behavior Model
Value Iteration
Bayesian Inverse Reinforcement Learning
Threats and Threat Levels
...and 21 more sections

Figures (8)

Figure 1: The observed regions in end-of-mission trust as a function of the health weights of the human $w^h_h$ and the robot $w^r_h$. The figure on the left is the simulation result when there is a relatively low chance of threat presence at any search site $(d=0.3)$. The figure on the right is when there is a higher chance of threat presence at any search site $(d=0.7)$.
Figure 2: The effect of the prior probability of threat presence in any house $d$ on the end-of-mission trust, after fixing a trust region.
Figure 3: The effect of prior probability of threat presence in any house $d$ on the end-of-mission trust when the human and the robot are both extremely risk-averse.
Figure 4: Comparing the adaptive strategy with the non-adaptive strategy in the end-of-mission trust feedback given by the simulated human for two different levels of threat. The non-adaptive strategy sets the robot's health reward weight to $0.5$.
Figure 5: Comparing the adaptive strategy with the non-adaptive strategy in the end-of-mission trust feedback given by the simulated human for two different levels of threat. The non-adaptive strategy sets the robot's health reward weight to $0.5$.
...and 3 more figures
