Learning Human Preferences Over Robot Behavior as Soft Planning Constraints

Austin Narcomey; Nathan Tsoi; Ruta Desai; Marynel Vázquez

TL;DR

The paper addresses learning human preferences over robot behavior expressed as soft planning constraints while keeping task-level hard constraints fixed. It introduces a planning-based formulation where user preferences are organized into $N$ sub-preferences within $\bigoplus_{n=1}^N \mathcal{P}^n$, each contributing a soft cost to planning via $\mathbf{c}^n \in [-1,1]$, and learns these preferences from binary trajectory queries using a neural network with $N$ heads and uncertainty modeled by $\mathrm{IP}(v^n|\mathcal{S})$. Evaluations in Habitat 2.0 on a rearrangement task show that models can infer multi-objective preferences under noisy feedback, with distribution-based supervision and modest training noise improving robustness and generalization across noise levels. The work demonstrates a scalable, interpretable approach to learning soft planning constraints that can guide human-aligned robot planning and motivates future real-world testing and online querying. Overall, this study lays groundwork for adaptive planning-based robot behavior driven by learned soft constraints that reflect user preferences.
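To make the idea of a binary trajectory query under soft costs concrete, the following is a minimal sketch and not the paper's implementation. It assumes each trajectory is summarized by one hypothetical feature value per sub-preference, that the soft cost is a simple weighted sum with weights $\mathbf{c}^n \in [-1,1]$, that lower cost is preferred, and that noise flips the simulated user's answer with a fixed probability. The names `soft_cost` and `simulated_choice` are illustrative only.

```python
import random


def soft_cost(features, costs):
    """Weighted sum of per-sub-preference feature values.

    Assumption: one feature per sub-preference and one cost c^n in [-1, 1]
    per sub-preference; the linear form is illustrative, not from the paper.
    """
    return sum(c * f for c, f in zip(costs, features))


def simulated_choice(traj_a, traj_b, costs, noise=0.0, rng=random):
    """Answer a binary query: return "A" or "B".

    Assumption: the simulated user prefers the lower soft cost, and the
    answer is flipped with probability `noise`.
    """
    preferred = "A" if soft_cost(traj_a, costs) <= soft_cost(traj_b, costs) else "B"
    if rng.random() < noise:
        return "B" if preferred == "A" else "A"
    return preferred


# Hypothetical example with N = 3 sub-preferences.
costs = [1.0, -0.5, 0.0]
traj_a = [0.2, 0.9, 0.4]
traj_b = [0.8, 0.1, 0.4]
print(simulated_choice(traj_a, traj_b, costs))  # noise-free choice
```

A cost of 0 makes a sub-preference irrelevant to the choice, which is one way to read the "desired but not required" distinction; hard task constraints would be enforced separately by the planner and are not modeled in this sketch.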

Abstract

Preference learning has long been studied in Human-Robot Interaction (HRI) in order to adapt robot behavior to specific user needs and desires. Typically, human preferences are modeled as a scalar function; however, such a formulation confounds critical considerations on how the robot should behave for a given task, with desired -- but not required -- robot behavior. In this work, we distinguish between such required and desired robot behavior by leveraging a planning framework. Specifically, we propose a novel problem formulation for preference learning in HRI where various types of human preferences are encoded as soft planning constraints. Then, we explore a data-driven method to enable a robot to infer preferences by querying users, which we instantiate in rearrangement tasks in the Habitat 2.0 simulator. We show that the proposed approach is promising at inferring three types of preferences even under varying levels of noise in simulated user choices between potential robot behaviors. Our contributions open up doors to adaptable planning-based robot behavior in the future.

Paper Structure (14 sections, 2 equations, 3 figures, 1 table)

Introduction
Related Work
Proposed Approach
Problem Setup
Predicting Preferences
Model Supervision
Evaluation Setup
Experimental Setup
Preference Prediction Models
Neural Network Training Details
Results
Discussion
Limitations & Future Work
Conclusion

Figures (3)

Figure 1: Illustrations of scenarios in which differentiating hard and soft task constraints can be beneficial for robot behavior generation.
Figure 2: Overview of interaction setup and approach. a) A query has trajectories A and B that show the robot completing the task of interest and a user's choice. b) Preference prediction problem. The robot's goal is to identify the user's preferences $\mathcal{V}^\mathcal{H}$ given a sequence of queries.
Figure 3: The proposed dataset processing pipeline used to generate our disjoint train, val, and test datasets. See Sec. \ref{ssec:supervision} and \ref{ssec:preference-prediction} for details.
