Bayesian Constraint Inference from User Demonstrations Based on Margin-Respecting Preference Models

Dimitris Papadimitriou; Daniel S. Brown

TL;DR

The paper tackles the challenge of inferring environmental constraints for safe policy learning by reframing constraint discovery as a Bayesian preference-learning problem. It extends preference modeling with margin-aware, grouped demonstrations to capture varying constraint severities, and introduces PBICRL, a Monte Carlo method that infers constraint weights and indicator variables (and, in extensions, unknown feature parameters) without requiring policy re-optimization at each step. Empirical results across 2D point mass, Fetch-Reach, HalfCheetah, and Ant demonstrate that PBICRL more accurately and efficiently recovers constraints than state-of-the-art baselines, including when constraint features are unknown or parametric. The approach provides uncertainty quantification via the posterior and supports active learning and safer policy synthesis, with future directions toward higher-dimensional tasks and representation learning for constraint features.

Abstract

It is crucial for robots to be aware of the presence of constraints in order to acquire safe policies. However, explicitly specifying all constraints in an environment can be a challenging task. State-of-the-art constraint inference algorithms learn constraints from demonstrations, but tend to be computationally expensive and prone to instability. In this paper, we propose a novel Bayesian method that infers constraints based on preferences over demonstrations. The main advantages of our proposed approach are that it 1) infers constraints without calculating a new policy at each iteration, 2) uses a simple and more realistic ranking of groups of demonstrations, without requiring pairwise comparisons over all demonstrations, and 3) adapts to cases where there are varying levels of constraint violation. Our empirical results demonstrate that our proposed Bayesian approach infers constraints of varying severity more accurately than state-of-the-art constraint inference methods.
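The idea summarized above can be illustrated with a minimal sketch. It is not the paper's PBICRL algorithm: the margin-shifted Bradley-Terry likelihood, the standard Gaussian prior, the plain Metropolis-Hastings sampler, the restriction to comparisons between adjacent groups, and all names (`grouped_log_likelihood`, `metropolis_hastings`, the toy features and margins) are assumptions chosen for illustration. The sketch only shows why no policy needs to be recomputed during inference: the likelihood depends solely on feature counts of the demonstrations already collected.

```python
import numpy as np


def log_sigmoid(x):
    # Numerically stable log(1 / (1 + exp(-x))).
    return -np.logaddexp(0.0, -x)


def grouped_log_likelihood(w, groups, margins, beta=1.0):
    """Log-likelihood of ranked groups under a margin-shifted Bradley-Terry model.

    groups:  list of (n_i, d) arrays of trajectory feature counts,
             ordered from least preferred to most preferred (assumed layout).
    margins: margins[i] is the return gap required between group i and group i+1.
    """
    returns = [g @ w for g in groups]
    total = 0.0
    for i in range(len(groups) - 1):
        # Compare every trajectory in the better group with every one in the worse group.
        diff = returns[i + 1][:, None] - returns[i][None, :]
        total += log_sigmoid(beta * (diff - margins[i])).sum()
    return total


def log_prior(w):
    # Standard Gaussian prior on the weights (an assumption of this sketch).
    return -0.5 * float(w @ w)


def metropolis_hastings(groups, margins, n_samples=2000, step=0.3, seed=0):
    """Random-walk Metropolis-Hastings over the weight vector w."""
    rng = np.random.default_rng(seed)
    d = groups[0].shape[1]
    w = np.zeros(d)
    log_p = grouped_log_likelihood(w, groups, margins) + log_prior(w)
    samples = []
    for _ in range(n_samples):
        proposal = w + step * rng.normal(size=d)
        log_p_new = grouped_log_likelihood(proposal, groups, margins) + log_prior(proposal)
        # Accept with probability min(1, p_new / p_old).
        if np.log(rng.uniform()) < log_p_new - log_p:
            w, log_p = proposal, log_p_new
        samples.append(w.copy())
    return np.array(samples)


# Toy data: feature 0 = progress toward the goal, feature 1 = time spent in a
# (hypothetical) constrained region. Groups are ordered worst to best.
bad = np.array([[1.0, 3.0], [0.9, 2.5]])
medium = np.array([[1.0, 1.0], [1.1, 0.8]])
good = np.array([[1.0, 0.0], [0.9, 0.0]])
groups = [bad, medium, good]
margins = [1.0, 0.5]  # larger margin between the bad and medium groups

samples = metropolis_hastings(groups, margins)
posterior_mean = samples[500:].mean(axis=0)  # discard burn-in
print("posterior mean weights:", posterior_mean)
```

In this toy setup the posterior weight on the second feature should come out negative, which is how a penalty on the constrained region would show up. The spread of `samples` gives a rough picture of the uncertainty that a Bayesian treatment makes available.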

Paper Structure (26 sections, 14 equations, 10 figures, 5 tables, 3 algorithms)

Introduction
Background and Related Work
Learning from Preferences
Limitations of Learning from Preferences
Margin-Respecting Preference Learning
Learning from Grouped Preferences
Preference-Based Constraint Inference
Constraint Inference with Known Features
Constraint Inference with Unknown Features
Experiments
Point Mass Environment
Fetch-Reach Robot
HalfCheetah and Ant with Parametric Constraints
Conclusion and Future Directions
PBICRL: Unknown features
...and 11 more sections

Figures (10)

Figure 1: (a) Example of three different types of trajectories: a preferable one (green), a bad one (orange), and one slightly worse than the orange one (red). (b) Weight values obtained from BPL and BPL with margins.
Figure 2: Illustrative example of grouped rankings of demonstrations with different inter-class margins.
Figure 3: (a) 2D point mass navigation environment and (b) Fetch-Reach robot environment.
Figure 4: (a) 2D point mass environment demonstrations. (b) Inference results for the point mass environment. Results are averaged over 5 seeds.
Figure 5: (a) Fetch-Reach robot demonstrations. (b) Inference results for the Fetch-Reach environment. Results are averaged over 5 seeds.
...and 5 more figures
