Bayesian Constraint Inference from User Demonstrations Based on Margin-Respecting Preference Models

Dimitris Papadimitriou, Daniel S. Brown

TL;DR

The paper tackles the challenge of inferring environmental constraints for safe policy learning by reframing constraint discovery as a Bayesian preference-learning problem. It extends preference modeling with margin-aware, grouped demonstrations to capture varying constraint severities, and introduces PBICRL, a Monte Carlo method that infers constraint weights and indicator variables (and, in extensions, unknown feature parameters) without requiring policy re-optimization at each step. Empirical results across 2D point mass, Fetch-Reach, HalfCheetah, and Ant demonstrate that PBICRL more accurately and efficiently recovers constraints than state-of-the-art baselines, including when constraint features are unknown or parametric. The approach provides uncertainty quantification via the posterior and supports active learning and safer policy synthesis, with future directions toward higher-dimensional tasks and representation learning for constraint features.

Abstract

It is crucial for robots to be aware of the presence of constraints in order to acquire safe policies. However, explicitly specifying all constraints in an environment can be a challenging task. State-of-the-art constraint inference algorithms learn constraints from demonstrations, but tend to be computationally expensive and prone to instability issues. In this paper, we propose a novel Bayesian method that infers constraints based on preferences over demonstrations. The main advantages of our proposed approach are that it 1) infers constraints without calculating a new policy at each iteration, 2) uses a simple and more realistic ranking of groups of demonstrations, without requiring pairwise comparisons over all demonstrations, and 3) adapts to cases where there are varying levels of constraint violation. Our empirical results demonstrate that our proposed Bayesian approach infers constraints of varying severity, more accurately than state-of-the-art constraint inference methods.
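The core idea in the abstract, scoring candidate constraint weights by how well they explain a ranking of demonstration groups and sampling those weights instead of re-solving for a policy, can be illustrated with a small sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the linear penalty over feature counts, the sigmoid (Bradley-Terry style) likelihood with an additive margin between adjacent groups, the uniform box prior, the random-walk Metropolis sampler, and all names (`traj_return`, `log_pref_likelihood`, `sample_posterior`). The indicator variables mentioned in the TL;DR are omitted.

```python
import math
import random

def traj_return(features, weights):
    # Assumed linear penalty: each feature count is charged its weight.
    return -sum(f * w for f, w in zip(features, weights))

def log_sigmoid(x):
    # Numerically stable log(1 / (1 + exp(-x))).
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def log_pref_likelihood(weights, groups, margins, beta=1.0):
    # groups[0] is the most preferred group; margins[k] is the assumed
    # minimum return gap between group k and group k + 1.
    total = 0.0
    for k in range(len(groups) - 1):
        for better in groups[k]:
            for worse in groups[k + 1]:
                gap = traj_return(better, weights) - traj_return(worse, weights)
                total += log_sigmoid(beta * (gap - margins[k]))
    return total

def sample_posterior(groups, margins, n_features, n_samples=3000,
                     step=0.5, w_max=10.0, beta=1.0, seed=0):
    # Random-walk Metropolis under a uniform prior on [0, w_max]^n_features.
    # No policy is optimized inside the loop; only the likelihood is evaluated.
    rng = random.Random(seed)
    w = [0.0] * n_features
    logp = log_pref_likelihood(w, groups, margins, beta)
    samples = []
    for _ in range(n_samples):
        proposal = [wi + rng.gauss(0.0, step) for wi in w]
        if all(0.0 <= p <= w_max for p in proposal):
            logp_new = log_pref_likelihood(proposal, groups, margins, beta)
            # Accept with probability min(1, posterior ratio).
            if math.log(1.0 - rng.random()) < logp_new - logp:
                w, logp = proposal, logp_new
        samples.append(list(w))
    return samples

# Hypothetical feature counts: [steps in an unsafe region, steps in a harmless region].
groups = [
    [[0.0, 3.0]],  # preferred demonstrations
    [[2.0, 1.0]],  # mild violation
    [[4.0, 0.0]],  # severe violation
]
margins = [1.0, 0.5]  # illustrative inter-group margins

samples = sample_posterior(groups, margins, n_features=2, beta=5.0)
kept = samples[500:]  # discard burn-in
posterior_mean = [sum(s[j] for s in kept) / len(kept) for j in range(2)]
```

In this toy setup the posterior mean weight on the first feature should come out larger than on the second, since only the first feature separates the groups in the stated order. The retained samples also give a rough picture of uncertainty over each weight, which is the kind of posterior information the TL;DR refers to.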

Paper Structure (26 sections, 14 equations, 10 figures, 5 tables, 3 algorithms)

Figures (10)

  • Figure 1: (a) Example of three different types of trajectories: a preferable one (green), a bad one (orange), and one slightly worse than the orange (red). (b) Weight values obtained from BPL and BPL with margins.
  • Figure 2: Illustrative example of grouped rankings of demonstrations with different inter-class margins.
  • Figure 3: 2D point mass navigational environment (a) and Fetch-Reach robot environment (b).
  • Figure 4: (a) 2D point mass environment demonstrations. (b) Inference results for the point mass environment. Results averaged over $5$ seeds.
  • Figure 5: (a) Fetch-Reach robot demonstrations. (b) Inference results for the Fetch-Reach environment. Results averaged over $5$ seeds.
  • ...and 5 more figures