Learning Constraints from Stochastic Partially-Observed Closed-Loop Demonstrations

Chih-Yuan Chiu, Zhouyu Zhang, Glen Chou

TL;DR

This work addresses learning unknown, parametric safety constraints from locally optimal input–output demonstrations produced by stochastic, partially observed closed-loop systems. It extends inverse optimal control with a robust, KKT-based feasibility formulation and employs system-level synthesis to represent output-feedback policies, enabling recovery of nominal trajectories, feedback laws, and the constraint parameter $\theta$ from noisy data. The authors prove recovery and conservativeness guarantees in zero-noise settings and derive linear-in-noise sensitivity bounds, with empirical validation on linear and nonlinear dynamics (e.g., unicycle and quadrotor) showing high accuracy and robustness to transmission errors. The method yields provable constraint learning and safe policy synthesis, improving the reliability of constraint inference for safety-critical robotic applications under real-world noise and partial observability.

Abstract

We present a method for learning unknown parametric constraints from locally optimal input-output trajectory data. We assume the data are generated by rollouts of stochastic nonlinear dynamics under a single state- or output-feedback law and initial condition, but with distinct noise realizations, and that this feedback law is designed to robustly satisfy the underlying constraints despite worst-case noise outcomes. We encode the Karush-Kuhn-Tucker (KKT) conditions of this robust optimal feedback control problem within a feasibility problem to recover constraints consistent with the local optimality of the demonstrations. We prove that our constraint learning method (i) accurately recovers the demonstrator's policy, and (ii) conservatively estimates the set of policies that ensure constraint satisfaction despite worst-case noise realizations. Moreover, we perform a sensitivity analysis proving that, when demonstrations are corrupted by transmission error, the inaccuracy in the learned feedback law scales linearly in the error magnitude. Empirically, our method accurately recovers unknown constraints from simulated noisy, closed-loop demonstrations generated using both linear and nonlinear dynamics (e.g., unicycle and quadrotor) and a range of feedback mechanisms.
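To make the idea of "encoding KKT conditions in a feasibility problem" concrete, here is a minimal sketch on a deliberately tiny, deterministic toy, not the paper's robust SLS-based formulation. Assumptions (all hypothetical): a scalar demonstrator minimizes $\sum_t (x_t - r_t)^2$ subject to $x_t \le \theta$ with a known reference $r$, and a grid of candidate $\theta$ values stands in for solving the feasibility program. A candidate $\theta$ is kept if the demonstration satisfies stationarity, primal and dual feasibility, and complementary slackness under it. The function names `kkt_consistent` and `consistent_thetas` are illustrative only.

```python
def kkt_consistent(theta, x, r, tol=1e-9):
    """Return True if demo x is a KKT point of the toy problem
        minimize   sum_t (x_t - r_t)^2
        subject to x_t <= theta   for every t
    for the candidate constraint parameter theta.

    Stationarity 2*(x_t - r_t) + lam_t = 0 pins down each multiplier lam_t,
    so the remaining checks are primal feasibility, dual feasibility
    (lam_t >= 0), and complementary slackness lam_t * (x_t - theta) = 0.
    """
    for x_t, r_t in zip(x, r):
        lam_t = 2.0 * (r_t - x_t)              # multiplier implied by stationarity
        if x_t > theta + tol:                   # primal feasibility
            return False
        if lam_t < -tol:                        # dual feasibility
            return False
        if abs(lam_t * (x_t - theta)) > tol:    # complementary slackness
            return False
    return True


def consistent_thetas(x, r, grid):
    """All grid values of theta under which the demonstration is a KKT point."""
    return [theta for theta in grid if kkt_consistent(theta, x, r)]


grid = [i / 20 for i in range(41)]  # candidate theta values 0.0, 0.05, ..., 2.0

# Informative demo: the reference exceeds theta_true = 1.0, so the constraint binds.
r1 = [0.2, 0.8, 1.5, 2.0, 1.2, 0.5]
x1 = [min(r_t, 1.0) for r_t in r1]      # demonstrator's optimal (clipped) trajectory
print(consistent_thetas(x1, r1, grid))  # -> [1.0]

# Uninformative demo: the constraint never binds, so only a lower bound on theta is learned.
r2 = [0.2, 0.6, 0.4]
x2 = list(r2)
feasible2 = consistent_thetas(x2, r2, grid)
print(min(feasible2), max(feasible2))   # -> 0.6 2.0
```

In the toy, a binding constraint produces a strictly positive multiplier, and complementary slackness then pins $\theta$ exactly; when nothing binds, every $\theta \ge \max_t x_t$ remains consistent, so the recovered set is only a bound.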

Paper Structure

This paper contains 14 sections, 5 theorems, 31 equations, and 4 figures.

Key Result

Proposition 1

(Zhou and Tzoumas, 2023) If there exists a lower block-triangular matrix $\mathcal{K}$ satisfying Eqn: Error Dynamics and Output Maps, stacked, then $\Phi_{xw}$, $\Phi_{xe}$, $\Phi_{uw}$, and $\Phi_{ue}$, as computed by Eqn: Phi from K, satisfy the affine equalities: Conversely, if $\Phi_{xw}$, $\Phi_{xe}$, $\Phi_{uw}$, and $\Phi_{ue}$ satisfy Eqn: Phi, Affine Constr
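Proposition 1 concerns the output-feedback parameterization with four closed-loop maps. As a simpler analog, the sketch below checks the standard state-feedback system-level synthesis equality $(I - ZA)\Phi_x - ZB\Phi_u = I$ for a scalar time-varying system, where $Z$ is the down-shift operator, and then runs the converse direction by recovering the causal gain as $K = \Phi_u \Phi_x^{-1}$. The model, all numbers, and the helper names (`system_responses`, `affine_residual`, `recover_gain`) are hypothetical and chosen only for illustration.

```python
def matmul(A, B):
    """Plain-Python product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def system_responses(a, b, K):
    """Closed-loop maps Phi_x, Phi_u with x = Phi_x w and u = Phi_u w.

    Scalar LTV model (an assumption for illustration):
        x_0 = w_0,  x_t = a[t-1] x_{t-1} + b[t-1] u_{t-1} + w_t  (t >= 1),
        u_t = sum_{s <= t} K[t][s] x_s   (causal, lower-triangular K).
    """
    T = len(K)
    Phi_x = [[0.0] * T for _ in range(T)]
    for j in range(T):  # column j = response to a unit impulse in w_j
        x, u = [0.0] * T, [0.0] * T
        for t in range(T):
            w_t = 1.0 if t == j else 0.0
            x[t] = w_t if t == 0 else a[t - 1] * x[t - 1] + b[t - 1] * u[t - 1] + w_t
            u[t] = sum(K[t][s] * x[s] for s in range(t + 1))
        for t in range(T):
            Phi_x[t][j] = x[t]
    return Phi_x, matmul(K, Phi_x)  # u = K x = K Phi_x w


def affine_residual(a, b, Phi_x, Phi_u):
    """Largest entry of (I - Z A) Phi_x - Z B Phi_u - I, with Z the down-shift."""
    T = len(Phi_x)
    worst = 0.0
    for t in range(T):
        for j in range(T):
            shifted = a[t - 1] * Phi_x[t - 1][j] + b[t - 1] * Phi_u[t - 1][j] if t > 0 else 0.0
            worst = max(worst, abs(Phi_x[t][j] - shifted - (1.0 if t == j else 0.0)))
    return worst


def recover_gain(Phi_x, Phi_u):
    """Converse direction: K = Phi_u Phi_x^{-1}, using that Phi_x is unit lower-triangular."""
    T = len(Phi_x)
    K = [[0.0] * T for _ in range(T)]
    for i in range(T):  # solve row i of K Phi_x = Phi_u by back-substitution
        for j in reversed(range(T)):
            K[i][j] = Phi_u[i][j] - sum(K[i][s] * Phi_x[s][j] for s in range(j + 1, T))
    return K


a = [1.0, 1.1, 0.9]                 # hypothetical time-varying dynamics, T = 4 states
b = [0.5, 0.4, 0.6]
K = [[-0.3, 0.0, 0.0, 0.0],         # causal (lower-triangular) feedback gains
     [0.1, -0.5, 0.0, 0.0],
     [0.0, 0.2, -0.4, 0.0],
     [0.05, 0.0, 0.1, -0.2]]
Phi_x, Phi_u = system_responses(a, b, K)
print(affine_residual(a, b, Phi_x, Phi_u))  # zero up to rounding
K_hat = recover_gain(Phi_x, Phi_u)
print(max(abs(K_hat[i][j] - K[i][j]) for i in range(4) for j in range(4)))  # ~0
```

Building each column of $\Phi_x$ from an impulse response keeps the causal structure explicit, which is why $\Phi_x$ comes out unit lower-triangular and the converse map needs no division.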

Figures (4)

  • Figure 1: Error magnitude in recovered (a) nominal trajectories and controls, (b) output feedback, and (c) constraint parameter as a function of the transmission error level. Demonstrations were generated using noise-corrupted double integrator dynamics and (full) state feedback SLS controllers. On average, our method learns a priori unknown parameters with high accuracy under low levels of transmission noise.
  • Figure 2: Constraint learning from demonstrations generated via SLS with double integrator (a, c) and linear 12D quadcopter (d, e, f) dynamics. (e, f) provide views of (d) from different angles. (a, c-f) Our method accurately learns true collision avoidance constraints across all simulations, but (b) the baseline of Chou (2020) did not; note the mismatch between the learned (black) and true (yellow) constraints.
  • Figure 3: Constraint learning from demonstrations produced via CCM with unicycle (a) and double integrator (b) dynamics. Our method accurately learns true collision avoidance constraints across all simulations.
  • Figure 4: Constraint learning from demonstrations produced via PD with (a) unicycle, (b) double integrator, and (c) nonlinear 6D quadcopter dynamics. Our method accurately learns true collision avoidance constraints for all simulations.
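Figure 1 plots recovery error against the transmission error level. As a generic illustration of first-order (linear-in-error) sensitivity, and not the paper's estimator or bound, the sketch below fits a scalar feedback gain $k$ in $u = kx$ by least squares and corrupts every transmitted state and input by $\epsilon$ times a fixed direction. The data, corruption directions, and function names are hypothetical.

```python
def fit_gain(xs, us):
    """Least-squares estimate of a scalar gain k in the model u = k * x."""
    return sum(u * x for x, u in zip(xs, us)) / sum(x * x for x in xs)


def gain_error(eps, xs, us, dxs, dus, k_true):
    """|k_hat - k_true| after each sample is corrupted by eps times a fixed direction."""
    xs_noisy = [x + eps * d for x, d in zip(xs, dxs)]
    us_noisy = [u + eps * d for u, d in zip(us, dus)]
    return abs(fit_gain(xs_noisy, us_noisy) - k_true)


k_true = -0.8
xs = [1.0, -0.5, 2.0, 0.3, -1.2]    # clean states (hypothetical)
us = [k_true * x for x in xs]       # clean inputs from the feedback law
dxs = [0.3, -0.1, 0.2, -0.4, 0.1]   # corruption direction for states
dus = [-0.2, 0.5, 0.1, 0.3, -0.3]   # corruption direction for inputs

for eps in (1e-1, 1e-2, 1e-3, 1e-4):
    err = gain_error(eps, xs, us, dxs, dus, k_true)
    print(f"eps={eps:g}  error={err:.3e}  error/eps={err / eps:.4f}")
```

For small $\epsilon$ the ratio error/$\epsilon$ settles near the first-order slope $\left(\sum_t \delta u_t x_t - k \sum_t x_t \delta x_t\right) / \sum_t x_t^2 = 0.608 / 6.78 \approx 0.0897$ for these numbers; the quadratic term only becomes visible as $\epsilon$ grows.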

Theorems & Definitions (15)

  • Proposition 1
  • Remark 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Remark 5
  • Lemma 1
  • Proof
  • Theorem 1
  • Proof
  • ...and 5 more