Learning Constraints from Stochastic Partially-Observed Closed-Loop Demonstrations
Chih-Yuan Chiu, Zhouyu Zhang, Glen Chou
TL;DR
This work addresses learning unknown, parametric safety constraints from locally optimal input–output demonstrations produced by stochastic, partially observed closed-loop systems. It extends inverse optimal control with a robust, KKT-based feasibility formulation and employs system-level synthesis to represent output-feedback policies, enabling recovery of nominal trajectories, feedback laws, and the constraint parameter $\theta$ from noisy data. The authors prove recovery and conservativeness guarantees in zero-noise settings and derive linear-in-noise sensitivity bounds, with empirical validation on linear and nonlinear dynamics (e.g., unicycle and quadrotor) showing high accuracy and robustness to transmission errors. The method yields provable constraint learning and safe policy synthesis, improving the reliability of constraint inference for safety-critical robotic applications under real-world noise and partial observability.
Abstract
We present a method for learning unknown parametric constraints from locally optimal input-output trajectory data. We assume the data is generated by rollouts of stochastic nonlinear dynamics under a single state or output feedback law and a single initial condition, but with distinct noise realizations, and that the feedback law is designed to robustly satisfy the underlying constraints despite worst-case noise outcomes. We encode the Karush-Kuhn-Tucker (KKT) conditions of this robust optimal feedback control problem within a feasibility problem to recover constraints consistent with the local optimality of the demonstrations. We prove that our constraint learning method (i) accurately recovers the demonstrator's policy, and (ii) conservatively estimates the set of policies that ensure constraint satisfaction despite worst-case noise realizations. Moreover, we perform sensitivity analysis, proving that when demonstrations are corrupted by transmission error, the inaccuracy in the learned feedback law scales linearly in the error magnitude. Empirically, our method accurately recovers unknown constraints from simulated noisy, closed-loop demonstrations generated using both linear and nonlinear dynamics (e.g., unicycle and quadrotor) and a range of feedback mechanisms.
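To make the idea of encoding KKT conditions in a feasibility problem concrete, the following is a minimal sketch on a deliberately simplified toy problem, not the robust closed-loop formulation described above. It assumes a deterministic, noise-free scalar demonstrator that solves min (x - x_goal)^2 subject to x <= theta, with theta unknown. The function name `recover_threshold` and all variable names are hypothetical and chosen only for illustration.

```python
def recover_threshold(x_demo, x_goal, tol=1e-9):
    """Toy inverse-KKT sketch (illustrative assumption, not the paper's method).

    Assumed forward problem: min (x - x_goal)^2  subject to  x <= theta.
    Returns the multiplier and the interval (lo, hi) of theta values that
    are consistent with x_demo satisfying the KKT conditions.
    """
    # Stationarity: 2*(x_demo - x_goal) + lam = 0
    lam = 2.0 * (x_goal - x_demo)

    # Dual feasibility: lam >= 0, otherwise no theta explains the demonstration
    if lam < -tol:
        raise ValueError("demonstration is inconsistent with the KKT conditions")

    if lam > tol:
        # Complementary slackness lam * (x_demo - theta) = 0 with lam > 0
        # forces the constraint to be active, so theta = x_demo.
        return lam, (x_demo, x_demo)

    # lam == 0: the constraint may be inactive, so only primal feasibility
    # (x_demo <= theta) restricts theta.
    return lam, (x_demo, float("inf"))


# A demonstration that stops short of its goal pins down theta exactly.
lam_active, theta_active = recover_threshold(x_demo=1.0, x_goal=2.0)

# A demonstration that reaches its goal only lower-bounds theta.
lam_free, theta_free = recover_threshold(x_demo=2.0, x_goal=2.0)
```

In this sketch, a demonstration that is pushed away from its unconstrained optimum reveals the constraint boundary, while one that reaches its goal leaves a set of consistent parameters rather than a single value.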
