Learning Robot Safety from Sparse Human Feedback using Conformal Prediction
Aaron O. Feldman, Joseph A. Vincent, Maximilian Adang, Jun En Low, Mac Schwager
TL;DR
The paper tackles robot safety in settings where formal constraints are hard to specify by learning from sparse human labels. It builds a conformal-prediction framework around a nearest-neighbor classifier to define a suspected unsafe sublevel (SUS) region $C(\epsilon)$ that is guaranteed to contain at least a fraction $1-\epsilon$ of future unsafe states, without withholding any data for calibration. The authors derive a closed-form calibration, extend it to data containing both unsafe and safe labels, and use the framework to build a practical warning system and a backup safety policy that avoids entering $C(\epsilon)$. They demonstrate both on quadcopter MPC and visuomotor policies in simulation and hardware experiments. The approach gives a guaranteed bound on the miss rate, is interpretable because the SUS region is geometric, and combines with representation learning to handle high-dimensional observations, enabling safer policy execution with little data and labeling effort.
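To make the idea of a distance-based region $C(\epsilon)$ concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's closed-form calibration: the score is assumed to be the Euclidean distance to the nearest labeled unsafe state, the threshold is taken from leave-one-out nearest-neighbor distances among the unsafe samples so that no data is held out, and the function names (`loo_nn_distances`, `calibrate_threshold`, `in_suspected_unsafe_region`) and the synthetic data are hypothetical.

```python
import math
import numpy as np


def loo_nn_distances(unsafe):
    """Distance from each unsafe sample to its nearest *other* unsafe sample."""
    diffs = unsafe[:, None, :] - unsafe[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dists, np.inf)  # exclude each point's distance to itself
    return dists.min(axis=1)


def calibrate_threshold(unsafe, epsilon):
    """Pick a distance threshold from leave-one-out scores (assumed scheme).

    Uses the standard conformal rank k = ceil((1 - epsilon) * (n + 1)).
    If k exceeds n there are too few samples for this epsilon, so the
    threshold is infinite and every state is flagged.
    """
    n = len(unsafe)
    k = math.ceil((1 - epsilon) * (n + 1))
    if k > n:
        return math.inf
    scores = np.sort(loo_nn_distances(unsafe))
    return float(scores[k - 1])


def in_suspected_unsafe_region(x, unsafe, threshold):
    """Flag x if it lies within `threshold` of any labeled unsafe sample."""
    d = np.linalg.norm(unsafe - x, axis=1).min()
    return bool(d <= threshold)


# Synthetic 2-D "unsafe" states, for illustration only.
rng = np.random.default_rng(0)
unsafe_states = rng.normal(loc=[2.0, 2.0], scale=0.3, size=(40, 2))
tau = calibrate_threshold(unsafe_states, epsilon=0.1)
print(in_suspected_unsafe_region(np.array([2.0, 2.0]), unsafe_states, tau))
print(in_suspected_unsafe_region(np.array([-5.0, -5.0]), unsafe_states, tau))
```

In this sketch the region is a union of balls around the labeled unsafe states, which is one way to picture why such a region is geometrically interpretable. A warning system would call `in_suspected_unsafe_region` on each new state (or latent embedding) and alert when it returns `True`.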
Abstract
Ensuring robot safety can be challenging; user-defined constraints can miss edge cases, policies can become unsafe even when trained from safe data, and safety can be subjective. Thus, we learn about robot safety by showing policy trajectories to a human who flags unsafe behavior. From this binary feedback, we use the statistical method of conformal prediction to identify a region of states, potentially in learned latent space, guaranteed to contain a user-specified fraction of future policy errors. Our method is sample-efficient, as it builds on nearest neighbor classification and avoids withholding data as is common with conformal prediction. By alerting if the robot reaches the suspected unsafe region, we obtain a warning system that mimics the human's safety preferences with guaranteed miss rate. From video labeling, our system can detect when a quadcopter visuomotor policy will fail to steer through a designated gate. We present an approach for policy improvement by avoiding the suspected unsafe region. With it we improve a model predictive controller's safety, as shown in experimental testing with 30 quadcopter flights across 6 navigation tasks. Code and videos are provided.
