Task-Oriented Active Learning of Model Preconditions for Inaccurate Dynamics Models
Alex LaGrassa, Moonyoung Lee, Oliver Kroemer
TL;DR
This work tackles planning with inaccurate dynamics by learning model preconditions that constrain planning to regions where the model is reliable, formalized through a model deviation estimator (MDE). The MDE is implemented as a Gaussian Process with a Matérn kernel and heteroscedastic noise. It predicts the deviation $d(s,a)$ and defines the precondition by requiring the probability $P(d(s,a) > d_{\text{max}})$ to stay below a threshold (equivalently, $\mu(s,a) + \sigma(s,a) < d_{\text{max}}$). The authors introduce a task-oriented active-learning loop that generates candidate trajectories with an RRT-based planner, evaluates an acquisition function across trajectory steps using a lower-confidence bound, and updates the MDE after each batch of trajectories; a scheduled risk parameter $\eta$ adjusts how conservative the precondition is over time. Empirically, the approach improves data efficiency and planning reliability across icy gridworld, simulated plant watering, and real-world plant watering tasks, achieving an improvement of roughly 80% after only four real-world trajectories. The framework enables data-efficient estimation of model preconditions, with potential applicability to high-dimensional and deformable-object domains where model inaccuracies are pronounced.
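To make the precondition test concrete, the following is a minimal sketch, not the authors' implementation. It assumes the MDE has already returned a Gaussian predictive mean and standard deviation for each step of a candidate trajectory. The function names and the `risk` multiplier are hypothetical; with `risk=1.0` the check reduces to the $\mu(s,a) + \sigma(s,a) < d_{\text{max}}$ form stated above.

```python
import math


def exceed_probability(mu: float, sigma: float, d_max: float) -> float:
    """P(d > d_max) for a Gaussian prediction d ~ Normal(mu, sigma^2)."""
    if sigma <= 0.0:
        # Degenerate prediction: all probability mass sits at mu.
        return 1.0 if mu > d_max else 0.0
    z = (d_max - mu) / sigma
    # Gaussian upper-tail probability via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def precondition_holds(mu: float, sigma: float, d_max: float,
                       risk: float = 1.0) -> bool:
    """Upper-bound form of the precondition: mu + risk * sigma < d_max.

    `risk` is a hypothetical multiplier (an assumption of this sketch);
    risk=1.0 matches the mu + sigma < d_max form in the summary.
    """
    return mu + risk * sigma < d_max


def trajectory_in_precondition(predictions, d_max: float,
                               risk: float = 1.0) -> bool:
    """A trajectory is accepted only if every (mu, sigma) step passes."""
    return all(precondition_holds(mu, sigma, d_max, risk)
               for mu, sigma in predictions)
```

A planner could call `trajectory_in_precondition` on each candidate and discard those that leave the region where the model is trusted. Raising `risk` makes the check more conservative.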
Abstract
When planning with an inaccurate dynamics model, a practical strategy is to restrict planning to regions of state-action space where the model is accurate; this restriction is known as a \textit{model precondition}. Empirical real-world trajectory data is valuable for defining data-driven model preconditions regardless of the model form (analytical, simulator, learned, etc.). However, real-world data is often expensive and dangerous to collect. To achieve data efficiency, this paper presents an algorithm for actively selecting trajectories to learn a model precondition for an inaccurate pre-specified dynamics model. Our proposed techniques address challenges arising from the sequential nature of trajectories and the potential benefit of prioritizing task-relevant data. The experimental analysis shows how algorithmic properties affect performance in three planning scenarios: icy gridworld, simulated plant watering, and real-world plant watering. Results demonstrate an improvement of approximately 80% after only four real-world trajectories when using our proposed techniques.
