Task-Oriented Active Learning of Model Preconditions for Inaccurate Dynamics Models

Alex LaGrassa, Moonyoung Lee, Oliver Kroemer

TL;DR

This work tackles planning with inaccurate dynamics by learning model preconditions that constrain planning to regions where the model is reliable, formalized through a model deviation estimator (MDE). The MDE is implemented as a Gaussian Process with a Matérn kernel and heteroscedastic noise. It predicts the deviation $d(s,a)$ and defines the precondition by requiring the probability $P(d(s,a) > d_{\text{max}})$ to stay below a threshold, which in practice is checked as an upper bound on the predicted deviation, $\mu(s,a) + \beta\,\sigma(s,a) < d_{\text{max}}$. The authors introduce a task-oriented active-learning loop that generates candidate trajectories with an RRT-based planner, evaluates an acquisition function at each trajectory step using a lower-confidence bound, and updates the MDE after batches of trajectories; a scheduled risk parameter $\beta$ adjusts how conservative the precondition is over time. Empirically, the approach improves data efficiency and planning reliability across icy gridworld, simulated plant watering, and real-world plant watering tasks, achieving roughly 80% improvement after only four real-world trajectories. The framework enables data-efficient estimation of model preconditions, with potential applicability to high-dimensional and deformable-object domains where model inaccuracies are pronounced.
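The precondition test described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: it assumes the MDE has already produced a predictive mean and standard deviation for a state-action pair, and the names `DeviationPrediction`, `in_precondition`, and `linear_beta` are hypothetical. The values $\beta = 2$ and $d_{\text{max}} = 0.1$ are taken from the Figure 5 caption; the linear form of the schedule is an assumption, since the summary only says that the risk parameter is scheduled.

```python
from dataclasses import dataclass


@dataclass
class DeviationPrediction:
    """Predictive mean and standard deviation of the deviation d(s, a)."""
    mean: float
    std: float


def upper_bound(pred: DeviationPrediction, beta: float) -> float:
    """Upper bound mu + beta * sigma on the predicted deviation."""
    return pred.mean + beta * pred.std


def in_precondition(pred: DeviationPrediction, beta: float, d_max: float) -> bool:
    """True when the upper bound on the deviation stays below d_max."""
    return upper_bound(pred, beta) < d_max


def linear_beta(iteration: int, n_iterations: int,
                beta_start: float, beta_end: float) -> float:
    """Hypothetical linear schedule for the risk parameter.

    The linear form is an assumption for illustration, not taken from the paper.
    """
    if n_iterations <= 1:
        return beta_end
    frac = min(max(iteration / (n_iterations - 1), 0.0), 1.0)
    return beta_start + frac * (beta_end - beta_start)


# Same predicted mean, different uncertainty.
confident = DeviationPrediction(mean=0.02, std=0.01)
uncertain = DeviationPrediction(mean=0.02, std=0.05)

print(in_precondition(confident, beta=2.0, d_max=0.1))  # bound is about 0.04
print(in_precondition(uncertain, beta=2.0, d_max=0.1))  # bound is about 0.12
```

With the same predicted mean, the more uncertain pair falls outside the precondition, which is why a larger risk parameter makes the planner more conservative.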

Abstract

When planning with an inaccurate dynamics model, a practical strategy is to restrict planning to regions of state-action space where the model is accurate, a region known as a model precondition. Empirical real-world trajectory data is valuable for defining data-driven model preconditions regardless of the model form (analytical, simulator, learned, etc.). However, real-world data is often expensive and dangerous to collect. To achieve data efficiency, this paper presents an algorithm for actively selecting trajectories to learn a model precondition for an inaccurate pre-specified dynamics model. Our proposed techniques address challenges arising from the sequential nature of trajectories and the potential benefit of prioritizing task-relevant data. The experimental analysis shows how algorithmic properties affect performance in three planning scenarios: icy gridworld, simulated plant watering, and real-world plant watering. Results demonstrate an improvement of approximately 80% after only four real-world trajectories when using our proposed techniques.

Paper Structure (13 sections, 1 equation, 7 figures)

Figures (7)

  • Figure 1: Illustrative example of using a planner and an acquisition function to iteratively select informative trajectories that reveal where the model is accurate enough to compute plans to the goal. In this example, the known dynamics model on the upper left, $\hat{f}(s,a,s')$, reasons only about the containers and not about the plant. The learned model precondition is then used at test time so that only actions inside the precondition are performed.
  • Figure 2: Overview of our method: Each iteration $j$ starts with sampling a planning problem and generating candidate trajectories that satisfy domain constraints and reach the goal. We outline the acquisition function computation for each trajectory in the pink box, including the step-wise acquisition function values $\alpha_{\mathrm{step}}(s_t,a_t)$ for each state-action pair in the trajectory. These values are then aggregated by a function $h$ to yield the trajectory's utility, $\alpha(\tau)$. The final step is selecting $\tau^*$ and executing it in the test environment to collect the ground truth $[s_{[1:T_{\tau}]}, a_{[1:T_{\tau}-1]}]$. The MDE is updated every $M$ trajectories.
  • Figure 3: Scenarios and their corresponding dynamics models. (a) Slippery grid world where movement may result in slipping backwards over ice (blue) or not moving (grey). The analytical dynamics model assumes unimpeded movement within grid bounds. (b) Simulated plant watering using a learned dynamics model trained on a scenario without a plant. (c) Real-world plant watering with a rule-based analytical dynamics model.
  • Figure 4: Ratio of trajectory types executed during training (examples shown above) for our method and the Random ablation over training iterations.
  • Figure 5: MDE with plant overlay over training iterations for translation-only actions and rotation-only actions. Color scales indicate the ground-truth deviation (right, top) and the upper bound of the predicted deviation, $\mu(s,a) + \beta\,\sigma(s,a)$, for $\beta = 2$. Here $d_{\mathrm{max}}=0.1$, so the blue region indicates the model precondition.
  • ...and 2 more figures
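
The trajectory selection step in the Figure 2 caption (step-wise values $\alpha_{\mathrm{step}}(s_t,a_t)$, aggregated by $h$ into $\alpha(\tau)$, followed by picking $\tau^*$) can be sketched as below. This is a toy illustration under stated assumptions, not the authors' code: the step-wise acquisition function is passed in by the caller, `toy_alpha_step` is a made-up stand-in, the choice of mean or max for $h$ is hypothetical, and selecting the maximizer of $\alpha(\tau)$ is assumed.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

StateAction = Tuple[float, float]  # toy scalar (state, action) pair
Trajectory = List[StateAction]


def trajectory_utility(
    traj: Trajectory,
    alpha_step: Callable[[StateAction], float],
    h: Callable[[Sequence[float]], float],
) -> float:
    """Aggregate step-wise acquisition values into one utility alpha(tau)."""
    return h([alpha_step(sa) for sa in traj])


def select_trajectory(
    candidates: List[Trajectory],
    alpha_step: Callable[[StateAction], float],
    h: Callable[[Sequence[float]], float],
) -> Trajectory:
    """Return the candidate with the highest aggregated utility (assumed rule)."""
    return max(candidates, key=lambda t: trajectory_utility(t, alpha_step, h))


def toy_alpha_step(sa: StateAction) -> float:
    """Made-up step value: grows with action magnitude. Not the paper's function."""
    _, action = sa
    return abs(action)


candidates = [
    [(0.0, 0.1), (0.1, 0.1)],  # uniformly uninformative
    [(0.0, 0.5), (0.5, 0.5)],  # moderately informative at every step
    [(0.0, 0.0), (0.0, 0.9)],  # one highly informative step
]

best_by_mean = select_trajectory(candidates, toy_alpha_step, mean)
best_by_max = select_trajectory(candidates, toy_alpha_step, max)
print(candidates.index(best_by_mean), candidates.index(best_by_max))
```

The two aggregators pick different trajectories here: averaging favors a trajectory that is informative throughout, while taking the maximum favors one with a single highly informative step. This is the kind of trade-off that the sequential nature of trajectories introduces.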