
Active Learning of Dynamics Using Prior Domain Knowledge in the Sampling Process

Kevin S. Miller, Adam J. Thorpe, Ufuk Topcu

TL;DR

An active learning algorithm that leverages side information by explicitly incorporating prior domain knowledge into the sampling process; the authors prove that it yields a consistent estimate of the underlying dynamics by deriving an explicit rate of convergence for the maximum predictive variance.

Abstract

We present an active learning algorithm for learning dynamics that leverages side information by explicitly incorporating prior domain knowledge into the sampling process. Our proposed algorithm guides the exploration toward regions that demonstrate high empirical discrepancy between the observed data and an imperfect prior model of the dynamics derived from side information. Through numerical experiments, we demonstrate that this strategy explores regions of high discrepancy and accelerates learning while simultaneously reducing model uncertainty. We rigorously prove that our active learning algorithm yields a consistent estimate of the underlying dynamics by providing an explicit rate of convergence for the maximum predictive variance. We demonstrate the efficacy of our approach on an under-actuated pendulum system and on the half-cheetah MuJoCo environment.
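The abstract describes sampling that is guided by the discrepancy between observations and an imperfect prior model. The sketch below illustrates that idea on a hypothetical 1-D problem; it is not the paper's acquisition rule. It assumes a GP with an RBF kernel fitted to the residual `f_true - f_prior`, so the learned model is the prior plus the GP mean, and it scores candidates by the posterior standard deviation scaled up by the normalized estimated discrepancy (`score = sd * (1 + lam * w)`). The functions `f_true`, `f_prior`, and `discrepancy_guided_sampling`, the lengthscale, the noise level, and the weight `lam` are all illustrative assumptions.

```python
import numpy as np

def rbf(a, b, ls=0.5):
    # Squared-exponential kernel with unit signal variance, 1-D inputs.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(x_tr, y_tr, x_te, noise=1e-2):
    # Zero-mean GP posterior mean and standard deviation at x_te.
    K = rbf(x_tr, x_tr) + noise**2 * np.eye(len(x_tr))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_tr))
    Ks = rbf(x_tr, x_te)
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.clip(1.0 - np.sum(v**2, axis=0), 0.0, None)
    return mu, np.sqrt(var)

def f_true(x):   # hypothetical "true" dynamics
    return np.sin(3 * x) + 0.5 * x**2 * (x > 0)

def f_prior(x):  # hypothetical imperfect prior model (misses the x > 0 term)
    return np.sin(3 * x)

def discrepancy_guided_sampling(n_steps=30, lam=1.0):
    cand = np.linspace(-2.0, 2.0, 401)    # candidate inputs
    X = np.array([0.0])                    # initial sample
    R = f_true(X) - f_prior(X)             # observed residuals (discrepancy)
    for _ in range(n_steps):
        mu_r, sd = gp_posterior(X, R, cand)
        # Normalized estimated discrepancy in [0, 1].
        w = np.abs(mu_r) / max(np.abs(mu_r).max(), 1e-12)
        score = sd * (1.0 + lam * w)       # hypothetical acquisition
        x_next = cand[np.argmax(score)]
        X = np.append(X, x_next)
        R = np.append(R, f_true(x_next) - f_prior(x_next))
    return X, R, cand

X, R, cand = discrepancy_guided_sampling()
mu_r, sd = gp_posterior(X, R, cand)
pred = f_prior(cand) + mu_r                # learned model = prior + residual GP
model_mse = np.mean((pred - f_true(cand)) ** 2)
prior_mse = np.mean((f_prior(cand) - f_true(cand)) ** 2)
print(len(X), sd.max(), model_mse, prior_mse)
```

Because the discrepancy weight only multiplies the standard deviation, a point whose uncertainty has collapsed scores near zero however large its discrepancy, so this loop keeps reducing uncertainty everywhere while favoring high-discrepancy regions.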

Paper Structure

This paper contains 11 sections, 3 theorems, 27 equations, and 5 figures.

Key Result

Lemma 1

Let $f \in \mathcal{H}^d_{k,B}$ and suppose that $\mu_j$ and $\sigma_j$ are the posterior mean and variance of a GP with kernel $k$. There exists $\beta_j(\delta)$ for which the tuple $(\mu_j, \sigma_j, \beta_j(\delta))$ is an all-time-calibrated statistical model of $f$ (Definition 1).
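Lemma 1 bounds the error $|f - \mu_j|$ by a multiple $\beta_j(\delta)$ of $\sigma_j$. The sketch below checks the simplest instance of this idea numerically: for $f$ in the RKHS with norm $B$ and noise-free observations, the GP (kernel ridge) posterior satisfies $|f(x) - \mu(x)| \le B\,\sigma(x)$ deterministically, so $\beta = B$ suffices. It does not reproduce the lemma's $\beta_j(\delta)$, which must also absorb observation noise and holds only with probability at least $1 - \delta$. The kernel, lengthscale, regularizer, and random construction of $f$ are assumptions made for illustration.

```python
import numpy as np

def rbf(a, b, ls=0.5):
    # Squared-exponential kernel with unit signal variance, 1-D inputs.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(x_tr, y_tr, x_te, noise=1e-2):
    # GP posterior mean/std; noise**2 acts as a small ridge regularizer here.
    K = rbf(x_tr, x_tr) + noise**2 * np.eye(len(x_tr))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_tr))
    Ks = rbf(x_tr, x_te)
    v = np.linalg.solve(L, Ks)
    var = np.clip(1.0 - np.sum(v**2, axis=0), 0.0, None)
    return Ks.T @ alpha, np.sqrt(var)

# Build f in the RKHS of k: f(x) = sum_i a_i k(x, z_i), with ||f||_H^2 = a^T K_z a.
rng = np.random.default_rng(0)
z = rng.uniform(-2.0, 2.0, size=8)
a = rng.normal(size=8)
B = np.sqrt(a @ rbf(z, z) @ a)   # RKHS norm of f, used as beta

def f(x):
    return rbf(x, z) @ a

x_tr = rng.uniform(-2.0, 2.0, size=12)
x_te = np.linspace(-2.0, 2.0, 501)
mu, sd = gp_posterior(x_tr, f(x_tr), x_te)   # noise-free observations of f
err = np.abs(f(x_te) - mu)
# Cauchy-Schwarz in the RKHS gives err <= B * sd pointwise.
print(np.all(err <= B * sd + 1e-6))
```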

Figures (5)

  • Figure 1: Average reduction of maximum variance (uncertainty) over $8000$ test points $\mathcal{T}$ spaced evenly over the entire state space. The shaded region shows the maximum and minimum values over $10$ independent trials.
  • Figure 2: Mean squared error (MSE) of the learned models over test points $\mathcal{T}$. The inclusion of side information leads to reduced error. The shaded region shows the maximum and minimum values over $10$ independent trials.
  • Figure 3: Average discrepancy between the prior model and the true, underlying dynamics at the points visited by the algorithms during each episode.
  • Figure 4: The true half-cheetah system (left) and the imperfect half-cheetah system used as the bias for our algorithm (right).
  • Figure 5: Cumulative reward of our approach on a downstream control task, compared with the oracle that uses the true dynamics.
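The captions of Figures 1 to 3 name three evaluation quantities: the maximum predictive variance over the test set $\mathcal{T}$, the MSE over $\mathcal{T}$, and the average prior-versus-truth discrepancy at visited points. The sketch below shows one way such metrics could be computed. It assumes a model exposing a hypothetical `predict(X) -> (mean, variance)` interface, and it builds $\mathcal{T}$ as a 20 × 20 × 20 grid over a 3-D state-action box only because $20^3 = 8000$; the paper's actual grid, bounds, and error norms are not stated here. All function names, bounds, and toy dynamics are illustrative.

```python
import numpy as np

def evenly_spaced_grid(lows, highs, n_per_dim):
    # Cartesian grid with n_per_dim evenly spaced points along each axis.
    axes = [np.linspace(lo, hi, n_per_dim) for lo, hi in zip(lows, highs)]
    mesh = np.meshgrid(*axes, indexing="ij")
    return np.stack(mesh, axis=-1).reshape(-1, len(axes))

def max_predictive_variance(predict, T):
    # Largest predictive variance over the test set (Figure 1 style metric).
    _, var = predict(T)
    return float(np.max(var))

def mean_squared_error(predict, f_true, T):
    # MSE of the predictive mean, averaged over points and output dims (assumed).
    mean, _ = predict(T)
    return float(np.mean((mean - f_true(T)) ** 2))

def average_discrepancy(f_true, f_prior, visited):
    # Mean Euclidean gap between truth and prior at visited points (assumed norm).
    gap = f_true(visited) - f_prior(visited)
    return float(np.mean(np.linalg.norm(gap, axis=-1)))

# Hypothetical box (e.g. angle, angular velocity, torque): 20**3 = 8000 points.
T = evenly_spaced_grid([-np.pi, -8.0, -2.0], [np.pi, 8.0, 2.0], 20)

def f_true(X):   # hypothetical 2-D dynamics output
    return np.stack([np.sin(X[:, 0]), X[:, 1] + X[:, 2]], axis=-1)

def f_prior(X):  # hypothetical imperfect prior that ignores the input term
    return np.stack([np.sin(X[:, 0]), X[:, 1]], axis=-1)

def predict(X):  # stand-in model: mean biased by 0.1, variance decaying with |x|
    var = 1.0 / (1.0 + np.sum(X**2, axis=-1))
    return f_true(X) + 0.1, np.stack([var, var], axis=-1)

max_var = max_predictive_variance(predict, T)
mse = mean_squared_error(predict, f_true, T)
disc = average_discrepancy(f_true, f_prior, T[:100])
print(T.shape, max_var, mse, disc)
```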

Theorems & Definitions (4)

  • Definition 1: All-time calibrated statistical model of $f$ (Rothfuss et al., 2023)
  • Lemma 1: Well-calibrated confidence intervals for RKHS (Rothfuss et al., 2023)
  • Lemma 2 (with proof)
  • Theorem 1 (with proof)
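For reference, Definition 1 in this line of work (Rothfuss et al., 2023) is typically stated as follows; the paper's exact notation and quantifiers may differ, and the input space $\mathcal{Z}$ and output index $i$ are notation assumed here. A tuple $(\mu_j, \sigma_j, \beta_j(\delta))$ is an all-time calibrated statistical model of $f$ if a single event of probability at least $1-\delta$ makes the confidence intervals hold simultaneously for every iteration $j$, input $z$, and output dimension $i$:

$$
\Pr\Big( \forall j \ge 0,\ \forall z \in \mathcal{Z},\ \forall i \in \{1,\dots,d\}:\ \big|\mu_{j,i}(z) - f_i(z)\big| \le \beta_j(\delta)\,\sigma_{j,i}(z) \Big) \ge 1 - \delta
$$

"All-time" refers to the quantifier over $j$ sitting inside the probability, so the bound does not need a separate union bound at each iteration; Lemma 1 then asserts that the GP posterior admits such a $\beta_j(\delta)$ whenever $f \in \mathcal{H}^d_{k,B}$.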