Active Learning of Dynamics Using Prior Domain Knowledge in the Sampling Process

Kevin S. Miller; Adam J. Thorpe; Ufuk Topcu

TL;DR

We present an active learning algorithm for learning dynamics that incorporates prior domain knowledge (side information) directly into the sampling process. The algorithm yields a consistent estimate of the underlying dynamics, which is established through an explicit rate of convergence for the maximum predictive variance.

Abstract

We present an active learning algorithm for learning dynamics that leverages side information by explicitly incorporating prior domain knowledge into the sampling process. Our proposed algorithm guides the exploration toward regions that demonstrate high empirical discrepancy between the observed data and an imperfect prior model of the dynamics derived from side information. Through numerical experiments, we demonstrate that this strategy explores regions of high discrepancy and accelerates learning while simultaneously reducing model uncertainty. We rigorously prove that our active learning algorithm yields a consistent estimate of the underlying dynamics by providing an explicit rate of convergence for the maximum predictive variance. We demonstrate the efficacy of our approach on an under-actuated pendulum system and on the half-cheetah MuJoCo environment.
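The sampling idea described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the one-dimensional toy functions `true_f` and `prior_f`, the RBF kernel and its hyperparameters, and in particular the acquisition score (posterior standard deviation plus a weighted estimate of the discrepancy between observations and the prior model). The paper's actual acquisition rule and guarantees may differ.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=0.5):
    """Squared-exponential kernel with unit signal variance."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)


def gp_posterior(X, y, Xs, noise=1e-2, lengthscale=0.5):
    """Zero-mean GP posterior mean and variance at the query points Xs."""
    K = rbf_kernel(X, X, lengthscale) + noise * np.eye(len(X))
    Ks = rbf_kernel(Xs, X, lengthscale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = 1.0 - (v**2).sum(0)  # k(x, x) = 1 for this kernel
    return mean, np.maximum(var, 0.0)


def true_f(x):
    """Hypothetical ground-truth dynamics (toy example)."""
    return np.sin(3.0 * x[:, 0])


def prior_f(x):
    """Hypothetical imperfect prior model: wrong in a bump around x = 1."""
    return np.sin(3.0 * x[:, 0]) + 0.5 * np.exp(-((x[:, 0] - 1.0) ** 2) / 0.05)


def run(n_steps=25, weight=1.0, seed=0):
    rng = np.random.default_rng(seed)
    cand = np.linspace(-2.0, 2.0, 200)[:, None]  # candidate sampling locations
    X = cand[rng.choice(len(cand), 3, replace=False)]
    y = true_f(X) + 0.01 * rng.standard_normal(len(X))
    max_var = []
    for _ in range(n_steps):
        # Fit a GP to the residual between the observations and the prior model.
        resid = y - prior_f(X)
        resid_mean, var = gp_posterior(X, resid, cand)
        max_var.append(var.max())
        # Assumed acquisition: uncertainty plus weighted estimated discrepancy.
        score = np.sqrt(var) + weight * np.abs(resid_mean)
        x_new = cand[int(np.argmax(score))][None, :]
        X = np.vstack([X, x_new])
        y = np.append(y, true_f(x_new) + 0.01 * rng.standard_normal(1))
    return X, y, np.array(max_var)


if __name__ == "__main__":
    X, y, max_var = run()
    print("first/last maximum predictive variance:", max_var[0], max_var[-1])
```

Setting `weight=0.0` reduces the sketch to plain uncertainty sampling, which gives a simple baseline for comparing how often the two strategies visit the region where the prior model is wrong.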

Paper Structure (11 sections, 3 theorems, 27 equations, 5 figures)

Introduction
Preliminaries & Problem Statement
Problem Statement
Gaussian Processes
Active Sampling in the GP Setting
Active Sampling Using Side Information
Consistency Guarantee
Numerical Results
Estimating a Simple Pendulum System
Control Performance Using Learned Dynamics
Conclusion

Key Result

Lemma 1

Let $f \in \mathcal{H}^d_{k,B}$ and suppose that $\mu_j$ and $\sigma_j$ are the posterior mean and variance of a GP with kernel $k$. There exists $\beta_j(\delta)$ for which the tuple $(\mu_j, \sigma_j, \beta_j(\delta))$ is an all-time-calibrated statistical model (Definition 1) [...]

Figures (5)

Figure 1: Average reduction of maximum variance (uncertainty) over $8000$ test points $\mathcal{T}$ spaced evenly over the entire state space. The shaded region shows the maximum and minimum values over $10$ independent trials.
Figure 2: Mean squared error (MSE) of the learned models over test points $\mathcal{T}$. The inclusion of side information leads to reduced error. The shaded region shows the maximum and minimum values over $10$ independent trials.
Figure 3: Average discrepancy between the prior model and the true, underlying dynamics at the points visited by the algorithms during each episode.
Figure 4: The true half-cheetah system (left) and the imperfect half-cheetah system used as the bias for our algorithm (right).
Figure 5: Cumulative reward of our approach at a downstream control task compared to the oracle using the true dynamics.

Theorems & Definitions (6)

Definition 1: All-time calibrated statistical model of $f$ [rothfuss2023hallucinated]
Lemma 1: Well-calibrated confidence intervals for RKHS [rothfuss2023hallucinated]
Lemma 2
proof
Theorem 1
proof
