Efficient Stochastic Optimal Control through Approximate Bayesian Input Inference

Joe Watson, Hany Abdulsamad, Rolf Findeisen, Jan Peters

TL;DR

Analyzing the Gaussian setting, this work presents an inference-based solver that is effective in stochastic and deterministic settings and was found to be superior to popular baselines on nonlinear simulated tasks.

Abstract

Optimal control under uncertainty is a prevailing challenge for many reasons. One of the critical difficulties lies in producing tractable solutions for the underlying stochastic optimization problem. We show how advanced approximate inference techniques can be used to handle the statistical approximations in a principled and practical way by framing the control problem as a problem of input estimation. Analyzing the Gaussian setting, we present an inference-based solver that is effective in stochastic and deterministic settings and was found to be superior to popular baselines on nonlinear simulated tasks. We draw connections that relate this inference formulation to previous approaches for stochastic optimal control and outline several advantages that this inference view brings due to its statistical nature.

Paper Structure

This paper contains 38 sections, 2 theorems, 34 equations, 12 figures, 4 tables, 1 algorithm.

Key Result

Lemma V.1

(Maximum entropy distributions, Section 12.1 Cover2006) Let function ${\bm{h}}({\bm{x}}){\,:\,}\mathbb{R}^{d_x}{\,\rightarrow\,}\mathbb{R}^h$ contain all 'useful' information about random variable ${\bm{x}}$. Given an observed empirical average $\hat{{\bm{h}}}$, and wish to find the density $q({\bm{

Figures (12)

  • Figure 1: Probabilistic graphical model of i2c for one timestep, following the notation of Loeliger et al. loeliger2007factor, illustrating how incorporating the quadratic cost structure yields a Gaussian state-space model for inference.
  • Figure 2: An illustration of different approximate inference methods when propagating a bivariate Gaussian through a nonlinear function. The Monte Carlo samples indicate the distribution becomes highly non-Gaussian, but evaluating these samples is computationally intensive. Linearizing the function returns an approximation that is highly localized around its (inaccurate) mean prediction. Cubature quadrature uses $2d$ points, but improves the estimate, particularly in the mean. 4th-degree Gauss-Hermite uses $d^4$ points, but almost directly matches the Monte Carlo estimate in mean and covariance.
  • Figure 3: For a 1D non-convex, nonlinear inverse problem, the adaptive regularization provided by the EM update rule can be shown to provide superior stability and convergence compared to fixed hyperparameters for iterative inference.
  • Figure 4: A simple comparison between exploration using UCB and the expected log-likelihood (E-LLH) for a 1D Gaussian process dynamics model. UCB seeks the most uncertain region in the objective, while the inference strategy seeks $x_1{\,=\,}0$ due to the cost-as-likelihood transformation and filtering step, such that the posterior $x_1$ drifts from the mean prediction. As a result, in this scenario, the inference approach selects an effective $u_0$ under uncertainty across a range of $\alpha$ values, while UCB tends to over-explore as $\beta$ increases. Uncertainty intervals illustrate one and two standard deviations.
  • Figure 5: The conditional Gaussian distribution as a linear control law. The standard linear control law applies even far outside the expected distribution, a region for which it was not designed. The 'expert' controller reverts to the prior when far from the mean, which prevents erroneous feedback control.
  • ...and 7 more figures
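
The contrast drawn in the Figure 2 caption between linearization and cubature quadrature can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`cubature_transform`, `linearized_transform`, `polar_to_cartesian`), the test function, and the chosen mean and covariance are all hypothetical, and the rule shown is the standard spherical cubature rule with $2d$ equally weighted points.

```python
import numpy as np


def cubature_transform(f, mean, cov):
    """Propagate N(mean, cov) through f with the 2d-point spherical cubature rule."""
    d = mean.shape[0]
    chol = np.linalg.cholesky(cov)  # cov = chol @ chol.T
    # Unit cubature points: +/- sqrt(d) along each axis, shape (d, 2d).
    unit = np.sqrt(d) * np.hstack([np.eye(d), -np.eye(d)])
    points = mean[:, None] + chol @ unit
    # Evaluate f at every point; all points carry equal weight 1 / (2d).
    ys = np.stack([f(points[:, i]) for i in range(2 * d)], axis=0)
    y_mean = ys.mean(axis=0)
    diff = ys - y_mean
    y_cov = diff.T @ diff / (2 * d)
    return y_mean, y_cov


def linearized_transform(f, jac, mean, cov):
    """Propagate N(mean, cov) through a first-order Taylor expansion of f."""
    J = jac(mean)
    return f(mean), J @ cov @ J.T


# Hypothetical nonlinear test function (polar to Cartesian), chosen for illustration.
def polar_to_cartesian(x):
    r, theta = x
    return np.array([r * np.cos(theta), r * np.sin(theta)])


def polar_jacobian(x):
    r, theta = x
    return np.array([
        [np.cos(theta), -r * np.sin(theta)],
        [np.sin(theta), r * np.cos(theta)],
    ])


mean = np.array([1.0, 0.0])
cov = np.diag([0.01, 0.5])  # large angular uncertainty makes the map strongly nonlinear

cub_mean, cub_cov = cubature_transform(polar_to_cartesian, mean, cov)
lin_mean, lin_cov = linearized_transform(polar_to_cartesian, polar_jacobian, mean, cov)

# Monte Carlo reference estimate of the propagated mean.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mean, cov, size=20000)
mc_mean = np.apply_along_axis(polar_to_cartesian, 1, samples).mean(axis=0)

print("linearized mean:", lin_mean)
print("cubature mean:  ", cub_mean)
print("Monte Carlo mean:", mc_mean)
```

In this toy setting the linearized mean is simply the function evaluated at the input mean, whereas the cubature mean is pulled toward the origin, as the Monte Carlo estimate is. For a linear function both approximations coincide and are exact.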

Theorems & Definitions (5)

  • definition 1
  • definition 2
  • Lemma V.1
  • Proposition V.1
  • proof