
Deep Gaussian Covariance Network with Trajectory Sampling for Data-Efficient Policy Search

Can Bogoclu, Robert Vosshall, Kevin Cremanns, Dirk Roos

TL;DR

Data-efficient model-based RL requires reliable uncertainty propagation through nonlinear dynamics, where the dynamics model predicts state differences $\Delta_t = s_{t+1} - s_t = f(s_t, a_t)$. The authors introduce DGCNTS, which combines a Deep Gaussian Covariance Network (DGCN) with trajectory sampling for policy search, and compare it against GP- and PNN-based uncertainty propagation on four control tasks. Across the experiments, the S + DGCN method (trajectory sampling with a DGCN) achieves superior data efficiency and robustness to noisy initial states. It often outperforms density-based propagation schemes that rely on moment matching or particle filters, and it rivals ensembles of probabilistic neural networks. The results support using non-stationary kernel models with trajectory sampling for robust, data-efficient policy search; future work will extend the approach to model predictive control (MPC) and higher-dimensional state spaces.
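To make the propagation scheme concrete, here is a minimal sketch of trajectory sampling for policy evaluation, assuming a toy one-dimensional system. The model `predict_delta`, the linear policy `policy`, the quadratic cost, and all constants are hypothetical stand-ins, not the authors' models or benchmark tasks. Each particle starts from a noisy initial state; at every step a state difference $\Delta_t$ is drawn from the model's predictive Gaussian and added to $s_t$, and the expected cumulative cost is estimated as the average over particles.

```python
import math
import random


def predict_delta(s, a):
    """Hypothetical probabilistic dynamics model standing in for a GP or DGCN.

    Returns the mean and standard deviation of Delta_t = s_{t+1} - s_t.
    """
    mean = 0.1 * math.sin(s) + 0.05 * a
    std = 0.01 + 0.02 * abs(s)  # toy uncertainty that grows away from the origin
    return mean, std


def policy(s, gain):
    """Hypothetical linear state-feedback policy a_t = -gain * s_t."""
    return -gain * s


def expected_cost(gain, s0_mean=1.0, s0_std=0.1, horizon=30, n_particles=200, seed=0):
    """Monte Carlo estimate of E[sum_t s_t^2] by trajectory sampling."""
    rng = random.Random(seed)  # same seed for every gain: common random numbers
    total = 0.0
    for _ in range(n_particles):
        s = rng.gauss(s0_mean, s0_std)  # noisy initial state
        for _ in range(horizon):
            mu, sigma = predict_delta(s, policy(s, gain))
            s = s + rng.gauss(mu, sigma)  # s_{t+1} = s_t + sampled Delta_t
            total += s * s  # quadratic cost c(s) = s^2
    return total / n_particles


# Crude policy search: pick the gain with the lowest estimated expected cost.
candidate_gains = [0.0, 1.0, 5.0]
best_gain = min(candidate_gains, key=expected_cost)
print(best_gain)
```

The fixed seed gives every candidate policy the same random draws, which makes the comparison less noisy. The grid search over `gain` is only a stand-in for whatever policy optimizer the paper actually uses.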

Abstract

Probabilistic world models increase the data efficiency of model-based reinforcement learning (MBRL) by guiding the policy with their epistemic uncertainty to improve exploration and acquire new samples. Moreover, the uncertainty-aware learning procedures in probabilistic approaches lead to robust policies that are less sensitive to noisy observations than uncertainty-unaware solutions. We propose to combine trajectory sampling and a deep Gaussian covariance network (DGCN) for a data-efficient solution to MBRL problems in an optimal control setting. We compare trajectory sampling with density-based approximation for uncertainty propagation using three different probabilistic world models: Gaussian processes, Bayesian neural networks, and DGCNs. Using four well-known test environments, we provide empirical evidence that our method improves sample efficiency over other combinations of uncertainty propagation methods and probabilistic models. In our tests, we place particular emphasis on the robustness of the learned policies with respect to noisy initial states.
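The difference between trajectory sampling and density-based propagation can be illustrated with a small standard-library sketch. Assuming a saturating map `tanh(3x)` as a hypothetical stand-in for one dynamics step, Gaussian particles are pushed through it and pile up near -1 and +1, giving a bimodal distribution. A moment-matched Gaussian keeps only the mean and variance (estimated from the samples here for simplicity, rather than analytically) and therefore places substantial probability near 0, where few particles actually land.

```python
import math
import random
import statistics


def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


rng = random.Random(0)
# Trajectory sampling: push particles of the input distribution through
# a hypothetical saturating step; the samples pile up near -1 and +1.
xs = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
ys = [math.tanh(3.0 * x) for x in xs]

# Density-based (moment-matching) approximation: keep only mean and variance.
mu = statistics.fmean(ys)
sigma = statistics.pstdev(ys, mu)

# Probability assigned to the low-density region |y| < 0.2 by each approach.
empirical = sum(abs(y) < 0.2 for y in ys) / len(ys)
gaussian = normal_cdf(0.2, mu, sigma) - normal_cdf(-0.2, mu, sigma)
print(f"samples: {empirical:.3f}, moment-matched Gaussian: {gaussian:.3f}")
```

Sampling preserves asymmetric or multimodal shapes, at the cost of needing enough particles for stable estimates.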

Paper Structure (15 sections, 11 equations, 5 figures, 1 table, 1 algorithm)

Figures (5)

  • Figure 1: Schematic overview of DGCN. $\boldsymbol{\Theta}_l$ represents the matrix of length scales for all training points $\mathbf{X}$, as output by the NN.
  • Figure 2: Results of benchmarked tasks
  • Figure 3: Epistemic uncertainty of the tested models on the IPSU task after 12 iterations
  • Figure 4: Asymmetric unimodal and multimodal distributions observed during the IPSU task
  • Figure 5: Average computation time
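Figure 1 describes the DGCN as a neural network that outputs length scales for the training points, which makes the covariance non-stationary. The sketch below illustrates that idea under explicit assumptions: it swaps in the Gibbs kernel (a known valid non-stationary kernel with input-dependent length scales), uses a hypothetical one-hidden-layer network `length_scales` with random, untrained weights, and omits fitting those weights (for example by maximizing the marginal likelihood). The exact DGCN kernel and training procedure may differ.

```python
import numpy as np


def length_scales(X, W1, b1, W2, b2, l_min=0.3, l_max=0.8):
    """Hypothetical one-hidden-layer network mapping inputs (n, d) to
    per-dimension length scales (n, d), squashed into (l_min, l_max)."""
    z = np.tanh(X @ W1 + b1) @ W2 + b2
    return l_min + (l_max - l_min) / (1.0 + np.exp(-z))


def gibbs_kernel(XA, LA, XB, LB, signal_var=1.0):
    """Gibbs non-stationary kernel with input-dependent length scales."""
    s = LA[:, None, :] ** 2 + LB[None, :, :] ** 2  # (n, m, d)
    prefactor = np.prod(np.sqrt(2.0 * LA[:, None, :] * LB[None, :, :] / s), axis=-1)
    sqdist = np.sum((XA[:, None, :] - XB[None, :, :]) ** 2 / s, axis=-1)
    return signal_var * prefactor * np.exp(-sqdist)


def gp_predict(X, y, Xs, params, noise_var=1e-2):
    """GP posterior mean and variance at test inputs Xs."""
    L_train = length_scales(X, *params)  # plays the role of Theta_l in Figure 1
    L_test = length_scales(Xs, *params)
    K = gibbs_kernel(X, L_train, X, L_train) + noise_var * np.eye(len(X))
    Ks = gibbs_kernel(Xs, L_test, X, L_train)
    chol = np.linalg.cholesky(K)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))  # K^{-1} y
    v = np.linalg.solve(chol, Ks.T)
    var = np.diag(gibbs_kernel(Xs, L_test, Xs, L_test)) - np.sum(v ** 2, axis=0)
    return Ks @ alpha, np.maximum(var, 0.0)


rng = np.random.default_rng(0)
d, hidden = 1, 8
params = (rng.normal(size=(d, hidden)), np.zeros(hidden),
          rng.normal(size=(hidden, d)), np.zeros(d))  # untrained weights
X = np.linspace(-3.0, 3.0, 20)[:, None]
y = np.sin(2.0 * X[:, 0])
Xs = np.array([[0.0], [8.0]])  # inside vs far outside the training data
mean, var = gp_predict(X, y, Xs, params)
print(mean, var)
```

Bounding the length scales to a fixed interval is a choice made here only to keep the toy example numerically well behaved; the predictive variance is small inside the data and returns to the prior variance far outside it.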