Bayesian Optimization with Adaptive Kernels for Robot Control

Ruben Martinez-Cantin

TL;DR

The paper tackles nonstationarity in Bayesian optimization (BO) for robot control by introducing Spartan Bayesian Optimization (SBO), which uses an adaptive local-global kernel: a global kernel combined with a local kernel whose region of influence moves according to the data. SBO treats the kernel hyperparameters, including the position of the local region, as uncertain and samples them with MCMC. This enables fast local exploitation near the optimum while preserving global exploration, and yields improved sample efficiency on optimization benchmarks, reinforcement learning (RL) tasks, and a UAV wing-design example. SBO outperforms standard BO and warping-based nonstationary methods, most clearly on nonstationary problems, and also gains on stationary problems through more refined local modeling. The work proposes SBO as a broadly applicable approach to efficient policy search and design optimization in robotics and related domains.
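
The adaptive local-global kernel described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the paper's implementation: it assumes squared-exponential kernels for both parts, Gaussian-bump influence weights normalized to sum to one, and a broad global bump with a fixed center. All function names and default values (`spartan_kernel`, `local_ls=0.1`, and so on) are hypothetical. In SBO the local center would be one of the hyperparameters sampled by MCMC rather than fixed by hand.

```python
import math
from typing import Sequence, Tuple

Point = Sequence[float]


def sq_exp(x: Point, y: Point, length_scale: float) -> float:
    """Squared-exponential (RBF) kernel with an isotropic length-scale."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-0.5 * d2 / length_scale ** 2)


def _log_bump(x: Point, center: Point, width: float) -> float:
    """Log of an unnormalized isotropic Gaussian bump centered at `center`."""
    return -0.5 * sum((a - c) ** 2 for a, c in zip(x, center)) / width ** 2


def _sigmoid(t: float) -> float:
    """Numerically stable logistic function (no overflow for large |t|)."""
    if t >= 0.0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def spartan_weights(x: Point, local_center: Point, global_center: Point,
                    local_width: float, global_width: float) -> Tuple[float, float]:
    """Return (w_global, w_local) with w_global + w_local == 1.

    Assumption: w_local = bump_l / (bump_l + bump_g), evaluated in log space
    so that it does not underflow far from both centers.
    """
    w_local = _sigmoid(_log_bump(x, local_center, local_width)
                       - _log_bump(x, global_center, global_width))
    return 1.0 - w_local, w_local


def spartan_kernel(x: Point, y: Point, local_center: Point, global_center: Point,
                   local_ls: float = 0.1, global_ls: float = 1.0,
                   local_width: float = 0.2, global_width: float = 10.0) -> float:
    """k(x, y) = w_g(x) w_g(y) k_g(x, y) + w_l(x) w_l(y) k_l(x, y).

    Each term is a valid kernel scaled by w(x) w(y), so the sum stays
    positive semidefinite for any choice of weights.
    """
    wg_x, wl_x = spartan_weights(x, local_center, global_center, local_width, global_width)
    wg_y, wl_y = spartan_weights(y, local_center, global_center, local_width, global_width)
    return (wg_x * wg_y * sq_exp(x, y, global_ls)
            + wl_x * wl_y * sq_exp(x, y, local_ls))
```

For example, `spartan_kernel([3.0], [3.1], local_center=[0.5], global_center=[0.0])` is essentially the global squared-exponential kernel, because the local weight vanishes far from `local_center`; near `local_center` the short length-scale term receives roughly half of the weight. Moving `local_center` changes where the model can represent fast local variation without touching the global term.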

Abstract

Active policy search combines the trial-and-error methodology of policy search with Bayesian optimization to actively find the optimal policy. First, policy search is a type of reinforcement learning that has become very popular for robot control because of its ability to deal with complex continuous state and action spaces. Second, Bayesian optimization is a sample-efficient global optimization method that uses a surrogate model, such as a Gaussian process, and optimal decision making to carefully select each sample during the optimization process. Sample efficiency is of paramount importance when each trial involves the real robot, expensive Monte Carlo runs, or a complex simulator. Black-box Bayesian optimization generally assumes that the cost function comes from a stationary process, because nonstationary modeling usually relies on prior knowledge. However, many control problems are inherently nonstationary due to failure conditions, terminal states and other abrupt effects. In this paper, we present a kernel function specially designed for Bayesian optimization that allows nonstationary modeling without prior knowledge, using an adaptive local region. The new kernel improves local search (exploitation) without penalizing global search (exploration), as shown experimentally on well-known optimization benchmarks and robot control scenarios. Finally, we show its potential for designing the wing shape of a UAV.
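
The "optimal decision making" step is usually implemented by maximizing an acquisition function of the Gaussian process posterior mean $\mu(x)$ and standard deviation $\sigma(x)$. This summary does not name the criterion the paper uses, so the sketch below assumes expected improvement (EI) for minimization, a common choice with a closed form; the names `expected_improvement` and `xi` (an optional exploration offset) are illustrative.

```python
import math


def norm_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z: float) -> float:
    """Standard normal cumulative distribution via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu: float, sigma: float, y_best: float, xi: float = 0.0) -> float:
    """EI for minimization: E[max(y_best - xi - f, 0)] with f ~ N(mu, sigma^2).

    Assumption: EI stands in for whatever criterion the paper actually uses.
    """
    improvement = y_best - xi - mu
    if sigma <= 0.0:
        # Degenerate posterior with no uncertainty: EI is the plain improvement.
        return max(improvement, 0.0)
    z = improvement / sigma
    return improvement * norm_cdf(z) + sigma * norm_pdf(z)
```

Two candidates with the same posterior mean are ranked by their uncertainty, since EI grows with $\sigma$; this is how regions that the surrogate leaves uncertain become attractive to sample next.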

Paper Structure (18 sections, 1 theorem, 7 equations, 5 figures, 1 table)

Key Result

Proposition 1

Given two kernels $k_l$ and $k_s$ with large and small length-scale hyperparameters, respectively, any function $f$ in the RKHS characterized by the kernel $k_l$ is also an element of the RKHS characterized by $k_s$ (ZiyuWang2016short).
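
For the squared-exponential kernel on $\mathbb{R}^d$, the inclusion in Proposition 1 can be checked in the Fourier domain. The sketch below is a standard argument, not the proof given in the paper or in the cited work. It assumes that the RKHS norm of a stationary kernel is written through its spectral density $\hat k$, with convention-dependent constants $C_d$ and $c_d$ that do not depend on the length-scale, and that $\ell_s < \ell_l$.

```latex
% Assumption: stationary SE kernels on R^d, RKHS norm via the spectral density.
\|f\|_{\mathcal{H}_k}^2 = C_d \int_{\mathbb{R}^d}
  \frac{|\hat f(\omega)|^2}{\hat k(\omega)}\, d\omega,
\qquad
\hat k_\ell(\omega) = c_d\, \ell^d \exp\!\left(-\tfrac{1}{2}\ell^2\|\omega\|^2\right)

% Pointwise bound on the inverse spectral densities, using \ell_s < \ell_l:
\frac{1}{\hat k_{\ell_s}(\omega)}
  = \frac{e^{\frac{1}{2}\ell_s^2\|\omega\|^2}}{c_d\,\ell_s^d}
  \le \frac{e^{\frac{1}{2}\ell_l^2\|\omega\|^2}}{c_d\,\ell_s^d}
  = \left(\frac{\ell_l}{\ell_s}\right)^{d} \frac{1}{\hat k_{\ell_l}(\omega)}

% Integrating against |\hat f|^2 gives the norm inequality:
\|f\|_{\mathcal{H}_{k_s}}^2 \le \left(\frac{\ell_l}{\ell_s}\right)^{d}
  \|f\|_{\mathcal{H}_{k_l}}^2 < \infty
  \quad \text{for every } f \in \mathcal{H}_{k_l}
```

Any $f$ with a finite norm under the long length-scale kernel therefore also has a finite norm under the short one, which is exactly the inclusion stated in the proposition.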

Figures (5)

  • Figure 1: Representation of the Spartan kernel in SBO. Typically, the local and global kernels have a small and a large length-scale, respectively. The influence of each kernel is represented by the normalized weight at the bottom of the plot. Note how the kernel with the small length-scale produces larger uncertainties, which is an advantage for fast exploitation, but it can perform poorly for global exploration because it tends to sample almost equally everywhere. On the other hand, the kernel with the large length-scale provides a better global estimate, but it can be too constrained locally.
  • Figure 2: Gramacy function (Assael2014). The path below the surface represents the location of the local kernel as sampled by MCMC at each BO iteration. It clearly moves towards the nonstationary region of the function. For visualization, the path is colored by iteration (start $\rightarrow$ blue $\rightarrow$ black $\rightarrow$ green $\rightarrow$ red $\rightarrow$ end).
  • Figure 3: a) Gramacy function, b) Michalewicz 10D function with $m=10$, c) Branin-Hoo function, d) Hartmann 6D function. For the nonstationary functions, a) and b), the proposed SBO method converges much faster than the state-of-the-art methods it is compared against. For the Gramacy function, SBO finds the minimum in about 30 function evaluations in all tests. For the stationary functions, c) and d), BO and SBO perform almost identically, with SBO producing more accurate results with smaller uncertainty. The WARP method sometimes improves over standard BO (a, b and d) and sometimes produces worse results (c).
  • Figure 4: Total reward for a) the three-limb walker, b) the mountain car and c) the hovering helicopter control problems. For the first problem, SBO achieves a higher reward, while the other methods get stuck in a local maximum. For the mountain car, SBO achieves maximum performance in all trials after just 27 policy trials (17 iterations + 10 initial samples). For the helicopter problem, BO and WARP converge slowly because many policies result in an early crash, providing almost no information. However, SBO is able to exploit good policies and quickly improve performance.
  • Figure 5: Results for the wing design optimization (10 runs per plot).
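
The claim in Figure 1 that a short length-scale leaves larger uncertainty between samples can be checked numerically with a plain Gaussian process posterior. This sketch assumes a zero-mean GP with a unit-variance squared-exponential kernel and a small jitter term; the helper names and the sample locations are hypothetical and chosen only for illustration.

```python
import math
from typing import List


def rbf(a: float, b: float, length_scale: float) -> float:
    """Unit-variance squared-exponential kernel in one dimension."""
    return math.exp(-0.5 * (a - b) ** 2 / length_scale ** 2)


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting (small n only)."""
    n = len(A)
    m = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def posterior_variance(x_star: float, xs: List[float], length_scale: float,
                       jitter: float = 1e-6) -> float:
    """sigma^2(x*) = k(x*, x*) - k*^T (K + jitter I)^{-1} k* for a zero-mean GP."""
    K = [[rbf(a, b, length_scale) + (jitter if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k_star = [rbf(x_star, a, length_scale) for a in xs]
    alpha = solve(K, k_star)
    return rbf(x_star, x_star, length_scale) - sum(k * a for k, a in zip(k_star, alpha))


xs = [0.0, 1.0, 2.0]
for ell in (0.1, 1.0):
    print(f"length-scale {ell}: variance at x=0.5 is {posterior_variance(0.5, xs, ell):.4f}")
```

With the short length-scale, the point halfway between samples stays almost as uncertain as the prior, while the long length-scale interpolates it confidently. The Spartan kernel lets the model use the first behavior inside the local region and the second elsewhere.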

Theorems & Definitions (2)

  • Proposition 1
  • Definition 1