Robust Policy Search for Robot Navigation
Javier Garcia-Barcos, Ruben Martinez-Cantin
TL;DR
The paper addresses robust, data-efficient policy search for robot navigation by combining unscented Bayesian optimization with a Boltzmann acquisition policy. It uses an adaptive GP surrogate, built on the nonstationary Spartan kernel with MCMC hyperparameter inference, to model complex reward landscapes, and an unscented-transformation mechanism to propagate input perturbations through the objective. The authors present two complementary forms of robustness: robust optimization, which accounts for input perturbations through an integrated objective, and statistical robustness, which relies on a stochastic acquisition policy with convergence guarantees. Empirical results on benchmark functions and robotic tasks show improved stability and exploration, and indicate that distributed, multi-robot optimization without central coordination is feasible.
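As a rough illustration of how an unscented transformation can propagate input perturbations through an objective, the sketch below evaluates a function at a set of sigma points around a query and returns their weighted average. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `sigma_points` and `unscented_value`, the scaling parameter `k`, the isotropic input covariance, and the toy quadratic objective are all hypothetical choices made for illustration.

```python
import numpy as np


def sigma_points(x, cov, k=1.0):
    """Return the 2d+1 sigma points around x and their weights.

    Assumes the standard symmetric sigma-point set; `k` is a
    hypothetical scaling parameter chosen for this sketch.
    """
    x = np.asarray(x, dtype=float)
    d = x.size
    # Cholesky factor L satisfies L @ L.T == (d + k) * cov.
    # The columns of L are the offsets applied around x.
    L = np.linalg.cholesky((d + k) * np.asarray(cov, dtype=float))
    points = np.vstack([x, x + L.T, x - L.T])
    weights = np.full(2 * d + 1, 1.0 / (2.0 * (d + k)))
    weights[0] = k / (d + k)
    return points, weights


def unscented_value(f, x, cov, k=1.0):
    """Weighted average of f over the sigma points of x."""
    points, weights = sigma_points(x, cov, k)
    values = np.array([f(p) for p in points])
    return float(np.dot(weights, values))


def toy_objective(x):
    # Toy quadratic objective, used only to exercise the sketch.
    return float(np.sum(np.asarray(x) ** 2))


if __name__ == "__main__":
    x = np.array([1.0, 2.0])
    cov = 0.25 * np.eye(2)  # assumed isotropic input noise
    print(unscented_value(toy_objective, x, cov))
```

For a quadratic objective the weighted average matches the exact expectation under the input noise, here the squared norm of `x` plus the trace of `cov`, so the printed value is approximately 5.5. A query whose neighborhood contains poor outcomes is penalized by this average even when the query itself scores well.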
Abstract
Complex robot navigation and control problems can be framed as policy search problems. However, interactive learning in uncertain environments can be expensive, requiring the use of data-efficient methods. Bayesian optimization is an efficient nonlinear optimization method where queries are carefully selected to gather information about the optimum location. This is achieved with a surrogate model, which encodes past information, and an acquisition function for query selection. Bayesian optimization can be very sensitive to uncertainty in the input data or prior assumptions. In this work, we incorporate both robust optimization and statistical robustness, showing that both types of robustness are synergistic. For robust optimization, we use an improved version of unscented Bayesian optimization, which provides safe and repeatable policies in the presence of policy uncertainty. We also provide new theoretical insights. For statistical robustness, we use an adaptive surrogate model and introduce Boltzmann selection as a stochastic acquisition method, which provides convergence guarantees and improved performance even in the presence of surrogate modeling errors. We present results on several optimization benchmarks and robot tasks.
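To make the idea of a stochastic acquisition method concrete, the following sketch samples the next query from a Boltzmann (softmax) distribution over acquisition values instead of always taking the maximizer. It is only an illustration under assumptions: the discrete candidate set, the inverse-temperature parameter `beta`, and the function names `boltzmann_probabilities` and `boltzmann_select` are hypothetical and are not taken from the paper.

```python
import numpy as np


def boltzmann_probabilities(acq_values, beta=1.0):
    """Softmax of acquisition values with inverse temperature beta."""
    a = np.asarray(acq_values, dtype=float)
    # Subtract the maximum before exponentiating for numerical stability.
    z = beta * (a - a.max())
    p = np.exp(z)
    return p / p.sum()


def boltzmann_select(candidates, acq_values, beta=1.0, rng=None):
    """Sample one candidate with probability proportional to exp(beta * acq)."""
    rng = np.random.default_rng() if rng is None else rng
    p = boltzmann_probabilities(acq_values, beta)
    idx = rng.choice(len(p), p=p)
    return candidates[idx], p


if __name__ == "__main__":
    # Hypothetical candidate queries and acquisition values.
    candidates = np.array([[0.0], [0.5], [1.0]])
    acq = np.array([0.1, 0.5, 0.2])
    query, probs = boltzmann_select(
        candidates, acq, beta=5.0, rng=np.random.default_rng(0)
    )
    print(query, probs)
```

With a small `beta` the selection is close to uniform sampling over the candidates, while a large `beta` approaches the usual greedy choice of the acquisition maximizer. Because each query is drawn at random, the surrogate model does not fully dictate the next query, which is one way to read the robustness to modeling errors described above.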
