Robust Policy Search for Robot Navigation

Javier Garcia-Barcos, Ruben Martinez-Cantin

TL;DR

The paper addresses robust and data-efficient policy search for robot navigation by marrying unscented Bayesian optimization with Boltzmann acquisition. It introduces an adaptive GP surrogate using Spartan nonstationary kernels and MCMC hyperparameter inference to model complex reward landscapes, along with an unscented transformation-based mechanism to propagate input perturbations. The authors present two robustness pillars: a robust optimization component that considers perturbations via an integrated objective and a statistical robustness component that relies on a stochastic acquisition policy with convergence guarantees. Empirical results on benchmark functions and robotic tasks demonstrate improved stability, exploration, and the feasibility of distributed, multi-robot optimization without central coordination.

Abstract

Complex robot navigation and control problems can be framed as policy search problems. However, interactive learning in uncertain environments can be expensive, requiring the use of data-efficient methods. Bayesian optimization is an efficient nonlinear optimization method where queries are carefully selected to gather information about the optimum location. This is achieved by a surrogate model, which encodes past information, and the acquisition function for query selection. Bayesian optimization can be very sensitive to uncertainty in the input data or prior assumptions. In this work, we incorporate both robust optimization and statistical robustness, showing that both types of robustness are synergistic. For robust optimization we use an improved version of unscented Bayesian optimization which provides safe and repeatable policies in the presence of policy uncertainty. We also provide new theoretical insights. For statistical robustness, we use an adaptive surrogate model and we introduce the Boltzmann selection as a stochastic acquisition method to have convergence guarantees and improved performance even with surrogate modeling errors. We present results in several optimization benchmarks and robot tasks.
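As a rough illustration of the stochastic acquisition idea in the abstract, the sketch below draws the next query from a Boltzmann (softmax) distribution over acquisition values instead of always taking the maximizer. It is a minimal sketch under stated assumptions, not the authors' implementation: the finite candidate set, the `temperature` parameter, and the function names `boltzmann_probabilities` and `boltzmann_select` are all hypothetical, and the acquisition values are made-up numbers.

```python
import math
import random

def boltzmann_probabilities(acq_values, temperature=1.0):
    """Softmax over acquisition values; a lower temperature is greedier."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    top = max(acq_values)  # subtract the max for numerical stability
    weights = [math.exp((a - top) / temperature) for a in acq_values]
    total = sum(weights)
    return [w / total for w in weights]

def boltzmann_select(candidates, acq_values, temperature=1.0, rng=None):
    """Draw one candidate at random, weighted by its Boltzmann probability."""
    rng = rng or random.Random()
    probs = boltzmann_probabilities(acq_values, temperature)
    return rng.choices(candidates, weights=probs, k=1)[0]

# Hypothetical acquisition values at four candidate policies (illustrative only).
candidates = [0.1, 0.4, 0.7, 0.9]
acq_values = [0.2, 1.5, 1.4, 0.1]

probs = boltzmann_probabilities(acq_values, temperature=0.5)
choice = boltzmann_select(candidates, acq_values, temperature=0.5,
                          rng=random.Random(0))
```

With these numbers the two near-optimal candidates share most of the probability mass, so the second-best candidate is still queried fairly often. A greedy maximizer would never query it. As the temperature approaches zero the distribution collapses onto the maximizer and greedy selection is recovered.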

Paper Structure

This paper contains 21 sections, 13 equations, 5 figures, 1 algorithm.

Figures (5)

  • Figure 1: Path planning on uneven terrain with obstacles, with different trajectories displayed (left and right). The orange regions represent slopes with a higher traversing cost. The red rectangles are obstacles. Top: the desired trajectories (blue dashed line). Bottom: possible deviations (blue lines) from desired trajectories due to input noise. The right trajectory is more efficient without input noise. Once we take into account input noise, it becomes unsafe as it can easily collide with obstacles. The left trajectory is safer in the presence of input noise despite being less efficient.
  • Figure 2: Diagram showing the different components of our approach, based on policy search with Bayesian optimization (BO). It depicts the Bayesian optimization loop (top) applied to a policy search problem (bottom). The goal is to identify the most efficient policy by sequentially querying different policies and obtaining the corresponding reward. However, as the problem has policy uncertainty, a perturbed policy will be evaluated instead. We highlight (bold text) the elements that differ from standard policy search with Bayesian optimization: the Spartan kernel to model nonstationarity, the unscented transform (applied in the unscented optimal incumbent and unscented acquisition) to propagate the policy uncertainty and, instead of greedy acquisition function maximization, Boltzmann selection sampling to improve exploration in the presence of surrogate modeling errors.
  • Figure 3: Benchmark function optimization results. In general, UBO is able to find a more stable solution than vanilla BO, resulting in a better average value. Using Boltzmann selection improves stability further. Parallelized runs had a much lower wall time without a penalty in performance.
  • Figure 4: Robot pushing problem and rover path planning optimization results. For the more complex problems, UBO is not able to find a stable solution, unlike Boltzmann selection. Only for the 4D robot push is there a penalty for using the parallel version.
  • Figure 5: Examples of optimized trajectories found by different methods (rows) and trials (columns), showing the possible deviations from the trajectories by simulating input noise $\sigma=0.02$. We display the cost of the desired trajectory (assuming no input noise) and the average cost from possible deviations over each result. Each row represents a different algorithm; from top to bottom: BO, UBO, UBO-SP, UBO-SPx4. Note that BO (first row) does not find the safe path in any of the trials. The best trial is found by UBO (2nd row and column: noiseless cost 0.27, noisy cost 0.40); however, most of the time it lacks exploration to find a good one. The Boltzmann selection (serial in row 3 and parallel in row 4) improves exploration, allowing better results overall.
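
The captions above refer to an unscented transform that propagates policy (input) uncertainty. One way to picture it is the sketch below, which approximates the expected value of a function under Gaussian input noise using 2d+1 sigma points. It assumes isotropic noise with standard deviation `sigma` and the classic sigma-point weights with a free parameter `kappa`; the function names and the toy quadratic objective are hypothetical and are not taken from the paper.

```python
import math

def sigma_points(x, sigma, kappa=1.0):
    """Return 2d+1 sigma points and weights for x ~ N(x, sigma^2 I)."""
    d = len(x)
    scale = sigma * math.sqrt(d + kappa)   # offset along each axis
    points = [list(x)]
    weights = [kappa / (d + kappa)]        # weight of the central point
    for i in range(d):
        for sign in (1.0, -1.0):
            p = list(x)
            p[i] += sign * scale
            points.append(p)
            weights.append(1.0 / (2.0 * (d + kappa)))
    return points, weights

def unscented_value(f, x, sigma, kappa=1.0):
    """Weighted average of f over the sigma points around x."""
    points, weights = sigma_points(x, sigma, kappa)
    return sum(w * f(p) for p, w in zip(points, weights))

def quadratic(p):
    """Toy objective (illustrative only): squared norm of the input."""
    return sum(v * v for v in p)

x = [0.5, -0.2]
points, weights = sigma_points(x, sigma=0.02)
value = unscented_value(quadratic, x, sigma=0.02)
```

For a quadratic objective the sigma-point average matches the exact Gaussian expectation, here |x|^2 + d·sigma^2, which makes this toy case a convenient sanity check. Scoring a candidate by this averaged value rather than by its nominal value penalizes optima that degrade sharply under small input perturbations.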