Hierarchical Policy Blending as Inference for Reactive Robot Control
Kay Hansel, Julen Urain, Jan Peters, Georgia Chalvatzaki
TL;DR
The work tackles safe, responsive robot control in cluttered, dynamic environments by formulating motion generation as a Product of Experts (PoE) over energy-based reactive policies and coupling it with online planning-as-inference to adapt the policy weights. Low-level actions are derived from a weighted Gaussian blend $\pi(a|s,\boldsymbol{\beta}) \propto \prod_i \pi_i(a|s)^{\beta_i}$, while a high-level planner updates $\boldsymbol{\beta}$ by fitting a Dirichlet variational posterior through an ELBO objective, using the iCEM shooting method to look ahead. The approach integrates maximum-entropy constraints in both parameter and action spaces and leverages variational inference to avoid local optima. Empirical results on 2D navigation and 7DoF manipulation show that the method outperforms purely reactive and online re-planning baselines, achieving higher success and safety rates in dynamic, cluttered scenarios at an acceptable computational cost.
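Because each expert $\pi_i(a|s)$ is Gaussian, the weighted product has a closed form: raising $\mathcal{N}(a;\mu_i,\Sigma_i)$ to a power $\beta_i > 0$ scales its precision by $\beta_i$, so the blend is Gaussian with $\Sigma^{-1} = \sum_i \beta_i \Sigma_i^{-1}$ and $\mu = \Sigma \sum_i \beta_i \Sigma_i^{-1} \mu_i$. The Python/NumPy sketch below computes this blend for a single state. The function name `blend_gaussian_experts` and the two toy experts are hypothetical illustrations, not the authors' implementation, and the sketch assumes at least one positive weight so that the combined precision is invertible.

```python
import numpy as np


def blend_gaussian_experts(means, covs, betas):
    """Weighted product of Gaussian experts, prod_i N(a; mu_i, Sigma_i)^beta_i.

    Each exponent beta_i scales expert i's precision, so the product is Gaussian
    with precision sum_i beta_i Sigma_i^{-1} and a precision-weighted mean.
    """
    dim = means[0].shape[0]
    precision = np.zeros((dim, dim))
    info = np.zeros(dim)  # information vector: sum_i beta_i Sigma_i^{-1} mu_i
    for mu_i, cov_i, beta_i in zip(means, covs, betas):
        prec_i = np.linalg.inv(cov_i)
        precision += beta_i * prec_i
        info += beta_i * prec_i @ mu_i
    cov = np.linalg.inv(precision)
    return cov @ info, cov


# Two hypothetical 2D experts, e.g. a goal attractor and an obstacle-avoidance expert.
means = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
covs = [np.eye(2), 0.5 * np.eye(2)]  # the second expert is more confident
betas = [0.5, 0.5]

mean, cov = blend_gaussian_experts(means, covs, betas)
rng = np.random.default_rng(0)
action = rng.multivariate_normal(mean, cov)  # one sampled low-level action
print(mean)  # approximately [0.333, 0.667]
```

Although both weights are equal, the second expert's halved covariance doubles its precision, so the blended mean lands twice as close to that expert's mean as to the first one's.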
Abstract
Motion generation in cluttered, dense, and dynamic environments is a central topic in robotics, framed as a multi-objective decision-making problem. Current approaches trade off between safety and performance. On the one hand, reactive policies guarantee fast response to environmental changes at the risk of suboptimal behavior. On the other hand, planning-based motion generation provides feasible trajectories, but the high computational cost may limit the control frequency and thus safety. To combine the benefits of reactive policies and planning, we propose a hierarchical motion generation method. Moreover, we adopt probabilistic inference methods to formalize the hierarchical model and stochastic optimization. We realize this approach as a weighted product of stochastic, reactive expert policies, where planning is used to adaptively compute the optimal weights over the task horizon. This stochastic optimization avoids local optima and proposes feasible reactive plans that find paths in cluttered and dense environments. Our extensive experimental study in planar navigation and 6DoF manipulation shows that our proposed hierarchical motion generation method outperforms both myopic reactive controllers and online re-planning methods.
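Planning over the expert weights can be pictured as a sampling-based search on the probability simplex. The sketch below is a deliberately simplified stand-in, not the paper's algorithm: it uses plain cross-entropy-method (CEM) updates instead of iCEM, refits only the mean of a Dirichlet distribution by moment matching (with a fixed, hypothetical total concentration) instead of optimizing an ELBO, and holds $\boldsymbol{\beta}$ constant over each rollout. It assumes two hypothetical 2D point-mass experts (goal attractor and obstacle repeller) with equal isotropic covariances, for which the mean of the weighted product reduces to a $\boldsymbol{\beta}$-weighted average of the expert means. All names, constants, and the cost function are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy scene: 2D point mass, one goal, one obstacle near the straight path.
GOAL = np.array([1.0, 0.0])
OBSTACLE = np.array([0.5, 0.05])
DT, HORIZON = 0.1, 20


def expert_means(x):
    """Mean actions of two hypothetical reactive experts at state x."""
    attract = GOAL - x  # pull towards the goal
    d = x - OBSTACLE
    # Push away from the obstacle, decaying quickly with distance.
    repel = d / (d @ d + 1e-3) * np.exp(-(d @ d) / 0.02)
    return [attract, repel]


def blended_action(x, beta):
    """PoE mean for equal isotropic expert covariances: a beta-weighted average."""
    means = expert_means(x)
    return sum(b * m for b, m in zip(beta, means)) / np.sum(beta)


def rollout_cost(x0, beta):
    """Roll the blended mean action forward and accumulate a toy cost."""
    x, cost = x0.copy(), 0.0
    for _ in range(HORIZON):
        x = x + DT * blended_action(x, beta)
        cost += np.linalg.norm(x - GOAL)  # progress term
        if np.linalg.norm(x - OBSTACLE) < 0.1:  # collision penalty
            cost += 100.0
    return cost


def plan_weights(x0, n_samples=64, n_elites=8, n_iters=5, concentration=20.0, seed=0):
    """Simplified CEM over expert weights sampled from a Dirichlet (not iCEM)."""
    rng = np.random.default_rng(seed)
    alpha = np.ones(2)  # uniform Dirichlet over the simplex
    for _ in range(n_iters):
        betas = rng.dirichlet(alpha, size=n_samples)
        costs = np.array([rollout_cost(x0, b) for b in betas])
        elites = betas[np.argsort(costs)[:n_elites]]
        # Moment-match the elite mean; the floor keeps every concentration positive.
        alpha = np.maximum(concentration * elites.mean(axis=0), 1e-2)
    return alpha / alpha.sum()  # mean of the fitted Dirichlet


# Receding horizon: re-plan the weights at every control step.
x = np.zeros(2)
for step in range(3):
    beta = plan_weights(x, seed=step)
    x = x + DT * blended_action(x, beta)
    print(step, beta.round(3), x.round(3))
```

Re-running `plan_weights` at every control step gives the receding-horizon structure: the reactive experts still respond to the current state at each step, while the planner only reshapes how strongly each expert contributes.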
