Hierarchical Policy Blending as Inference for Reactive Robot Control

Kay Hansel; Julen Urain; Jan Peters; Georgia Chalvatzaki

TL;DR

The work tackles safe, responsive robot control in cluttered, dynamic environments by formulating motion generation as a Product of Experts (PoE) over energy-based reactive policies and coupling it with online planning-as-inference to adapt policy weights. Low-level actions are derived from a weighted Gaussian blend $\pi(a|s,\boldsymbol{\beta}) \propto \prod_i \pi_i(a|s)^{\beta_i}$, while a high-level planner updates $\boldsymbol{\beta}$ using a Dirichlet variational posterior via an ELBO objective and an iCEM shooting method to look ahead. The approach integrates maximum-entropy constraints in parameter and action spaces and leverages variational inference to avoid local optima. Empirical results on 2D navigation and 7DoF manipulation show that the method outperforms purely reactive or online-replanning baselines, delivering higher success and safety rates in dynamic, cluttered scenarios with feasible computational trade-offs.
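As a rough illustration of the weighted product above, the sketch below blends Gaussian experts in closed form. It assumes diagonal covariances and that each expert's mean and variance have already been evaluated at the current state; the function name `blend_gaussian_experts` and the two example experts are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple


def blend_gaussian_experts(
    means: Sequence[Sequence[float]],
    variances: Sequence[Sequence[float]],
    betas: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Weighted product of diagonal Gaussian experts (illustrative sketch).

    prod_i N(a; mu_i, var_i)^beta_i is proportional to a Gaussian with
    precision = sum_i beta_i / var_i and
    mean      = (sum_i beta_i * mu_i / var_i) / precision, per dimension.
    """
    if not (len(means) == len(variances) == len(betas)):
        raise ValueError("means, variances and betas must have equal length")
    dim = len(means[0])
    precision = [0.0] * dim
    weighted_mean = [0.0] * dim
    for mu, var, beta in zip(means, variances, betas):
        for d in range(dim):
            lam = beta / var[d]  # weighted precision of this expert
            precision[d] += lam
            weighted_mean[d] += lam * mu[d]
    if any(p <= 0.0 for p in precision):
        raise ValueError("blended precision must be positive")
    blended_mean = [m / p for m, p in zip(weighted_mean, precision)]
    blended_var = [1.0 / p for p in precision]
    return blended_mean, blended_var


# Hypothetical experts: one pulls along x, the other along y.
mean, var = blend_gaussian_experts(
    means=[[1.0, 0.0], [0.0, 1.0]],
    variances=[[1.0, 1.0], [1.0, 1.0]],
    betas=[0.5, 0.5],
)
```

With equal variances the blended mean is the weight-averaged expert mean, so shifting weight toward one expert moves the commanded action toward that expert's preference.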

Abstract

Motion generation in cluttered, dense, and dynamic environments is a central topic in robotics, rendered as a multi-objective decision-making problem. Current approaches trade off between safety and performance. On the one hand, reactive policies guarantee fast response to environmental changes at the risk of suboptimal behavior. On the other hand, planning-based motion generation provides feasible trajectories, but the high computational cost may limit the control frequency and thus safety. To combine the benefits of reactive policies and planning, we propose a hierarchical motion generation method. Moreover, we adopt probabilistic inference methods to formalize the hierarchical model and stochastic optimization. We realize this approach as a weighted product of stochastic, reactive expert policies, where planning is used to adaptively compute the optimal weights over the task horizon. This stochastic optimization avoids local optima and proposes feasible reactive plans that find paths in cluttered and dense environments. Our extensive experimental study in planar navigation and 6DoF manipulation shows that our proposed hierarchical motion generation method outperforms both myopic reactive controllers and online re-planning methods.
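To make the idea of planning over expert weights concrete, here is a minimal toy sketch. It samples candidate weights from a Dirichlet distribution, scores each by rolling out the blended policy, and refits the concentration to the best samples. This is a plain cross-entropy-style loop under strong assumptions (a 1D point mass, two hypothetical experts with equal variance, a heuristic concentration update); it is not the paper's iCEM or ELBO-based update, and all names and parameters are illustrative.

```python
import random
from typing import List, Sequence


def sample_dirichlet(alpha: Sequence[float], rng: random.Random) -> List[float]:
    """Draw from Dirichlet(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def rollout_cost(beta: Sequence[float], x0: float = 0.0, goal: float = 1.0,
                 gain: float = 2.0, dt: float = 0.1, horizon: int = 20) -> float:
    """Roll out the blended policy on a 1D point mass; cost is final goal distance.

    Hypothetical experts: expert 0 pulls toward the goal, expert 1 prefers
    zero velocity. With equal variances, the product-of-experts mean is the
    weight-averaged expert mean.
    """
    x = x0
    for _ in range(horizon):
        expert_means = [gain * (goal - x), 0.0]
        action = sum(b * m for b, m in zip(beta, expert_means)) / sum(beta)
        x += dt * action
    return abs(goal - x)


def plan_weights(n_iters: int = 5, n_samples: int = 64, n_elite: int = 8,
                 concentration: float = 10.0, seed: int = 0) -> List[float]:
    """Cross-entropy-style search over blending weights (heuristic refit)."""
    rng = random.Random(seed)
    alpha = [1.0, 1.0]  # uniform prior over the weight simplex
    for _ in range(n_iters):
        samples = [sample_dirichlet(alpha, rng) for _ in range(n_samples)]
        elites = sorted(samples, key=rollout_cost)[:n_elite]
        elite_mean = [sum(e[i] for e in elites) / n_elite
                      for i in range(len(alpha))]
        # Heuristic refit: scale the elite mean, floored to keep alpha valid.
        alpha = [max(0.1, concentration * m) for m in elite_mean]
    total = sum(alpha)
    return [a / total for a in alpha]


weights = plan_weights()
```

In this toy setting the goal-reaching expert should end up with most of the weight, since the cost only rewards reaching the goal; in a cluttered scene the cost would also penalize collisions, and the preferred weights would change along the horizon.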

Paper Structure (6 sections, 11 equations, 5 figures, 1 table)

Introduction
Related Work
Preliminaries
Hierarchical Reactive Policy Blending
Experiments
Conclusions

Figures (5)

Figure 1: A sequence of the reactive motion of a 7DoF manipulator robot. The robot starts moving from the orange box toward the green box. Our proposed method enables a reactive motion that avoids collisions with the grey obstacle and overcomes local minima resulting from multiple constraints.
Figure 2: 2D toy environments for planar point-mass navigation. The orange dot denotes the start and the green dot the goal location. Top. The toy maze environment with dynamic obstacles. Bottom. The toy box environment, in which a box moves horizontally at a constant speed. The goal is fixed in the center of the moving box.
Figure 3: The results of an ablation study in the 2D toy box environment. The box speed increases from a minimum of zero (static) to 30 pixels per step (dynamic). We compare the baselines, i.e., RMPflow and MPC-iCEM, with our method HiPBI and employ different look-ahead horizons (LA). Left. The success rate shows the performance. Right. The safety rate indicates how often no collision occurs.
Figure 4: Manipulation environment in which the intermediate (orange) and target (green) boxes are randomly selected out of four boxes. Five randomly generated grey obstacles obstruct the path of the 7DoF manipulator robot. The executed trajectory is shown in blue. Top. Performance of a method that gets stuck in a local optimum. Bottom. Performance of our proposed method, which successfully discovers an obstacle-free path to the target.
Figure 5: Evaluation study on the manipulation environment. We benchmark our approach in a static and a dynamic setting against the baseline. Left. The success rate in a static environment. The number of obstacles varies from zero up to a maximum of five. Right. The success rate in a dynamic environment. A maximum of five movable obstacles are used.
