On the Fly Adaptation of Behavior Tree-Based Policies through Reinforcement Learning
Marco Iannotta, Johannes A. Stork, Erik Schaffernicht, Todor Stoyanov
TL;DR
The paper tackles adapting Behavior Tree–based policies to local task variations in dynamic manufacturing. It introduces a hierarchical, context-conditioned reinforcement learning framework where an upper-level policy $\pi^{up}_{\boldsymbol{\omega}}$ selects BT parameters $\hat{\boldsymbol{\theta}}$ for a lower-level BT policy $\pi^{low}_{\boldsymbol{\theta}}$, guided by episodic context $\boldsymbol{c}$. Through online RL (SAC) with a replay buffer, the approach achieves fast convergence and generalization across increasingly many task variations, demonstrated both in simulation (Obstacle Avoidance) and on a real Franka Panda (Pivoting). The results show that sharing experience across task variants enables scalable training and improved performance compared with baselines, while preserving the interpretability and safety benefits of BTs. Limitations include fixed (non-learnable) Condition Nodes, with future work aiming to incorporate learnable conditions for even greater adaptability.
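The hierarchy described above can be illustrated with a minimal sketch. It is not the authors' implementation: the scalar context, the linear upper-level policy, the toy `run_bt_episode` function, and the names `ReplayBuffer`, `upper_policy`, and `collect` are all hypothetical, and the SAC update itself is omitted. The sketch only shows how a context $\boldsymbol{c}$ is sampled per episode, how the upper-level policy proposes BT parameters $\hat{\boldsymbol{\theta}}$, and how experience from all task variants lands in one shared replay buffer.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ReplayBuffer:
    """Stores (context, theta, return) tuples shared across task variants."""
    capacity: int = 1000
    data: list = field(default_factory=list)

    def add(self, context, theta, ret):
        self.data.append((context, theta, ret))
        if len(self.data) > self.capacity:
            self.data.pop(0)  # drop the oldest entry

    def sample(self, batch_size, rng):
        return rng.sample(self.data, min(batch_size, len(self.data)))


def upper_policy(context, omega, rng, noise=0.1):
    """Hypothetical upper-level policy: linear in the context, plus
    Gaussian exploration noise. omega = (slope, offset)."""
    mean = omega[0] * context + omega[1]
    return mean + rng.gauss(0.0, noise)


def run_bt_episode(context, theta):
    """Toy stand-in for executing the lower-level BT with parameter theta.

    Assumption: the task succeeds when theta is within 0.05 of the
    context, and the episodic return is the negative absolute error.
    """
    error = abs(theta - context)
    success = error < 0.05
    return success, -error


def collect(num_episodes, omega, buffer, rng):
    """Run episodes over randomly sampled task variations."""
    for _ in range(num_episodes):
        context = rng.uniform(-1.0, 1.0)  # one task variation per episode
        theta = upper_policy(context, omega, rng)
        _, ret = run_bt_episode(context, theta)
        buffer.add(context, theta, ret)


rng = random.Random(0)
buffer = ReplayBuffer(capacity=50)
collect(100, omega=(1.0, 0.0), buffer=buffer, rng=rng)
batch = buffer.sample(8, rng)  # an off-policy learner such as SAC would train on this
```

Because every stored tuple carries its context, a single learner can draw batches that mix task variants, which is the mechanism the summary credits for scalable training.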
Abstract
With the rising demand for flexible manufacturing, robots are increasingly expected to operate in dynamic environments where local variations, such as slight offsets or size differences in workpieces, are common. We propose to address the problem of adapting robot behaviors to these task variations with a sample-efficient hierarchical reinforcement learning approach that adapts Behavior Tree (BT)-based policies. We maintain the core properties of BTs as an interpretable, modular framework for structuring reactive behaviors, but extend their use beyond static tasks by inherently accommodating local task variations. To show the efficiency and effectiveness of our approach, we conduct experiments both in simulation and on a 7-DoF Franka Emika Panda, with the manipulator adapting to different obstacle avoidance and pivoting tasks.
