Back-stepping Experience Replay with Application to Model-free Reinforcement Learning for a Soft Snake Robot
Xinda Qi, Dong Chen, Zhaojian Li, Xiaobo Tan
TL;DR
This work tackles sample-inefficient learning in off-policy reinforcement learning for soft robots by introducing Back-stepping Experience Replay (BER), a bidirectional exploration framework that uses approximate reversibility to generate reversed trajectories via back-stepping transitions. BER maintains separate replay buffers for forward and back-stepping experiences and uses a probabilistic sampling scheme governed by $P_{t,f}$ and $P_{t,b}$. Approximate reversibility is bounded by $\lVert \bm{s}_{b,t}-\bm{s}_t \rVert \le K \lVert \bm{s}_{t+1}-\bm{s}_t \rVert$ with $K<1$, and the backward component is decayed when needed. The algorithm is validated on a toy binary bit-flipping task and then applied to a model-free RL problem: controlling a soft snake robot performing serpentine locomotion, using a compact four-path actuation model and a target-conditioned state space. Experiments show that BER improves learning speed and stability, achieving $100\%$ success on random targets and a $48\%$ increase in average speed over the best baseline, which underscores BER's potential to improve data efficiency in off-policy RL for soft robotics. The results suggest that BER's bidirectional search and distillation of transitional information can generalize to other reversible or approximately reversible dynamical systems and tasks beyond soft robotics.
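The two-buffer sampling and the reversibility bound summarized above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class `DualReplayBuffer`, the helpers `within_reversibility_bound` and `decayed_p_backward`, the single backward probability `p_backward` (standing in for $P_{t,b}$, with the forward probability taken as its complement), and the exponential decay schedule are all assumptions made for illustration.

```python
import math
import random
from collections import deque


def within_reversibility_bound(s_t, s_next, s_back, k=0.5):
    """Check ||s_back - s_t|| <= k * ||s_next - s_t|| (k < 1 is assumed)."""
    error = math.dist(s_back, s_t)   # how far the back-step lands from s_t
    step = math.dist(s_next, s_t)    # size of the forward step
    return error <= k * step


def decayed_p_backward(p0, decay, step):
    """Hypothetical exponential decay of the backward sampling probability."""
    return p0 * decay ** step


class DualReplayBuffer:
    """Keeps forward and back-stepping transitions in separate buffers."""

    def __init__(self, capacity=10000, seed=None):
        self.forward = deque(maxlen=capacity)
        self.backward = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add_forward(self, transition):
        self.forward.append(transition)

    def add_backward(self, transition):
        self.backward.append(transition)

    def sample(self, batch_size, p_backward):
        """Draw each sample from the backward buffer with probability p_backward."""
        if not self.forward and not self.backward:
            raise ValueError("both buffers are empty")
        batch = []
        for _ in range(batch_size):
            # Fall back to whichever buffer is non-empty if the other is empty.
            use_backward = bool(self.backward) and (
                not self.forward or self.rng.random() < p_backward
            )
            source = self.backward if use_backward else self.forward
            batch.append(self.rng.choice(source))
        return batch
```

In an off-policy training loop, a batch drawn with `sample(batch_size, decayed_p_backward(p0, decay, episode))` would feed the usual critic and actor updates, so the influence of inexact back-stepping transitions shrinks as learning proceeds.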
Abstract
In this paper, we propose a novel technique, Back-stepping Experience Replay (BER), that is compatible with arbitrary off-policy reinforcement learning (RL) algorithms. BER aims to enhance learning efficiency in systems with approximate reversibility, reducing the need for complex reward shaping. The method constructs reversed trajectories using back-stepping transitions to reach random or fixed targets. BER can be interpreted as a bi-directional approach, and it addresses inaccuracies in back-stepping transitions by distilling the replay experience during learning. Given the intricate nature of soft robots and their complex interactions with environments, we present an application of BER in a model-free RL approach for the locomotion and navigation of a soft snake robot, which is capable of serpentine motion enabled by anisotropic friction between the body and the ground. In addition, a dynamic simulator is developed to assess the effectiveness and efficiency of the BER algorithm. In simulation, the robot learns successfully (reaching a 100% success rate) and reaches random targets at an average speed 48% faster than that of the best baseline approach.
