Heuristic Step Planning for Learning Dynamic Bipedal Locomotion: A Comparative Study of Model-Based and Model-Free Approaches
William Suliman, Ekaterina Chaikovskaia, Egor Davydenko, Roman Gorbachev
TL;DR
The paper tackles robust, energy-efficient bipedal locomotion in unstructured environments by integrating a simple heuristic Linear Step Planning (LS) strategy, paired with a Raibert-like velocity regulator, into a reinforcement learning framework. It compares LS against a model-based LIPM baseline and shows that LS achieves comparable or superior velocity tracking, better energy efficiency, and greater robustness on uneven terrain and under disturbances. The results suggest that complex analytical models may be unnecessary for effective, generalizable learning-based bipedal control, at least in simulated environments. The approach improves modularity and computational efficiency, enabling reliable interaction with challenging terrain without heavy reliance on full dynamics modeling.
Abstract
This work presents an extended framework for learning-based bipedal locomotion that incorporates a heuristic step-planning strategy guided by desired torso velocity tracking. The framework enables precise interaction between a humanoid robot and its environment, supporting tasks such as crossing gaps and accurately approaching target objects. Unlike approaches based on full or simplified dynamics, the proposed method avoids complex step planners and analytical models. Step planning is primarily driven by heuristic commands, while a Raibert-type controller modulates the foot placement length based on the error between desired and actual torso velocity. We compare our method with a model-based step-planning approach, the Linear Inverted Pendulum Model (LIPM) controller. Experimental results demonstrate that our approach attains comparable or superior accuracy in maintaining target velocity (up to 80%), significantly greater robustness on uneven terrain (over 50% improvement), and improved energy efficiency. These results suggest that incorporating complex analytical, model-based components into the training architecture may be unnecessary for achieving stable and robust bipedal walking, even in unstructured environments.
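
To make the idea of a Raibert-type step-length regulator concrete, the following is a minimal sketch, not the formulation used in this work. It assumes a nominal step length equal to the distance the torso covers in one step at the desired velocity, plus a proportional correction on the velocity error. The function name, the gain `k_v`, the step duration, and the clamping limit are all hypothetical values chosen for illustration.

```python
def raibert_step_length(v_des, v_actual, step_duration=0.4, k_v=0.1, max_step=0.5):
    """Return a forward step length in metres (illustrative sketch only).

    v_des, v_actual: desired and measured torso velocity in m/s.
    step_duration, k_v, max_step: hypothetical parameters, not from the paper.
    """
    # Heuristic, command-driven part: distance covered in one step at v_des.
    nominal = v_des * step_duration
    # Raibert-style feedback: if the torso moves faster than desired,
    # place the foot further ahead to slow down, and vice versa.
    correction = k_v * (v_actual - v_des)
    step = nominal + correction
    # Clamp to an assumed kinematic limit of the leg.
    return max(-max_step, min(max_step, step))


if __name__ == "__main__":
    # On-target, too fast, and too slow relative to a 0.5 m/s command.
    for v in (0.5, 0.8, 0.2):
        print(v, raibert_step_length(0.5, v))
```

In this sketch the feedback term only adjusts the step length; any learned policy that tracks the resulting foot placement target is outside its scope.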
