Path Following and Stabilisation of a Bicycle Model using a Reinforcement Learning Approach
Sebastian Weyrer, Peter Manzl, A. L. Schwab, Johannes Gerstmayr
TL;DR
This work demonstrates that reinforcement learning can control a Whipple bicycle model so that it follows a path while remaining laterally stable without stabilisation aids. The agent outputs steering angle commands, which a PD controller converts into steering torques. A Soft Actor-Critic deep RL framework operates in a virtual multibody environment (Exudyn/OpenAI Gym) with a 0.05 s control interval, learning across speeds $v\in[2,7]$ m/s and diverse path geometries. A four-part curriculum learning strategy enables progressive learning of speed-dependent policies, while SHAP explanations reveal that the roll angle $\varphi$ and path preview features strongly drive steering decisions, linking the learned behaviour to established bicycle dynamics. The results show robust performance on random and benchmark paths, with mean lateral deviations on the order of the wheelbase, and point to sim-to-real transfer as a consideration for future real-world riding systems.
Abstract
Over the years, complex control approaches have been developed to control the motion of a bicycle. Reinforcement Learning (RL), a branch of machine learning, promises easy deployment of so-called agents. Deployed agents are increasingly considered as an alternative to controllers for mechanical systems. The present work introduces an RL approach for path following with a virtual bicycle model while simultaneously stabilising it laterally. The bicycle, modelled as the Whipple benchmark model using multibody system dynamics, has no stabilisation aids. The agent achieves both path following and stabilisation of the bicycle model exclusively by outputting steering angles, which are converted into steering torques via a PD controller. Curriculum learning is applied as a state-of-the-art training strategy. Different settings of the implemented RL framework are investigated and compared with each other. The performance of the deployed agents is evaluated using different types of paths and measurements. The deployed agents are shown to follow complex paths, including full circles, slalom manoeuvres, and lane changes, while stabilising the bicycle model at speeds between 2 m/s and 7 m/s. Explanatory methods for machine learning are used to analyse the functionality of a deployed agent and to link the introduced RL approach with research in the field of bicycle dynamics.
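
As an illustration of the action interface described above, the following is a minimal sketch of how a commanded steering angle could be converted into a steering torque by a PD law. It is not the authors' implementation: the function name, the gains `kp` and `kd`, and the assumption that the commanded angle is held constant between agent steps (so its rate is zero) are all hypothetical choices made for this example.

```python
def pd_steering_torque(delta_ref, delta, delta_dot, kp=10.0, kd=1.0):
    """Map a commanded steering angle to a steering torque with a PD law.

    delta_ref : steering angle commanded by the agent [rad]
    delta     : current steering angle of the bicycle model [rad]
    delta_dot : current steering angular velocity [rad/s]
    kp, kd    : hypothetical proportional and derivative gains
                (illustrative values, not taken from the paper)
    """
    # The commanded angle is assumed constant between agent steps,
    # so the derivative term acts only on the measured steering rate.
    return kp * (delta_ref - delta) - kd * delta_dot


# Example: the agent commands 0.1 rad while the handlebar is centred and at rest.
torque = pd_steering_torque(delta_ref=0.1, delta=0.0, delta_dot=0.0)
```

With such a layer between agent and model, the agent only has to learn a kinematic quantity (the steering angle), while the PD law takes care of producing a torque that the multibody simulation can apply.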
