
Path Following and Stabilisation of a Bicycle Model using a Reinforcement Learning Approach

Sebastian Weyrer, Peter Manzl, A. L. Schwab, Johannes Gerstmayr

TL;DR

This work demonstrates that reinforcement learning can control a Whipple bicycle model so that it follows a path while remaining laterally stable without stabilizing aids. The agent outputs steering commands that a PD controller maps to steering torques. A Soft Actor-Critic DRL framework operates in a virtual multibody environment (Exudyn/OpenAI Gym) with a controller update interval of 0.05 s, learning across speeds $v\in[2,7]$ m/s and diverse path geometries. A four-part curriculum learning strategy enables progressive learning of speed-dependent policies, while SHAP explanations reveal that the roll angle $\varphi$ and preview features strongly drive steering decisions, linking RL behavior to established bicycle dynamics. The results show robust performance on random and benchmark paths, with mean lateral deviations on the order of the wheelbase, and point to sim-to-real transfer as a consideration for future real-world riding systems.

Abstract

Over the years, complex control approaches have been developed to control the motion of a bicycle. Reinforcement Learning (RL), a branch of machine learning, promises easy deployment of so-called agents. Deployed agents are increasingly considered as an alternative to controllers for mechanical systems. The present work introduces an RL approach for path following with a virtual bicycle model while simultaneously stabilising it laterally. The bicycle, modelled as the Whipple benchmark model using multibody system dynamics, has no stabilisation aids. The agent succeeds in both path following and stabilisation of the bicycle model exclusively by outputting steering angles, which are converted into steering torques via a PD controller. Curriculum learning is applied as a state-of-the-art training strategy. Different settings for the implemented RL framework are investigated and compared with each other. The performance of the deployed agents is evaluated using different types of paths and measurements. The deployed agents are shown to follow paths and stabilise the bicycle model while it travels at between 2 m/s and 7 m/s along complex paths including full circles, slalom manoeuvres, and lane changes. Explanatory methods for machine learning are used to analyse the functionality of a deployed agent and to link the introduced RL approach with research in the field of bicycle dynamics.
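The conversion from a commanded steering angle to a steering torque can be illustrated with a minimal PD law. The sketch below is only an assumption about the general form of such a controller: the class name, the gain values, and the choice of a zero target steering rate are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class PDSteeringController:
    """Illustrative PD law: commanded steering angle -> steering torque."""

    kp: float  # proportional gain in N*m/rad (hypothetical value)
    kd: float  # derivative gain in N*m*s/rad (hypothetical value)

    def torque(self, delta_cmd: float, delta: float, delta_dot: float) -> float:
        # Proportional term acts on the steering angle error; the derivative
        # term damps the measured steering rate (target rate assumed zero).
        return self.kp * (delta_cmd - delta) - self.kd * delta_dot


# Example call with made-up numbers: 0.1 rad command, handlebar currently
# straight and turning at 0.5 rad/s.
controller = PDSteeringController(kp=10.0, kd=1.0)
tau = controller.torque(delta_cmd=0.1, delta=0.0, delta_dot=0.5)
```

In a setup of this kind, the agent would supply `delta_cmd` at each control step, while the PD law turns it into the torque applied to the steering joint of the multibody model.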

Paper Structure

This paper contains 31 sections, 49 equations, 15 figures, and 8 tables.

Figures (15)

  • Figure 1: Whipple bicycle shown in the reference configuration, where it stands upright without the handlebar being turned. The positions of the Center of Mass (COM) of the rear wheel, the rear body, the handlebar, and the front wheel are marked, as well as the two ground contact points $\mathrm{P}$ and $\mathrm{Q}$, the two wheel radii $\mathrm{r_R}$ and $\mathrm{r_F}$, and the trail $c$ of the bicycle. The global frame, denoted as $\mathrm{0}$-frame, is shown.
  • Figure 2: Whipple bicycle drawn with its minimal coordinates on positional base and the pitch angle $\theta_\mathrm{B}$. The upright cylinder marked with $\Psi$ represents the yaw angle; the mounting attached to the rear dropout illustrates the roll angle $\varphi$, which is independent of the pitch angle $\theta_\mathrm{B}$ of the rear body. Note that the pitch angle $\theta_\mathrm{B}$ is not a minimal coordinate of the bicycle, but is needed later for the coordinate mappings.
  • Figure 3: Scheme of the RL framework. An agent can choose an action based on the state of a dynamic environment. A numerical reward signal is passed back to the agent.
  • Figure 4: (Left) The visualisation of the virtual bicycle model is shown. (Right) Examples of the three element types with which the paths in the present work are constructed are drawn with their corresponding geometric properties.
  • Figure 5: (Left) Four examples of randomly generated paths that are used in the learning process are shown. Only a short part of each path is drawn to avoid clutter in the figure. (Right) The benchmark path that is used to evaluate the performance of the agents is shown. For the benchmark path, $a = 5\,\mathrm{m}$ applies.
  • ...and 10 more figures