CycleRL: Sim-to-Real Deep Reinforcement Learning for Robust Autonomous Bicycle Control

Gelu Liu; Teng Wang; Zhijie Wu; Junliang Wu; Songyuan Li; Xiangwei Zhu

Abstract

Autonomous bicycles offer a promising, agile solution for urban mobility and last-mile logistics. However, conventional control strategies often struggle with their underactuated, nonlinear dynamics: they are sensitive to model mismatch and adapt poorly to real-world uncertainties. To address this, this paper presents CycleRL, the first sim-to-real deep reinforcement learning (DRL) framework designed for robust autonomous bicycle control. Our approach trains an end-to-end neural control policy in the high-fidelity NVIDIA Isaac Sim environment, using Proximal Policy Optimization (PPO) to avoid the need for an explicit dynamics model. The framework features a composite reward function tailored to concurrent balance maintenance, velocity tracking, and steering control. Systematic domain randomization is employed to bridge the simulation-to-reality gap and enable direct transfer to hardware. In simulation, CycleRL achieves a 99.90% balance success rate, a steering tracking error of 1.15°, and a velocity tracking error of 0.18 m/s. These quantitative results, together with successful hardware transfer, validate DRL as an effective paradigm for autonomous bicycle control that offers greater adaptability than traditional methods. Video demonstrations are available at https://anony6f05.github.io/CycleRL/.

Paper Structure (35 sections, 9 equations, 7 figures, 9 tables)

Introduction
Related Work
Traditional Control Methods for Two-Wheeled Vehicles
Learning-Based Applications in Bicycle Control
Simulation-to-Reality Transfer Technologies
Methodology
Nonlinear Dynamics and Uncertainty Analysis
Two-Wheeled Vehicle Dynamics
Stochastic Dynamics Formulation
Reinforcement Learning Framework
Markov Decision Process Formulation
Composite Reward Function Design
Proximal Policy Optimization with Composite Reward
Domain Randomization Strategy
Dynamics Randomization
...and 20 more sections

Figures (7)

Figure 1: Real world deployment of the proposed autonomous bicycle control framework across different conditions.
Figure 2: Sketch of the two-wheeled vehicle [persson2021trajectory].
Figure 3: Illustration of the reward function design for balancing and steering control. The total reward aggregates performance incentives (green arrows) and control effort penalties (red arrows) into a unified scalar to guide the policy learning.
Figure 4: Bicycle and terrain modeling in Isaac Sim.
Figure 5: Training curves and convergence analysis.
...and 2 more figures
