An Adaptive Data-Enabled Policy Optimization Approach for Autonomous Bicycle Control

Niklas Persson; Feiran Zhao; Mojtaba Kaheni; Florian Dörfler; Alessandro V. Papadopoulos

TL;DR

This paper addresses the balancing of a nonlinear, open-loop unstable autonomous bicycle by combining a stabilizing inner-loop Feedback Linearization (FL) controller with an adaptive outer-loop Data-Enabled Policy Optimization (DeePO) controller. DeePO learns and updates the LQR gain directly from online data through a covariance parameterization in which past data are exponentially down-weighted by a forgetting factor; a robustness-promoting regularizer is used to obtain a stabilizing initial policy. The approach is validated in high-fidelity simulations and in hardware experiments on an instrumented bicycle, where it tracks the reference lean angle and lean rate more precisely than FL alone. The results also indicate that the policy gain does not always need to be updated at every time step, and that DeePO can adapt to real-world nonlinearities, disturbances, and hardware limitations. This supports the practical feasibility of data-driven adaptive control for nonlinear, time-varying systems, with implications for autonomous bicycles and related robotic platforms.
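As a rough illustration of the covariance parameterization with a forgetting factor, the sketch below keeps two exponentially weighted data matrices and uses them to evaluate the closed-loop matrix for a given gain. It is a minimal sketch under stated assumptions, not the authors' implementation: the plant matrices `A` and `B`, the gain `K`, the forgetting factor `lam = 0.98`, and the function names are all hypothetical, and the gradient-based policy update of DeePO is not shown. For noise-free linear data the data-based closed-loop matrix coincides with `A + B K`; on the real bicycle this would hold only approximately.

```python
import numpy as np

def update_covariances(Phi, X1bar, u, x, x_next, lam=0.98):
    """Exponentially weighted update with forgetting factor lam in (0, 1]."""
    d = np.concatenate([u, x])                  # stacked sample d_k = [u_k; x_k]
    Phi = lam * Phi + np.outer(d, d)            # weighted sum of d_k d_k^T
    X1bar = lam * X1bar + np.outer(x_next, d)   # weighted sum of x_{k+1} d_k^T
    return Phi, X1bar

def closed_loop_from_data(Phi, X1bar, K):
    """Solve Phi V = [K; I] and return X1bar V, the data-based closed loop."""
    n = K.shape[1]
    V = np.linalg.solve(Phi, np.vstack([K, np.eye(n)]))
    return X1bar @ V

# Hypothetical open-loop unstable 2-state plant, used only to generate data.
A = np.array([[1.0, 0.1], [0.3, 1.0]])
B = np.array([[0.0], [0.1]])
K = np.array([[-5.0, -3.0]])   # assumed stabilizing gain for this toy plant

rng = np.random.default_rng(0)
n, m = 2, 1
Phi = np.zeros((n + m, n + m))
X1bar = np.zeros((n, n + m))
x = np.zeros(n)
for _ in range(50):
    u = K @ x + rng.standard_normal(m)   # feedback plus probing noise
    x_next = A @ x + B @ u
    Phi, X1bar = update_covariances(Phi, X1bar, u, x, x_next)
    x = x_next

A_cl_est = closed_loop_from_data(Phi, X1bar, K)
print(np.round(A_cl_est, 3))
```

Because both matrices are updated recursively, each new sample costs a fixed amount of work, and a smaller `lam` discounts old samples faster, which is what lets the estimate follow time-varying dynamics.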

Abstract

This paper presents a unified control framework that integrates a Feedback Linearization (FL) controller in the inner loop with an adaptive Data-Enabled Policy Optimization (DeePO) controller in the outer loop to balance an autonomous bicycle. While the FL controller stabilizes and partially linearizes the inherently unstable and nonlinear system, its performance is compromised by unmodeled dynamics and time-varying characteristics. To overcome these limitations, the DeePO controller is introduced to enhance adaptability and robustness. The initial control policy of DeePO is obtained from a finite set of offline, persistently exciting input and state data. To improve stability and compensate for system nonlinearities and disturbances, a robustness-promoting regularizer refines the initial policy, while the adaptive section of the DeePO framework is enhanced with a forgetting factor to improve adaptation to time-varying dynamics. The proposed DeePO+FL approach is evaluated through simulations and real-world experiments on an instrumented autonomous bicycle. Results demonstrate its superiority over the FL-only approach, achieving more precise tracking of the reference lean angle and lean rate.
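The requirement that the offline data be persistently exciting is commonly checked in the direct data-driven LQR literature through a rank condition: the stacked matrix of inputs and states should have full row rank. The sketch below illustrates that check on a hypothetical toy plant. The plant, the gain, the tolerance `rel_tol`, and the helper names are assumptions made for illustration, and the exact condition used in the paper may differ.

```python
import numpy as np

def collect_data(A, B, K, x0, T, noise_std, rng):
    """Roll out u = K x + probing noise for T steps; return U0 (m x T), X0 (n x T)."""
    m = B.shape[1]
    x = np.array(x0, dtype=float)
    U0, X0 = [], []
    for _ in range(T):
        u = K @ x + noise_std * rng.standard_normal(m)
        U0.append(u)
        X0.append(x)
        x = A @ x + B @ u
    return np.array(U0).T, np.array(X0).T

def is_sufficiently_exciting(U0, X0, rel_tol=1e-8):
    """True if the stacked data matrix [U0; X0] has full row rank (numerically)."""
    D0 = np.vstack([U0, X0])
    if D0.shape[1] < D0.shape[0]:
        return False                      # fewer samples than rows
    s = np.linalg.svd(D0, compute_uv=False)
    return bool(s[-1] > rel_tol * s[0])   # smallest vs largest singular value

# Hypothetical toy plant and gain, for illustration only.
A = np.array([[1.0, 0.1], [0.3, 1.0]])
B = np.array([[0.0], [0.1]])
K = np.array([[-5.0, -3.0]])
rng = np.random.default_rng(1)

U_probe, X_probe = collect_data(A, B, K, [0.1, 0.0], 20, 1.0, rng)
U_plain, X_plain = collect_data(A, B, K, [0.1, 0.0], 20, 0.0, rng)

excited = is_sufficiently_exciting(U_probe, X_probe)
unexcited = is_sufficiently_exciting(U_plain, X_plain)
print(excited, unexcited)
```

In the second rollout the input is an exact linear function of the state, so the input row adds no new information and the rank test fails. This is why some probing signal is typically injected when collecting the offline data.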
