A Model-Free Data-Driven Algorithm for Continuous-Time Control

Sean R. Bowerfind, Matthew R. Kirchner, Gary A. Hewer, D. Reed Robinson, Paula Chen, Alireza Farahmandi, Katia Estabridis

TL;DR

This work tackles model-free synthesis of infinite-horizon LQR controllers for continuous-time systems using only finite, noisy input-output data. It derives a necessary condition that the optimal value function must satisfy along observed trajectories, which yields an implicit, model-free formulation that requires no explicit knowledge of $A$ and $B$. The problem is cast as a nonlinear program (NLP) over $L$, with $P=L^T L$ and $S=PB$, from which the stabilizing gain $K=R^{-1}S^T$ is obtained. The approach is shown to be equivalent to a continuous-time Q-learning perspective, with discrete-time implementation offering improved numerical stability. Two case studies, a known linear Boeing-747-style system and an unknown nonlinear quadcopter, demonstrate that the model-free gains closely approximate the classical LQR gains and deliver comparable closed-loop performance. This provides a data-driven, offline alternative for LQR design suitable for systems where first-principles models are unavailable or hard to linearize; future work focuses on data-design metrics and real-flight validation.
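As a point of reference for the quantities named in the summary, the following is a minimal sketch of the classical, model-based computation that the model-free gain is compared against. The double-integrator system and the weights $Q$ and $R$ are illustrative assumptions, not values from the paper. The sketch solves the algebraic Riccati equation with SciPy, forms $S=PB$ and $K=R^{-1}S^T$, and recovers one factor $L$ with $P=L^T L$ by Cholesky factorization. In the summarized method, $L$ is instead a decision variable of the NLP and $A$, $B$ are not assumed known.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative double integrator (an assumption, not a system from the paper).
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)           # state weight (assumed)
R = np.array([[1.0]])   # input weight (assumed)

# Classical route: P solves A^T P + P A - P B R^{-1} B^T P + Q = 0.
P = solve_continuous_are(A, B, Q, R)

# Quantities named in the summary: S = P B and K = R^{-1} S^T.
S = P @ B
K = np.linalg.solve(R, S.T)

# One factor L with P = L^T L: NumPy returns the lower factor C with
# P = C C^T, so L = C^T.
L = np.linalg.cholesky(P).T

print("K =", K)
print("max |L^T L - P| =", np.abs(L.T @ L - P).max())
```

For this particular system the Riccati equation can be solved by hand, which gives $K=[1,\ \sqrt{3}]$ and makes the example easy to check.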

Abstract

Presented is an algorithm to synthesize an infinite-horizon LQR optimal feedback controller for continuous-time systems. The algorithm does not require knowledge of the system dynamics, but instead uses only a finite-length sampling of (possibly suboptimal) input-output data. The algorithm is based on a constrained optimization problem that enforces a necessary condition on the dynamics of the optimal value function along an arbitrary trajectory. This paper presents the derivation and shows examples applied to both linear and nonlinear systems inspired by air vehicles.
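To illustrate how a condition on the value function along a trajectory can avoid the system matrices, the sketch below checks a textbook identity numerically. For $V(x)=x^T P x$ with $P$ solving the algebraic Riccati equation and $S=PB$, substituting the Riccati equation into $\dot V$ gives $\frac{d}{dt}(x^T P x) = -x^T Q x + x^T S R^{-1} S^T x + 2x^T S u$ for any input $u$, so only $P$, $S$, and the data appear. This form is an assumption chosen for illustration and may differ from the exact condition derived in the paper. The system, input signal, and weights are hypothetical, and $A$, $B$ are used only to simulate the data and to compute a reference $P$.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.linalg import solve_continuous_are

# Hypothetical system, used only to generate data and a reference P.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P = solve_continuous_are(A, B, Q, R)
S = P @ B
R_inv = np.linalg.inv(R)

def u_of_t(t):
    """Arbitrary, suboptimal excitation input (an assumption)."""
    return np.array([np.sin(3.0 * t) + 0.5 * np.cos(7.0 * t)])

def rhs(t, z):
    """Augmented dynamics: state x plus the running integral of the
    right-hand side of the identity, which involves only Q, R, S, x, u."""
    x = z[:2]
    u = u_of_t(t)
    dx = A @ x + B @ u
    integrand = -x @ Q @ x + x @ S @ R_inv @ S.T @ x + 2.0 * x @ S @ u
    return np.append(dx, integrand)

x0 = np.array([1.0, -0.5])
sol = solve_ivp(rhs, (0.0, 2.0), np.append(x0, 0.0), rtol=1e-10, atol=1e-12)

x_end = sol.y[:2, -1]
value_change = x_end @ P @ x_end - x0 @ P @ x0   # V(x(t1)) - V(x(t0))
integral = sol.y[2, -1]                          # integrated right-hand side
residual = abs(value_change - integral)
print("residual =", residual)
```

A data-driven method can treat a residual of this kind, evaluated on recorded samples, as a constraint or cost on the unknowns rather than as a check on a known $P$.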

Paper Structure

This paper contains 10 sections, 2 theorems, 59 equations, 5 figures, 1 table.

Key Result

Lemma 5.1

Suppose $P$ satisfies the algebraic Riccati equation. Then the following advantage function satisfies the semi-group condition (eq:Q semi-group), where $\|\cdot\|_{R}=\sqrt{\langle\cdot,R\cdot\rangle}$ denotes the norm induced by the symmetric positive definite matrix $R$.

Figures (5)

  • Figure 1: The testing paradigm for Example 1.
  • Figure 2: The state response for B-747 roll command tracking. The ground truth, generated with $K_{LQR}$, is shown as a dashed red line. The response with the controller $K_{MF}$, computed with the proposed model-free method, is shown in blue. Best viewed in color.
  • Figure 3: The body coordinate frame and dimensions of the Holybro X500 V2 quadcopter, on which the Simulink model used for Example 2 (the nonlinear results section) is based.
  • Figure 4: Joystick-generated manual control inputs in the roll (above) and pitch (below) axes.
  • Figure 5: The state response for attitude command tracking on a quadcopter drone. The ground truth, generated with $K_{LQR}$, is shown as a dashed red line. The response with the controller $K_{MF}$, computed with the proposed model-free method, is shown in blue. Best viewed in color.

Theorems & Definitions (4)

  • Lemma 5.1
  • Lemma 1
  • proof
  • proof