End-to-End Learning Framework for Solving Non-Markovian Optimal Control

Xiaole Zhang, Peiyu Zhang, Xiongye Xiao, Shixuan Li, Vasileios Tzoumas, Vijay Gupta, Paul Bogdan

TL;DR

This work addresses the challenge of controlling systems with memory effects by extending the Linear Quadratic Regulator to fractional-order linear time-invariant (FOLTI) dynamics and proposing FOLOC, an end-to-end data-driven framework. FOLOC combines a system-identification module (RNN+MLP) with a neural-operator-based optimal-control module (Fourier Neural Operator) to jointly learn the system parameters $(A, B, \alpha)$ and the optimal control policy directly from trajectories, grounded in the analytical LQR solution for FOLTI systems. The authors derive a discrete-time fractional-order system solution, establish sample-complexity bounds, and demonstrate robust performance across synthetic and real-world tasks (cart-pole and quadrotor) under non-Gaussian noise and limited data. The approach achieves efficient inference and scales to higher dimensions, suggesting practical applicability to complex non-Markovian control problems in engineering and robotics. The work advances both theory and practice by unifying fractional-order system identification with end-to-end control under realistic noise and data constraints.

Abstract

Integer-order calculus often falls short in capturing the long-range dependencies and memory effects found in many real-world processes. Fractional calculus addresses these gaps via fractional-order integrals and derivatives, but fractional-order dynamical systems pose substantial challenges in system identification and optimal control due to the lack of standard control methodologies. In this paper, we theoretically derive the optimal control via linear quadratic regulator (LQR) for fractional-order linear time-invariant (FOLTI) systems and develop an end-to-end deep learning framework based on this theoretical foundation. Our approach establishes a rigorous mathematical model, derives analytical solutions, and incorporates deep learning to achieve data-driven optimal control of FOLTI systems. Our key contributions include: (i) proposing an innovative system identification method and control strategy for FOLTI systems, (ii) developing the first end-to-end data-driven learning framework, Fractional-Order Learning for Optimal Control (FOLOC), that learns control policies from observed trajectories, and (iii) deriving a theoretical analysis of sample complexity to quantify the number of samples required for accurate optimal control in complex real-world problems. Experimental results indicate that our method accurately approximates fractional-order system behaviors without relying on Gaussian noise assumptions, pointing to promising avenues for advanced optimal control.

Paper Structure

This paper contains 47 sections, 5 theorems, 116 equations, 5 figures, 6 tables.

Key Result

Lemma 3.1

The solution to the discrete-time FOLTI system is given in closed form (guermah2012discrete), in terms of matrices $G_k$ that are defined recursively and matrices $A_j$.
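
The displayed equations of the lemma are not reproduced in this listing. As a rough illustration only, the sketch below assumes the standard Grünwald–Letnikov discretization $x_{k+1} = \sum_{j=0}^{k} A_j x_{k-j} + B u_k$ with $A_0 = A + \mathrm{diag}(\alpha)$ and $A_j = (-1)^j \mathrm{diag}\binom{\alpha}{j+1}$ for $j \ge 1$, together with $G_0 = I$, $G_k = \sum_{j=0}^{k-1} A_j G_{k-1-j}$ and $x_k = G_k x_0 + \sum_{j=0}^{k-1} G_{k-1-j} B u_j$. These forms, the function names, and the numerical values are assumptions for illustration and may differ from the paper's exact definitions.

```python
import numpy as np

def gl_binom(alpha, n_terms):
    """c[j] = binom(alpha, j) elementwise, for j = 0..n_terms-1."""
    c = np.ones((n_terms,) + np.shape(alpha))
    for j in range(1, n_terms):
        c[j] = c[j - 1] * (alpha - j + 1) / j
    return c

def folti_matrices(A, alpha, horizon):
    """Assumed form: A_0 = A + diag(alpha), A_j = (-1)^j diag(binom(alpha, j+1))."""
    alpha = np.asarray(alpha, dtype=float)
    c = gl_binom(alpha, horizon + 1)
    mats = [A + np.diag(c[1])]
    for j in range(1, horizon):
        mats.append((-1) ** j * np.diag(c[j + 1]))
    return mats

def simulate_recursive(A_list, B, x0, U):
    """Step the memory recursion x_{k+1} = sum_j A_j x_{k-j} + B u_k."""
    xs = [x0]
    for k in range(len(U)):
        x_next = B @ U[k]
        for j in range(k + 1):
            x_next = x_next + A_list[j] @ xs[k - j]
        xs.append(x_next)
    return np.array(xs)

def transition_matrices(A_list, horizon):
    """G_0 = I and G_k = sum_{j<k} A_j G_{k-1-j}."""
    G = [np.eye(A_list[0].shape[0])]
    for k in range(1, horizon + 1):
        G.append(sum(A_list[j] @ G[k - 1 - j] for j in range(k)))
    return G

def simulate_closed_form(G, B, x0, U):
    """x_k = G_k x_0 + sum_{j<k} G_{k-1-j} B u_j."""
    xs = []
    for k in range(len(U) + 1):
        x = G[k] @ x0
        for j in range(k):
            x = x + G[k - 1 - j] @ B @ U[j]
        xs.append(x)
    return np.array(xs)

# Hypothetical two-state example; all numbers are made up for illustration.
A = np.array([[0.0, 0.1], [-0.1, -0.05]])
B = np.array([[0.0], [1.0]])
alpha = np.array([0.7, 0.9])
horizon = 20
x0 = np.array([1.0, 0.0])
U = np.random.default_rng(0).normal(size=(horizon, 1))

A_list = folti_matrices(A, alpha, horizon)
xs_rec = simulate_recursive(A_list, B, x0, U)
xs_cf = simulate_closed_form(transition_matrices(A_list, horizon), B, x0, U)
```

Under these assumptions the two trajectories agree to floating-point precision. Setting every entry of `alpha` to 1 makes all $A_j$ with $j \ge 1$ vanish, so the recursion reduces to the ordinary first-difference system $x_{k+1} = (A + I) x_k + B u_k$.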

Figures (5)

  • Figure 1: Overview of the proposed model architecture. The pipeline first infers the fractional-order system parameters $(A, \alpha, B)$ from the input $\mathcal{X}$ using an RNN+MLP-based system identification module. These parameters are then encoded as embeddings of sequential tokens $A_i$ for time-dependent modeling. An attention-based Sequence Encoder processes these embeddings to obtain latent representations, which, along with the cost matrices $Q$ and $R$ and the estimated system matrix $B$, are fed to stacked MLPs with a residual connection from the input $\mathcal{X}$; a Fourier Neural Operator then predicts the optimal control signals. Finally, a Composite Loss function unifies system identification and control prediction, enabling end-to-end training of both system parameter estimation and control law synthesis.
  • Figure 2: Sample complexity bound simulation.
  • Figure 3: Numerical simulation.
  • Figure 4: Varying time horizons.
  • Figure 5: Test MSE loss at each epoch.

Theorems & Definitions (10)

  • Definition 2.1: Grünwald–Letnikov fractional-order derivative
  • Definition 2.2: LQR for FOLTI systems
  • Lemma 3.1: Discrete-time FOLTI system solution
  • Theorem 3.2: LQR solution for FOLTI systems
  • Theorem 3.3: Sample Complexity for FOLTI systems
  • Corollary 3.4: Simplified Sample Complexity for FOLTI systems
  • Definition 3.1: Riemann–Liouville fractional-order integral
  • Definition 3.2: Riemann–Liouville fractional-order derivative
  • Lemma 4.1: Lagrange multiplier condition
  • Definition 4.2: Block Toeplitz matrix
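
To make Definition 2.1 concrete, here is a minimal numerical sketch of the Grünwald–Letnikov derivative, assuming the usual truncated form $D^{\alpha} f(t) \approx h^{-\alpha} \sum_{j=0}^{n} (-1)^j \binom{\alpha}{j} f(t - jh)$ with $n = t/h$ and lower terminal 0. The function names and the test function $f(t) = t$ are illustrative choices, not taken from the paper.

```python
import math
import numpy as np

def gl_weights(alpha, n):
    """w[j] = (-1)^j * binom(alpha, j) for j = 0..n, built recursively."""
    w = np.empty(n + 1)
    w[0] = 1.0
    for j in range(1, n + 1):
        w[j] = w[j - 1] * (1.0 - (alpha + 1.0) / j)
    return w

def gl_derivative(f, t, alpha, h):
    """Truncated Grünwald–Letnikov sum over the whole history [0, t]."""
    n = int(round(t / h))
    w = gl_weights(alpha, n)
    samples = f(t - h * np.arange(n + 1))  # f(t), f(t - h), ..., f(0)
    return h ** (-alpha) * np.dot(w, samples)

# For f(t) = t the fractional derivative of order alpha is t^(1 - alpha) / Gamma(2 - alpha).
f = lambda s: s
approx = gl_derivative(f, 1.0, 0.5, 1e-3)
exact = 1.0 / math.gamma(1.5)
print(approx, exact)
```

Every past sample contributes to the sum, with weights that decay only polynomially in $j$, which is the memory effect that makes the resulting control problem non-Markovian. For $\alpha = 1$ all weights beyond $j = 1$ vanish and the formula collapses to the ordinary backward difference.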