Reinforcement Learning-based Control of Nonlinear Systems using Carleman Approximation: Structured and Unstructured Designs
Jishnudeep Kar, He Bai, Aranya Chakrabortty
TL;DR
This work introduces a reinforcement learning (RL) framework for unknown input-affine nonlinear systems. The framework embeds the dynamics into an infinite-dimensional Carleman space, which yields a bilinear lifted model suitable for policy-iteration RL. It develops both on-policy and off-policy learning algorithms, derives a Lyapunov-based stability framework, and analyzes the effect of truncating the lifting at a finite order N, with explicit truncation-error bounds. To address practical constraints, the authors extend the framework in two directions. Structured controllers are obtained from Riccati-like equations in Carleman space, and sparse controllers come from ADMM-based sparsity promotion; for both, the authors give conditions that guarantee closed-loop stability. Numerical experiments on a second-order oscillator and a four-boat tug network show near-optimal performance and faster learning than neural-network (NN)-based approaches. They also show clear trade-offs among structure, sparsity, and control performance. The framework thus offers a scalable, data-driven path to nonlinear RL control with stability guarantees, and its complexity can be tuned through the truncation order.
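To make the lifting concrete, the sketch below applies Carleman linearization to a hypothetical scalar plant, dx/dt = a1·x + a2·x² + (b0 + b1·x)·u. The state is lifted onto the monomials z = [x, x², …, x^N]. Each row follows from d(x^k)/dt = k·x^(k−1)·dx/dt, so the lifted model takes the form dz/dt = A z + (B z + B0) u. The term B z u is bilinear in the lifted state and the input. Truncating at order N discards only the N·a2·x^(N+1) term in the last row. The plant, its coefficients, the input signal, and the helper names (`carleman_matrices`, `rk4`, `truncation_error`) are illustrative assumptions and are not taken from the paper. The paper handles general multivariable systems, where the lifting is typically built from Kronecker powers of the state.

```python
import numpy as np

def carleman_matrices(a1, a2, b0, b1, N):
    """Order-N Carleman lifting of the (hypothetical) scalar plant
        dx/dt = a1*x + a2*x**2 + (b0 + b1*x)*u
    onto z = [x, x^2, ..., x^N]. Returns (A, B, B0) with
        dz/dt = A z + (B z + B0) u   (up to the dropped N*a2*x^(N+1) term)."""
    A = np.zeros((N, N))
    B = np.zeros((N, N))
    B0 = np.zeros(N)
    for k in range(1, N + 1):        # row k-1 holds d(x^k)/dt = k x^(k-1) dx/dt
        i = k - 1
        A[i, i] = k * a1             # k*a1*x^k
        if k < N:
            A[i, i + 1] = k * a2     # k*a2*x^(k+1); discarded in the last row
        B[i, i] = k * b1             # k*b1*x^k*u     (bilinear term)
        if k >= 2:
            B[i, i - 1] = k * b0     # k*b0*x^(k-1)*u (bilinear term)
    B0[0] = b0                       # b0*u enters the first row directly
    return A, B, B0

def rk4(f, y0, t_grid):
    """Classic fixed-step Runge-Kutta 4 integrator for dy/dt = f(t, y)."""
    ys = [np.asarray(y0, dtype=float)]
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        h, y = t1 - t0, ys[-1]
        k1 = f(t0, y)
        k2 = f(t0 + h / 2, y + h / 2 * k1)
        k3 = f(t0 + h / 2, y + h / 2 * k2)
        k4 = f(t1, y + h * k3)
        ys.append(y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4))
    return np.array(ys)

# Hypothetical plant coefficients, input signal and initial condition.
a1, a2, b0, b1 = -1.0, 0.5, 1.0, 0.2
x0 = 0.3
t = np.linspace(0.0, 5.0, 501)

def u(s):
    return 0.2 * np.sin(s)

# Reference trajectory of the nonlinear plant itself.
x_true = rk4(lambda s, x: a1 * x + a2 * x**2 + (b0 + b1 * x) * u(s), [x0], t)[:, 0]

def truncation_error(N):
    """Max |x(t) - z_1(t)| over the horizon for the order-N Carleman model."""
    A, B, B0 = carleman_matrices(a1, a2, b0, b1, N)
    z0 = x0 ** np.arange(1, N + 1)   # consistent lifted initial condition
    z = rk4(lambda s, z: A @ z + (B @ z + B0) * u(s), z0, t)
    return np.max(np.abs(z[:, 0] - x_true))

for N in (1, 2, 4, 6):
    print(f"N = {N}: max error in x = {truncation_error(N):.2e}")
```

For this small initial condition the error in x shrinks as N grows. In the scalar case each extra order adds one lifted state. With Kronecker-power lifting in n dimensions, order k contributes n^k entries, so the truncation order trades model accuracy against the size of the learning problem.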
Abstract
We develop data-driven reinforcement learning (RL) control designs for input-affine nonlinear systems. We use Carleman linearization to express the state-space representation of the nonlinear dynamical model in the Carleman space. We then develop a real-time algorithm that learns nonlinear state-feedback controllers from state and input measurements in the infinite-dimensional Carleman space. Next, we study the practicality of truncating the control signal at a finite order and analyze the closed-loop stability of the resulting design. Finally, we develop two additional designs that learn structured and sparse representations, respectively, of the RL-based nonlinear controller, and we provide theoretical conditions that ensure their closed-loop stability. Numerical examples show that the proposed method produces closed-loop responses close to the optimal performance of the nonlinear plant. We also compare our designs with other data-driven nonlinear RL control methods, such as those based on neural networks, and illustrate their relative advantages and drawbacks.
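One way to see why a controller learned in Carleman space is nonlinear, and what truncating the control signal means, is the following. A gain that is linear in the lifted state, u = −K z with z = [x, x², …, x^N], is a degree-N polynomial state feedback in x, and truncating z drops the higher-degree terms of that polynomial. The sketch below uses a hypothetical unstable scalar plant, dx/dt = x + 0.5x² + u. It uses a hand-picked gain K = [3, 0.5] chosen to cancel the quadratic drift, not a gain learned by the authors' RL algorithm. It then compares the order-2 law with its order-1 truncation. The helper names `carleman_state`, `carleman_feedback`, and `simulate` are illustrative assumptions.

```python
import numpy as np

def carleman_state(x, N):
    """Lift a scalar state x to z = [x, x^2, ..., x^N]."""
    return x ** np.arange(1, N + 1)

def carleman_feedback(K, x):
    """u = -K z(x): linear in z, hence a degree-len(K) polynomial in x."""
    K = np.asarray(K, dtype=float)
    return -K @ carleman_state(x, K.size)

def simulate(K, x0, T=3.0, steps=300):
    """RK4 simulation of the hypothetical plant dx/dt = x + 0.5 x^2 + u, u = -K z(x)."""
    def f(x):
        return x + 0.5 * x**2 + carleman_feedback(K, x)

    h = T / steps
    x = float(x0)
    traj = [x]
    for _ in range(steps):
        k1 = f(x)
        k2 = f(x + h / 2 * k1)
        k3 = f(x + h / 2 * k2)
        k4 = f(x + h * k3)
        x = x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        traj.append(float(x))
    return np.array(traj)

t = np.linspace(0.0, 3.0, 301)
# Hand-picked (not learned) gain: u = -3x - 0.5x^2 cancels the 0.5x^2 drift,
# so the closed loop is exactly dx/dt = -2x with solution x0*exp(-2t).
x_order2 = simulate([3.0, 0.5], 1.0)
# Truncating the same controller at order 1 leaves dx/dt = -2x + 0.5x^2.
x_order1 = simulate([3.0], 1.0)

print("order-2 deviation from exp(-2t):", np.max(np.abs(x_order2 - np.exp(-2.0 * t))))
print("order-1 deviation from exp(-2t):", np.max(np.abs(x_order1 - np.exp(-2.0 * t))))
```

In this toy example the order-1 truncation still converges from x0 = 1, but only for initial states below x = 4. Beyond that point −2x + 0.5x² is positive and the state escapes in finite time. This small case shows why the truncated controller needs its own closed-loop stability analysis.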
