Lyapunov-based reinforcement learning for distributed control with stability guarantee

Jingshi Yao, Minghao Han, Xunyuan Yin

TL;DR

This work tackles stability-guaranteed distributed control of stochastic nonlinear systems by combining Lyapunov theory with model-free reinforcement learning. It introduces the Distributed Lyapunov Actor-Critic (DLAC) framework, in which each subsystem maintains a local critic and a Gaussian policy, guided by Lyapunov functions $L^i$ to ensure mean-cost stability across the interconnected MDP. The theoretical analysis gives a sufficient Lyapunov-based condition for stability. Training uses distributed critic updates and collaborative actor updates that exchange only scalar-valued information, and the trained controllers execute without communication. Empirical results on a benchmark chemical process with two reactors and one separator show convergence, robustness to disturbances, and effective tracking of multiple references, with DLAC outperforming open-loop control and competing with NMPC without requiring first-principles models. The approach offers a scalable, data-driven route to stable distributed control in complex industrial systems, with potential extensions to sequential communications and POMDP settings.

Abstract

In this paper, we propose a Lyapunov-based reinforcement learning method for distributed control of nonlinear systems comprising interacting subsystems with guaranteed closed-loop stability. Specifically, we conduct a detailed stability analysis and derive sufficient conditions that ensure closed-loop stability under a model-free distributed control scheme based on the Lyapunov theorem. The Lyapunov-based conditions are leveraged to guide the design of local reinforcement learning control policies for each subsystem. The local controllers only exchange scalar-valued information during the training phase, yet they do not need to communicate once the training is completed and the controllers are implemented online. The effectiveness and performance of the proposed method are evaluated using a benchmark chemical process that contains two reactors and one separator.

Paper Structure

This paper contains 18 sections, 1 theorem, 27 equations, 8 figures, 3 tables, 1 algorithm.

Key Result

Theorem 1

Let Assumption assumption2 hold. The MDP $\mathcal{M}$ controlled by the distributed control policy $\pi_d$ is stable in mean cost if there exist positive constants $\alpha_{1}$, $\alpha_{2}$, and $\alpha_{3}$, and a set of Lyapunov functions $L^i:\mathcal{S}^i \to\mathbb{R}^+$ for each $i \in$ [...] where $u_{\pi_d}(s) \triangleq \lim_{N\to\infty}\frac{1}{N}\sum^{N}_{t=0}P(s|\rho, \pi_d, t)$ is the [...]
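
The quantity $u_{\pi_d}(s)$ in the theorem is a time average of the closed-loop state distribution. As a rough illustration only, the sketch below evaluates the finite-$N$ version of that average for a small discrete Markov chain. The two-state transition matrix `P`, the initial distribution `rho`, and the function name `averaged_state_distribution` are hypothetical and not taken from the paper, which considers continuous stochastic nonlinear systems.

```python
def averaged_state_distribution(P, rho, N):
    """Finite-N estimate of u(s) = (1/N) * sum_{t=0}^{N} P(s | rho, pi_d, t).

    P[i][j] is the closed-loop probability of moving from state i to state j
    (the policy is already folded into P); rho[i] is the initial probability
    of state i. Both are illustrative assumptions, not the paper's system.
    """
    n = len(rho)
    p_t = list(rho)           # state distribution at time t, starting at t = 0
    total = [0.0] * n
    for _ in range(N + 1):    # t = 0, 1, ..., N, matching the limits of the sum
        total = [acc + p for acc, p in zip(total, p_t)]
        # One-step propagation: p_{t+1}(j) = sum_i p_t(i) * P[i][j]
        p_t = [sum(p_t[i] * P[i][j] for i in range(n)) for j in range(n)]
    # The sum has N + 1 terms but is divided by N, as in the definition,
    # so the result sums to (N + 1) / N rather than exactly 1 for finite N.
    return [acc / N for acc in total]


# Hypothetical two-state closed-loop chain.
P = [[0.9, 0.1],
     [0.5, 0.5]]
rho = [0.0, 1.0]
u = averaged_state_distribution(P, rho, N=10_000)
```

For this chain the stationary distribution is $(5/6, 1/6)$, and the averaged distribution `u` approaches it as `N` grows, regardless of the choice of `rho`.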

Figures (8)

  • Figure 1: A graphic illustration of the distributed training paradigm.
  • Figure 2: A schematic representation of the chemical process containing two reactors and a separator.
  • Figure 3: Evaluation of convergence over episodes during training. One panel displays the accumulated costs of DLAC, while the other shows the critic losses. DLAC was trained five times with random initial states. The gray dashed line marks the start of training, the solid lines represent the mean values, and the shaded areas indicate the variance across the five training trials.
  • Figure 4: Evaluation of the controller tracking performance during training. One panel shows the maximum static state tracking error for the variables representing temperatures; the other shows the maximum static state tracking error for the variables representing mass fractions. DLAC was trained five times with random initial states. The solid lines represent the mean values, and the shaded areas indicate the variance across the five training trials.
  • Figure 5: State trajectories under the process disturbances with standard deviation $\sigma_{w1}$, where the red lines represent a reference state randomly selected from the reference set $\mathcal{S}_{ref}$, the solid blue lines depict the mean of the 100 simulated trajectories with random initial states, the light blue shaded areas represent one standard deviation from the mean, and the magnified sections within each subplot provide a closer look at the critical points.
  • ...and 3 more figures

Theorems & Definitions (6)

  • Definition 1
  • Remark 1
  • Theorem 1
  • proof
  • Remark 2
  • Remark 3