A Method to Improve the Performance of Reinforcement Learning Based on the Y Operator for a Class of Stochastic Differential Equation-Based Child-Mother Systems

Cheng Yin; Yi Chen

TL;DR

The paper tackles optimal control for systems whose stochastic dynamics are described by stochastic differential equations (SDEs), focusing on a child-mother system and on the challenge of incorporating stochasticity into value-function estimation. It introduces the $\mathcal{Y}$ operator, proven equivalent to the Itô generator $\mathcal{A}$, which transforms the time derivative of value-function functionals into partial derivatives of the drift and diffusion terms and thereby enables the YORL framework. The method integrates the operator into the Critic loss and uses PPO-style Actor updates, augmented by a data-driven NSDE calibration step. Across linear and nonlinear NSDE experiments, YORL outperforms traditional TSRL in convergence and final reward, remains flexible with respect to the choice of activation function, and is applicable to offline/IRL scenarios.
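The "PPO-style Actor updates" mentioned above can be illustrated with the standard clipped surrogate objective from PPO. This is only a minimal sketch under stated assumptions: the function name `ppo_clipped_objective`, the clipping width `eps=0.2`, and the sample inputs are illustrative choices, not values from the paper, and the paper's actual Actor loss may differ in detail.

```python
def ppo_clipped_objective(ratios, advantages, eps=0.2):
    """Mean PPO clipped surrogate objective (to be maximized).

    ratios:     new-policy / old-policy probability ratios, one per sample
    advantages: advantage estimates, one per sample
    eps:        clipping width (assumed value, not taken from the paper)
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        # Clip the probability ratio to the interval [1 - eps, 1 + eps].
        clipped = max(min(r, 1.0 + eps), 1.0 - eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(r * a, clipped * a)
    return total / len(ratios)


# Hypothetical inputs: one sample with a positive and one with a negative advantage.
print(ppo_clipped_objective([1.5, 0.5], [1.0, -1.0]))
```

The clipping removes the incentive to move the policy ratio far from 1 in a single update, which is the usual reason for choosing this objective in Actor-Critic training.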

Abstract

This paper introduces a novel operator, termed the Y operator, to improve control performance in Actor-Critic (AC) based reinforcement learning for systems governed by stochastic differential equations (SDEs). The Y operator integrates the stochasticity of a class of child-mother systems into the Critic network's loss function, yielding substantial improvements in the control performance of RL algorithms. Additionally, the Y operator reformulates the problem of solving partial differential equations for the state-value function into a parallel problem for the drift and diffusion functions of the system's SDEs. A rigorous mathematical proof confirms the operator's validity. This transformation enables the Y Operator-based Reinforcement Learning (YORL) framework to efficiently tackle optimal control problems in both model-based and data-driven systems. The superiority of YORL is demonstrated through linear and nonlinear numerical examples, which show enhanced performance over existing methods after convergence.
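For reference, the Itô generator $\mathcal{A}$ named in the TL;DR has a standard form for an SDE with drift $h$ and diffusion $H$ (the notation of Lemma 1). The display below is a sketch of that textbook definition only: the symbol $V$ for a twice continuously differentiable function of the state is an assumption made for illustration, and the paper's own definition of the $\mathcal{Y}$ operator is not reproduced on this page.

$$
dX_t = h(X_t,t)\,dt + H(X_t,t)\,dW_t, \qquad
(\mathcal{A}V)(x) = \sum_{i=1}^{n} h_i(x,t)\,\frac{\partial V}{\partial x_i}(x) + \frac{1}{2}\sum_{i,j=1}^{n}\big(H(x,t)\,H(x,t)^{\top}\big)_{ij}\,\frac{\partial^2 V}{\partial x_i\,\partial x_j}(x)
$$

By Itô's formula, $\mathcal{A}V$ is the coefficient of $dt$ in $dV(X_t)$, so it is determined by the drift, the diffusion, and the first and second partial derivatives of $V$.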

Paper Structure (16 sections, 8 theorems, 80 equations, 2 figures, 1 table)

This paper contains 16 sections, 8 theorems, 80 equations, 2 figures, 1 table.

INTRODUCTION
PRELIMINARIES
THEORY
SYSTEM MODELING FOR A CLASS OF CHILD-MOTHER SYSTEM
A Class Of Child-Mother System Modeling
Stochastic Differential Equation Calibration of Child-Mother System
RL DESIGN BASED ON $\mathcal{Y}$ OPERATOR IN CHILD-MOTHER SYSTEM
Critic Network Design
Actor Network Design
ILLUSTRATIVE EXAMPLES
Linear Numerical Examples
Nonlinear Numerical Examples
CONCLUSIONS
APPENDIX
The Proof of Proposition 1
...and 1 more section

Key Result

Lemma 1

(evans2012introduction, Section 5.2: existence and uniqueness of the solution of a stochastic differential equation) Suppose that $h:\mathbb{R}^n\times[0,T]\rightarrow\mathbb{R}^n$ and $H:\mathbb{R}^n\times[0,T]\rightarrow\mathbb{R}^{n\times m}$ are continuous and satisfy the following conditions: [...] Let $X_0$ be any $\mathbb{R}^n$-valued random variable such that [...] and [...], where $W(\cdot)$ is a given $m$-dimens...
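To make the objects in Lemma 1 concrete, the following minimal sketch simulates one path of a scalar SDE ($n = m = 1$) with drift $h$ and diffusion $H$ using the Euler-Maruyama scheme. The function name `euler_maruyama`, the example coefficients, the step count, and the fixed seed are all assumptions chosen for illustration; they are not the paper's child-mother system or its calibration procedure.

```python
import math
import random


def euler_maruyama(h, H, x0, T, n_steps, rng=None):
    """Simulate one path of the scalar SDE dX = h(X, t) dt + H(X, t) dW.

    h, H:    drift and diffusion functions taking (x, t)
    x0:      initial state
    T:       final time
    n_steps: number of uniform time steps
    rng:     optional random.Random instance (seeded with 0 by default)
    """
    rng = rng or random.Random(0)
    dt = T / n_steps
    x, t = x0, 0.0
    path = [x]
    for _ in range(n_steps):
        # Brownian increment over one step: Normal(0, dt).
        dW = rng.gauss(0.0, math.sqrt(dt))
        x = x + h(x, t) * dt + H(x, t) * dW
        t += dt
        path.append(x)
    return path


# Hypothetical linear example: dX = -X dt + 0.3 dW, X(0) = 1.
path = euler_maruyama(lambda x, t: -x, lambda x, t: 0.3,
                      x0=1.0, T=1.0, n_steps=1000)
print(len(path), path[-1])
```

Setting the diffusion to zero reduces the scheme to the explicit Euler method for the ordinary differential equation $\dot{x} = h(x,t)$, which gives a quick sanity check of the implementation.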

Figures (2)

Figure 1: In subfigure (a), the hidden layer dimension of both the Actor and Critic networks for both TSRL and YORL is $32$ and the activation function is Sigmoid. In subfigure (b), the hidden layer dimension is $128$ and the activation function is Sigmoid. In subfigure (c), the hidden layer dimension is $32$ and the activation function is ReLU. In subfigure (d), the hidden layer dimension is $32$ and the activation function is tanh.
Figure 2: In subfigure (a), the hidden layer dimension of both the Actor and Critic networks for both TSRL and YORL is $32$ and the activation function is Sigmoid. In subfigure (b), the hidden layer dimension is $128$ and the activation function is Sigmoid. In subfigure (c), the hidden layer dimension is $32$ and the activation function is ReLU. In subfigure (d), the hidden layer dimension is $32$ and the activation function is tanh.

Theorems & Definitions (17)

Definition 1
Definition 2
Definition 3
Definition 4
Lemma 1
Lemma 2
Definition 5
Corollary 1
Corollary 2
Definition 6
...and 7 more
