Convergence of Neural Network Policies for Risk--Reward Optimization

Chang Chen; Duy-Minh Dang

TL;DR

Under mild regularity conditions, it is proved that the empirical optimum of the NN-parametrized objective converges in probability to the true optimal value as network capacity and training sample size increase.

Abstract

We develop a neural-network framework for multi-period risk--reward stochastic control problems with constrained two-step feedback policies that may be discontinuous in the state. We allow a broad class of objectives built on a finite-dimensional performance vector, including terminal and path-dependent statistics, with risk functionals admitting auxiliary-variable optimization representations (e.g., Conditional Value-at-Risk and buffered probability of exceedance) and optional moment dependence. Our approach parametrizes the two-step policy using two coupled feedforward networks with constraint-enforcing output layers, reducing the constrained control problem to unconstrained training over network parameters. Under mild regularity conditions, we prove that the empirical optimum of the NN-parametrized objective converges in probability to the true optimal value as network capacity and training sample size increase. The proof is modular, separating policy approximation, propagation through the controlled recursion, and preservation under the scalarized risk--reward objective. Numerical experiments confirm the predicted convergence-in-probability behavior, show close agreement between learned and reference control heat maps, and demonstrate out-of-sample robustness on a large independent scenario set.
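To illustrate what an auxiliary-variable optimization representation of a risk functional looks like, the following is a minimal sketch of Conditional Value-at-Risk on an empirical sample. It assumes the standard loss-based Rockafellar--Uryasev form, $\mathrm{CVaR}_\alpha(L)=\min_c\{c+\mathbb{E}[(L-c)^+]/(1-\alpha)\}$; the sign convention and exact definition used in the paper are not shown in this summary and may differ. The function names `cvar_auxiliary` and `cvar_tail_average` are hypothetical and are not taken from the paper.

```python
import numpy as np


def cvar_auxiliary(losses, alpha):
    """CVaR via the auxiliary-variable form: min_c  c + E[(L - c)^+] / (1 - alpha).

    Assumption: loss-based Rockafellar--Uryasev convention (larger = worse).
    Returns (minimal objective value, a minimizing auxiliary variable c).
    """
    losses = np.asarray(losses, dtype=float)
    # The empirical objective is convex and piecewise linear in c, with kinks
    # only at sample values, so a minimizer can be found among the samples.
    candidates = np.unique(losses)
    excess = np.maximum(losses[None, :] - candidates[:, None], 0.0)
    objective = candidates + excess.mean(axis=1) / (1.0 - alpha)
    best = int(np.argmin(objective))
    return float(objective[best]), float(candidates[best])


def cvar_tail_average(losses, alpha):
    """Direct CVaR: mean of the worst (1 - alpha) fraction of losses.

    Assumes (1 - alpha) * len(losses) is (numerically) a positive integer.
    """
    losses = np.sort(np.asarray(losses, dtype=float))
    k = int(round((1.0 - alpha) * len(losses)))
    return float(losses[-k:].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.normal(size=1000)  # illustrative losses, not data from the paper
    value, c_star = cvar_auxiliary(sample, alpha=0.95)
    print("auxiliary-variable CVaR:", value, "at c =", c_star)
    print("tail-average CVaR:      ", cvar_tail_average(sample, alpha=0.95))
```

When $(1-\alpha)$ times the sample size is an integer, the two functions agree up to floating-point error. In a training loop the auxiliary variable $c$ would typically be optimized jointly with the network parameters rather than by enumeration; the enumeration here is only for clarity.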

Paper Structure (52 sections, 13 theorems, 119 equations, 3 figures, 8 tables)

Introduction
Problem formulation
Probability space and intervention times
Exogenous input process
State, admissible controls, and controlled dynamics
Risk--reward objective
Reward
Risk
Scalarized risk--reward objective
Standing assumptions for the risk--reward control problem
Discussion of Assumption \ref{ass:rr_regularity}
A neural network approach
Preliminaries
Pre-decision network
Post-decision network
...and 37 more sections

Key Result

Theorem 3.1

Let $X$ be an $\mathbb{R}^{\nu_0}$-valued random variable and let $f:\mathbb{R}^{\nu_0}\to\mathbb{R}^{d}$ be Borel measurable. Then there exists a sequence $\{F_{n}\}_{n\in\mathbb{N}}$, where $F_{n}=F(\cdot;\theta_{n})\in \mathcal{Q}_{n}$, such that for all $\varepsilon>0$,

Figures (3)

Figure 5.1: Empirical optima $\widehat{V}^{(j)}_{n,K}$ across $N_{\mathrm{run}}=100$ runs. Boxes show the interquartile range (25%--75%) with median line; whiskers extend to $1.5\times\mathrm{IQR}$ and points beyond are plotted as outliers. The dashed line indicates the reference value $V_{\mathrm{ref}}=1605.22$.
Figure 5.2: Policy heatmap comparison.
Figure 5.3: Withdrawal slice at $t=15$ years.

Theorems & Definitions (23)

Remark 2.1: Well-posedness of the controlled recursion
Remark 2.2: Pre-commitment vs. time-consistent formulations
Definition 3.1: Feedforward neural network
Theorem 3.1: Universal approximation for a random input (Hornik et al., 1989)
Lemma 3.2: Composition with (a.s.-continuous) activations
Lemma 3.3: Boundary approximation via open-range activations
Theorem 3.4
Theorem 3.5
Remark 3.6: Training algorithm
Lemma 4.1
...and 13 more
