Data-Driven Estimation of Conditional Expectations, Application to Optimal Stopping and Reinforcement Learning

George V. Moustakides

TL;DR

The paper tackles data-driven estimation of conditional expectations when the conditional densities are unknown. It proposes a unified optimization framework whose optimal solution $u_o(X)$ satisfies $\omega(u_o(X))=\frac{b(X)}{a(X)}$, and extends this framework to systems of nonlinear equations. Neural-network-based estimators solve for $u(X)$ (and its variants) directly from data, trained with GD/SGD/Adam on empirical costs in which sample averages replace expectations. The author shows how likelihood ratios, ratios of conditional densities, and conditional expectations can all be identified within the same framework, and demonstrates applications to Markov optimal stopping and reinforcement learning, comparing numerical and data-driven solutions. The result is a practical route to solving stochastic optimization problems in sequential decision making without explicit density estimation.
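As an illustration of identifying a likelihood ratio from samples alone, the sketch below uses one well-known instance of this idea: minimizing the cross-entropy cost $\mathsf{E}_{f_0}[\log(1+e^{u})]+\mathsf{E}_{f_1}[\log(1+e^{-u})]$, whose minimizer is $u(x)=\log\frac{f_1(x)}{f_0(x)}$. It is not taken from the paper. The Gaussian densities, the linear model `u(x) = w*x + c` (standing in for a neural network), the step size, and the function name `fit_log_likelihood_ratio` are all assumptions made for this example.

```python
import numpy as np

def fit_log_likelihood_ratio(x0, x1, lr=0.5, iters=2000):
    """Gradient descent on the empirical cross-entropy cost
    mean_{x0} log(1 + e^{u}) + mean_{x1} log(1 + e^{-u}), with u(x) = w*x + c.
    The population minimizer of this cost is u(x) = log(f1(x) / f0(x))."""
    w, c = 0.0, 0.0
    for _ in range(iters):
        s0 = 1.0 / (1.0 + np.exp(-(w * x0 + c)))  # sigmoid(u) on samples from f0
        s1 = 1.0 / (1.0 + np.exp(w * x1 + c))     # sigmoid(-u) on samples from f1
        grad_w = np.mean(s0 * x0) - np.mean(s1 * x1)
        grad_c = np.mean(s0) - np.mean(s1)
        w -= lr * grad_w
        c -= lr * grad_c
    return w, c

# Assumed toy setup: f0 = N(0, 1), f1 = N(1, 1), so log(f1/f0)(x) = x - 0.5.
rng = np.random.default_rng(0)
x0 = rng.standard_normal(2000)
x1 = 1.0 + rng.standard_normal(2000)

w, c = fit_log_likelihood_ratio(x0, x1)
print(f"estimated log-ratio: {w:.2f} * x + ({c:.2f}); exact: 1.00 * x + (-0.50)")
```

The densities are never evaluated during fitting; only the two sample sets enter the cost, which is the sense in which the estimate is data-driven.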

Abstract

When the underlying conditional density is known, conditional expectations can be computed analytically or numerically. When, however, such knowledge is not available and instead we are given a collection of training data, the goal of this work is to propose simple and purely data-driven means for estimating directly the desired conditional expectation. Because conditional expectations appear in the description of a number of stochastic optimization problems with the corresponding optimal solution satisfying a system of nonlinear equations, we extend our data-driven method to cover such cases as well. We test our methodology by applying it to Optimal Stopping and Optimal Action Policy in Reinforcement Learning.
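A minimal sketch of the data-driven idea, assuming the simplest case: the conditional expectation $\mathsf{E}[Y|X]$ minimizes the mean-squared error $\mathsf{E}[(Y-u(X))^2]$, so replacing the expectation with a sample average yields an estimator that never touches the conditional density. The model $Y=\text{sign}(X)X^2+W$, the $n=200$ samples, and the hidden layer of size 50 follow the caption of Figure 1(a). The uniform distribution of $X$, the noise level, the fixed random hidden layer, and the ridge-regularized least-squares fit of the output layer are assumptions chosen to keep the example short; they substitute for the gradient-based training the paper describes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, hidden = 200, 50                      # sizes taken from the Figure 1 caption

# Training pairs (x_i, y_i); input range and noise scale are assumptions.
x = rng.uniform(-2.0, 2.0, size=n)
y = np.sign(x) * x**2 + 0.5 * rng.standard_normal(n)

# Shallow ReLU network with a fixed random hidden layer (an assumption:
# only the output layer is fitted, in closed form, instead of training by GD).
w_hid = rng.standard_normal(hidden)
b_hid = rng.uniform(-2.0, 2.0, size=hidden)

def features(points):
    """Hidden-layer activations plus a constant column for the output bias."""
    h = np.maximum(0.0, np.outer(points, w_hid) + b_hid)
    return np.column_stack([np.ones(len(points)), h])

# Minimize the empirical squared-error cost (with a small ridge term).
H = features(x)
lam = 1e-3
theta = np.linalg.solve(H.T @ H + lam * np.eye(H.shape[1]), H.T @ y)

# Compare the estimate with the exact conditional mean sign(x) * x^2.
grid = np.linspace(-2.0, 2.0, 401)
estimate = features(grid) @ theta
truth = np.sign(grid) * grid**2
rmse = float(np.sqrt(np.mean((estimate - truth) ** 2)))
print(f"RMSE against the exact conditional mean: {rmse:.3f}")
```

The exact conditional mean is used only to score the result; the fit itself sees nothing but the training pairs.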

Paper Structure (20 sections, 1 theorem, 66 equations, 3 figures)

Introduction
A General Optimization Problem
Versions of the Optimization Problem
Identification of Likelihood Ratios
Identification of Ratio of Conditional Densities
Identification of Conditional Expectations
Examples of functions $\omega(z),\rho(z),\phi(z),\psi(z)$
Data-Driven Estimation
Data-Driven Likelihood Ratio Estimation
Data-Driven Estimation of Conditional Expectations
Numerical Computation of Conditional Expectations
Examples
System of Equations involving Conditional Expectations
Numerical Solution
Data-Driven Solution
...and 5 more sections

Key Result

Theorem 1

For real $z$, let $\omega(z)$ be a strictly increasing scalar function and denote by $\mathcal{I}$ its range of values. Select a second function $\rho(z)$ that is strictly negative, and define two additional scalar functions $\phi(z),\psi(z)$ through their derivatives (…). If $\mathrm{range}(\frac{\mathsf{b}(X)}{\mathsf{a}(X)})\subseteq \mathcal{I}$, then the optimal solution $\mathsf{u_o}(X)$ of the …

Figures (3)

Figure 1: Estimation of the conditional expectation ${\mathsf{E}}_{\EuScript{Y}}[\EuScript{Y}|\EuScript{X}=X]$ when (a) $\EuScript{Y}=\text{sign}(\EuScript{X})\,\EuScript{X}^2+\EuScript{W}$ and (b) $\EuScript{Y}=\mathbbm{1}_{[-1,1]}(\EuScript{X}+\EuScript{W})$, with $n=200$ training samples and a shallow network with a hidden layer of size 50.
Figure 2: (a) Numerical and data-driven estimation of the optimal average cost with $n=500$ samples, using a shallow network with a hidden layer of 100 nodes. (b) The training set, consisting of 500 consecutive realizations of the Markov process $\{\EuScript{X}_t\}$.
Figure 3: (a) Numerical and data-driven estimation of the two expected optimal reward functions $\EuScript{U}^1(S),\EuScript{U}^2(S)$ with an average of $n=500$ samples per model, using two shallow networks, each with a hidden layer of 100 nodes. (b) Evolution of the costs as a function of the number of iterations.
