Convergence Rate of a Functional Learning Method for Contextual Stochastic Optimization

Noel Smith; Andrzej Ruszczynski

Abstract

We consider a stochastic optimization problem involving two random variables: a context variable $X$ and a dependent variable $Y$. The objective is to minimize the expected value of a nonlinear loss functional applied to the conditional expectation $\mathbb{E}[f(X, Y, \beta) \mid X]$, where $f$ is a nonlinear function and $\beta$ represents the decision variables. We focus on the practically important setting in which direct sampling from the conditional distribution of $Y \mid X$ is infeasible, and only a stream of i.i.d. observation pairs $\{(X^k, Y^k)\}_{k=0,1,2,\ldots}$ is available. In our approach, the conditional expectation is approximated within a prespecified parametric function class. We analyze a simultaneous learning-and-optimization algorithm that jointly estimates the conditional expectation and optimizes the outer objective, and establish that the method achieves a convergence rate of order $\mathcal{O}\big(1/\sqrt{N}\big)$, where $N$ denotes the number of observed pairs.
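To make the problem setting concrete, here is a minimal toy sketch of a simultaneous learning-and-optimization loop driven by one stream of pairs. It is not the authors' algorithm: the update rules, step sizes, and all names (`run`, `c`, `noise_std`, `theta`, `phi`) are assumptions chosen for illustration. The toy uses $f(x, y, \beta) = y - \beta x$, the squared loss $\ell(u) = u^2$, data $Y = cX + \text{noise}$, and the linear model class $\Phi(x, \theta) = \theta x$, so that the objective $\mathbb{E}[((c - \beta)X)^2]$ is minimized at $\beta = c$.

```python
import random


def run(n_samples, seed=0, c=2.0, noise_std=0.1):
    """Toy simultaneous learning-and-optimization loop (illustrative only).

    theta parameterizes the assumed model phi(x, theta) = theta * x of the
    conditional expectation E[f(X, Y, beta) | X]; beta is the decision variable.
    """
    rng = random.Random(seed)
    beta = 0.0
    theta = 0.0
    for k in range(n_samples):
        # One observation pair (X^k, Y^k) from the stream.
        x = rng.uniform(-1.0, 1.0)
        y = c * x + rng.gauss(0.0, noise_std)

        # Diminishing step sizes (an assumed schedule, not from the paper).
        a_k = 1.0 / (k + 1) ** 0.5
        b_k = 0.2 / (k + 1) ** 0.5

        f_val = y - beta * x   # f(x, y, beta)
        phi = theta * x        # current estimate of E[f | X = x]

        # Learning step: stochastic gradient of 0.5 * (phi - f_val)^2 in theta.
        theta_new = theta - a_k * (phi - f_val) * x

        # Optimization step: loss'(phi) * grad_beta f, with loss(u) = u^2
        # and grad_beta f = -x.
        beta_new = beta - b_k * (2.0 * phi) * (-x)

        theta, beta = theta_new, beta_new
    return beta, theta
```

With the default settings, `run(20000)` should return a `beta` near `c` and a `theta` near zero. The toy only illustrates how one sample can serve both updates and says nothing about the $\mathcal{O}\big(1/\sqrt{N}\big)$ rate.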

Paper Structure (4 sections, 7 theorems, 47 equations)

Introduction
Assumptions and Basic Properties
Method
Convergence Rate Analysis

Key Result

Lemma 2.1

Under Assumptions item:Lipschitz and item:FIntegrable, the function $F(X, \cdot)$ is continuously differentiable a.s., and $\nabla_\beta F(X, \cdot)$ is Lipschitz continuous with constant $L_{\nabla F}(X) = \mathbb{E} \left[ L_{\nabla f}(X,Y) \mid X\right]$. Furthermore, $\mathbb{E} [ L_{\nabla F}(X)^2] \le \overline{L}_{\nabla f}^2$.
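
The two bounds in the lemma can be traced through a short chain of inequalities. The sketch below rests on assumptions not stated on this page: that $F(X,\beta) = \mathbb{E}[f(X,Y,\beta) \mid X]$, that the gradient may be taken inside the conditional expectation, that $L_{\nabla f}(X,Y)$ is a Lipschitz constant of $\nabla_\beta f(X,Y,\cdot)$, and that $\mathbb{E}[L_{\nabla f}(X,Y)^2] \le \overline{L}_{\nabla f}^2$. It is an outline of a plausible argument, not the paper's proof.

$$
\begin{aligned}
\big\| \nabla_\beta F(X,\beta_1) - \nabla_\beta F(X,\beta_2) \big\|
&= \big\| \mathbb{E}\big[ \nabla_\beta f(X,Y,\beta_1) - \nabla_\beta f(X,Y,\beta_2) \mid X \big] \big\| \\
&\le \mathbb{E}\big[ \| \nabla_\beta f(X,Y,\beta_1) - \nabla_\beta f(X,Y,\beta_2) \| \mid X \big] \\
&\le \mathbb{E}\big[ L_{\nabla f}(X,Y) \mid X \big] \, \| \beta_1 - \beta_2 \|
= L_{\nabla F}(X) \, \| \beta_1 - \beta_2 \|, \\[4pt]
\mathbb{E}\big[ L_{\nabla F}(X)^2 \big]
&= \mathbb{E}\Big[ \big( \mathbb{E}[ L_{\nabla f}(X,Y) \mid X ] \big)^2 \Big]
\le \mathbb{E}\Big[ \mathbb{E}\big[ L_{\nabla f}(X,Y)^2 \mid X \big] \Big]
= \mathbb{E}\big[ L_{\nabla f}(X,Y)^2 \big]
\le \overline{L}_{\nabla f}^2 .
\end{aligned}
$$

The first chain uses Jensen's inequality for the norm under conditional expectation; the second uses conditional Jensen for the square, followed by the tower property.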

Theorems & Definitions (13)

Lemma 2.1
proof
Lemma 4.1
proof
Lemma 4.2
Lemma 4.3
proof
Lemma 4.4
Lemma 4.5
proof
...and 3 more
