Convergence Rate of a Functional Learning Method for Contextual Stochastic Optimization

Noel Smith, Andrzej Ruszczynski

Abstract

We consider a stochastic optimization problem involving two random variables: a context variable $X$ and a dependent variable $Y$. The objective is to minimize the expected value of a nonlinear loss functional applied to the conditional expectation $\mathbb{E}[f(X, Y, \beta) \mid X]$, where $f$ is a nonlinear function and $\beta$ represents the decision variables. We focus on the practically important setting in which direct sampling from the conditional distribution of $Y \mid X$ is infeasible, and only a stream of i.i.d. observation pairs $\{(X^k, Y^k)\}_{k=0,1,2,\ldots}$ is available. In our approach, the conditional expectation is approximated within a prespecified parametric function class. We analyze a simultaneous learning-and-optimization algorithm that jointly estimates the conditional expectation and optimizes the outer objective, and establish that the method achieves a convergence rate of order $\mathcal{O}\big(1/\sqrt{N}\big)$, where $N$ denotes the number of observed pairs.
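To make the idea of simultaneous learning and optimization concrete, the following is a minimal sketch on a toy instance. It is not the authors' algorithm: the model $Y = 2X + \varepsilon$, the choice $f(x, y, \beta) = y - \beta x$ (linear here only for brevity), the outer loss $\ell(u) = u^2$, the one-parameter class $\Phi(x, \theta) = \theta x$, the constant step size, and every function name are assumptions made for illustration. Each observed pair is used once, both to move $\theta$ toward the conditional expectation and to take a gradient step in $\beta$ with $\Phi$ standing in for the unknown conditional expectation.

```python
import random


def f(x, y, beta):
    # Hypothetical inner function (assumption): f(x, y, beta) = y - beta * x.
    return y - beta * x


def grad_beta_f(x, y, beta):
    # Derivative of the hypothetical f with respect to beta.
    return -x


def loss_prime(u):
    # Derivative of the assumed outer loss l(u) = u**2.
    return 2.0 * u


def run_sketch(num_pairs=20000, step=0.01, seed=0):
    """Single-loop learning-and-optimization on a toy problem.

    Toy data model (assumption): X ~ N(0, 1), Y = 2 X + noise, so that
    E[f(X, Y, beta) | X] = (2 - beta) X and the minimizer is beta = 2.
    """
    rng = random.Random(seed)
    beta, theta = 0.0, 0.0
    for _ in range(num_pairs):
        # One i.i.d. pair from the stream; no resampling of Y given X.
        x = rng.gauss(0.0, 1.0)
        y = 2.0 * x + rng.gauss(0.0, 0.5)

        # Parametric surrogate Phi(x, theta) = theta * x for E[f | X = x].
        phi = theta * x

        # Optimization step: gradient of l(F(x, beta)) in beta, with the
        # unknown F replaced by the current surrogate value phi.
        new_beta = beta - step * loss_prime(phi) * grad_beta_f(x, y, beta)

        # Learning step: least-squares SGD moving Phi(x, theta) toward the
        # observed value f(x, y, beta) at the current decision beta.
        theta -= step * (phi - f(x, y, beta)) * x

        beta = new_beta
    return beta, theta


if __name__ == "__main__":
    beta, theta = run_sketch()
    print(f"beta = {beta:.3f}, theta = {theta:.3f}")
```

In this toy setting $\theta$ tracks $2 - \beta$, so the iterates should settle near $\beta = 2$ and $\theta = 0$, up to noise governed by the step size. The sketch says nothing about the $\mathcal{O}(1/\sqrt{N})$ rate, which depends on the step-size rule and assumptions analyzed in the paper.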

Paper Structure

This paper contains 4 sections, 7 theorems, 47 equations.

Key Result

Lemma 2.1

Under Assumptions item:Lipschitz and item:FIntegrable, the function $F(X, \cdot)$ is continuously differentiable a.s., and $\nabla_\beta F(X, \cdot)$ is Lipschitz continuous with the constant $L_{\nabla F}(X) = \mathbb{E} \left[ L_{\nabla f}(X,Y) \mid X\right]$. Furthermore, $\mathbb{E} [ L_{\nabla F}(X)^2] \le \overline{L}_{\nabla f}^2$.
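
The second-moment bound in the lemma can be motivated by a short calculation. This is a sketch, not the paper's proof, and it assumes that $\overline{L}_{\nabla f}^2$ is an upper bound on $\mathbb{E}[L_{\nabla f}(X,Y)^2]$; the excerpt does not state how that constant is defined. Applying the conditional Jensen inequality and then the tower property gives

$$
\mathbb{E}\big[L_{\nabla F}(X)^2\big]
= \mathbb{E}\Big[\big(\mathbb{E}[L_{\nabla f}(X,Y) \mid X]\big)^2\Big]
\le \mathbb{E}\Big[\mathbb{E}\big[L_{\nabla f}(X,Y)^2 \mid X\big]\Big]
= \mathbb{E}\big[L_{\nabla f}(X,Y)^2\big]
\le \overline{L}_{\nabla f}^2 .
$$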

Theorems & Definitions (13)

  • Lemma 2.1
  • proof
  • Lemma 4.1
  • proof
  • Lemma 4.2
  • Lemma 4.3
  • proof
  • Lemma 4.4
  • Lemma 4.5
  • proof
  • ...and 3 more