Constrained Online Two-stage Stochastic Optimization: Near Optimal Algorithms via Adversarial Learning

Jiashuo Jiang

TL;DR

This work tackles online two-stage stochastic optimization with long-term constraints over a finite horizon. It introduces a primal–dual online framework, namely the Doubly Adversarial Learning (DAL) algorithm, which achieves sublinear regret in stationary environments by mapping constraint satisfaction to adversarial learning via a zero-sum game between a first-stage decision and a dual constraint distribution; it also addresses robustness to adversarial corruptions and non-stationarity with predictions through the Informative Adversarial Learning (IAL) algorithm, achieving regret $\tilde{O}(\sqrt{T})+O(W_T)$ where $W_T$ captures prediction inaccuracy. The framework unifies online learning and two-stage stochastic optimization, and extends to covering constraints with appropriate dual scaling. Numerical experiments on resource allocation with packing and covering constraints show strong performance and resilience to non-stationarity, validating the theoretical regret bounds and feasibility guarantees.

Abstract

We consider an online two-stage stochastic optimization problem with long-term constraints over a finite horizon of $T$ periods. At each period, we take the first-stage action, observe a model parameter realization, and then take the second-stage action from a feasible set that depends on both the first-stage decision and the model parameter. We aim to minimize the cumulative objective value while guaranteeing that the long-term average second-stage decision belongs to a given set. We develop online algorithms for the online two-stage problem from adversarial learning algorithms, and the regret bound of our algorithm can be reduced to the regret bound of the embedded adversarial learning algorithms. Based on our framework, we obtain new results under various settings. When the model parameter at each period is drawn from identical distributions, we derive a state-of-the-art $O(\sqrt{T})$ regret that improves previous bounds under special cases. Our algorithm is also robust to adversarial corruptions of model parameter realizations. When the model parameters are drawn from unknown non-stationary distributions and we are given machine-learned predictions of the distributions, we develop a new algorithm from our framework with a regret of $O(W_T+\sqrt{T})$, where $W_T$ measures the total inaccuracy of the machine-learned predictions.
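To give a feel for how a long-term constraint can be handed to an adversarial learning routine, here is a minimal sketch on a much simpler problem than the one in the paper. It is not the paper's DAL or IAL algorithm: it assumes a single-stage, single-resource packing problem, maximizes reward rather than minimizing an objective, and uses projected online gradient descent as the adversarial learner for the dual price. The function name `dual_descent_allocation` and the parameters `rho` (per-period resource rate) and `eta` (step size) are hypothetical names chosen for illustration.

```python
def dual_descent_allocation(rewards, costs, rho, eta):
    """Toy online allocation with a long-term packing constraint.

    Illustrative assumption, not the paper's method: one resource with total
    budget rho * T, binary accept/reject decisions, and a scalar dual price
    updated by projected online gradient descent.
    """
    T = len(rewards)
    budget = rho * T
    lam = 0.0            # dual price on the resource
    used = 0.0           # resource consumed so far
    total_reward = 0.0
    for r, c in zip(rewards, costs):
        # Primal step: accept only if the reward beats the dual-priced cost
        # and the hard budget would not be exceeded.
        x = 1 if (r > lam * c and used + c <= budget) else 0
        total_reward += r * x
        used += c * x
        # Dual step: raise the price when consumption exceeds the
        # per-period rate rho, lower it otherwise, and project onto lam >= 0.
        lam = max(0.0, lam + eta * (c * x - rho))
    return total_reward, used, lam


# Small usage example with hand-picked inputs.
example = dual_descent_allocation(
    rewards=[1.0, 1.0], costs=[1.0, 1.0], rho=0.5, eta=1.0
)
```

The dual update sees only the realized consumption at each period, which is the sense in which the constraint side is treated as an online learning problem against an arbitrary sequence. A common choice in the dual-descent literature is a step size on the order of $1/\sqrt{T}$; that choice is an assumption here and is not taken from the paper.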

Paper Structure

This paper contains 18 sections, 15 theorems, 161 equations, 1 figure, 1 table, 4 algorithms.

Introduction
Our Approach and Results
Other Related Literature
Road Map
Problem Formulation
Online Algorithm for Stationary Setting
Robustness to Adversarial Corruptions
Improvement of Adversarial Setting with Predictions
Numerical Experiments
Summary
Extensions
Non-convex Objective and Non-concave Constraints
Incorporating Covering Constraints
Regret Bounds for Adversarial Learning
Missing Proofs for Section "Online Algorithm for Stationary Setting"
...and 3 more sections

Key Result

Lemma 1

$\mathsf{OPT}\leq \mathbb{E}_{\bm{\theta}\sim\bm{P}}[\mathsf{ALG}(\pi^*, \bm{\theta})]$.

Figures (1)

Figure 1: Numerical results of DAL and IAL algorithms with service level (covering) constraints.

Theorems & Definitions (16)

Lemma 1: folklore
Lemma 2
Lemma 3
Theorem 1
Theorem 2
Theorem 3
Theorem 4
Lemma 4
Theorem 5
Claim 1
...and 6 more
