Constrained Online Two-stage Stochastic Optimization: Algorithm with (and without) Predictions

Piao Hu, Jiashuo Jiang, Guodong Lyu, Hao Su

TL;DR

This work addresses online two-stage stochastic optimization with long-term constraints when the underlying distributions are unknown and potentially non-stationary. It develops two algorithmic families: Informative Adversarial Learning (IAL), which leverages machine-learned predictions to achieve a regret of $\tilde{O}(W_T+\sqrt{T})$, where $W_T$ is the total prediction inaccuracy, and Doubly Adversarial Learning (DAL), which attains sublinear regret without predictions in a stationary-plus-corruption setting. The core idea is to couple primal updates with adversarial learning of the dual variables, so that both the regret and the constraint violation are bounded in terms of the performance of the embedded online learners (OGD/Hedge). The framework is validated through numerical experiments and extended to non-convex objectives, covering constraints, and prediction-free regimes, which makes it applicable to settings such as supply chains and service-level management where long-term constraints are critical.
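To make the primal-dual coupling concrete, the following is a minimal sketch of the two embedded learners named above (Hedge and OGD) and of a toy loop in which a dual variable is updated by projected gradient ascent while the primal decision best-responds to it. It is an illustration only, not the paper's IAL or DAL algorithm: the function names, the deterministic cost stream, the single covering constraint, and all parameter values are assumptions made for this example.

```python
import math


def hedge_update(weights, losses, eta):
    """One Hedge (multiplicative-weights) step; returns weights on the simplex."""
    scaled = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


def dual_ogd_step(lam, violation, eta, lam_max):
    """One projected gradient-ascent step for a scalar dual variable.

    `violation` is the gradient of the Lagrangian with respect to lam;
    the result is projected back onto the interval [0, lam_max].
    """
    return min(max(lam + eta * violation, 0.0), lam_max)


def run_toy(T=2000, beta=0.5, eta=0.05, lam_max=1.0):
    """Toy covering constraint: serve (y = 1) in at least a beta fraction of periods.

    Per-period Lagrangian: cost * y + lam * (beta - y).
    The primal player best-responds to lam; the dual player runs OGD.
    All quantities here are hypothetical and chosen for illustration.
    """
    lam, served = 0.0, 0
    for t in range(T):
        cost = (t % 10) / 10.0        # assumed deterministic cost stream
        y = 1 if cost < lam else 0    # primal best response to the dual price
        served += y
        lam = dual_ogd_step(lam, beta - y, eta, lam_max)
    return served / T, lam


if __name__ == "__main__":
    print(hedge_update([0.5, 0.5], [1.0, 0.0], eta=1.0))
    print(run_toy())
```

In this toy, the dual iterates telescope, so the realized service fraction can fall short of `beta` by at most `lam_max / (eta * T)`. This is a small-scale analogue of bounding constraint violation through the dual learner's behavior.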

Abstract

We consider an online two-stage stochastic optimization problem with long-term constraints over a finite horizon of $T$ periods. In each period, we take the first-stage action, observe a realization of the model parameter, and then take the second-stage action from a feasible set that depends on both the first-stage decision and the model parameter. We aim to minimize the cumulative objective value while guaranteeing that the long-term average second-stage decision belongs to a given set. We develop online algorithms for this two-stage problem from adversarial learning algorithms, and the regret bound of our algorithm reduces to the regret bounds of the embedded adversarial learning algorithms. Based on this framework, we obtain new results under various settings. When the model parameters are drawn from unknown non-stationary distributions and we are given machine-learned predictions of the distributions, we develop a new algorithm from our framework with regret $O(W_T+\sqrt{T})$, where $W_T$ measures the total inaccuracy of the machine-learned predictions. We then develop another algorithm that works when no machine-learned predictions are given and show its performance.

Paper Structure

This paper contains 16 sections, 13 theorems, 138 equations, 4 figures, 1 table, 4 algorithms.

Key Result

Lemma 1

$\mathsf{OPT}\leq \mathbb{E}_{\bm{\theta}\sim\bm{P}}[\mathsf{ALG}(\pi^*, \bm{\theta})]$.

Figures (4)

  • Figure 1: Numerical results of DAL and IAL algorithms with service level (covering) constraints for the stationary case.
  • Figure 2: Numerical results of DAL and IAL algorithms with service level (covering) constraints for the non-stationary case 1.
  • Figure 3: Numerical results of DAL and IAL algorithms with service level (covering) constraints for the non-stationary case 2.
  • Figure 4: Numerical results of DAL and IAL algorithms with service level (covering) constraints for the non-stationary case 3.

Theorems & Definitions (14)

  • Lemma 1: folklore
  • Theorem 1
  • Lemma 2
  • Theorem 2
  • Theorem 3
  • Lemma 3
  • Theorem 4
  • Theorem 5
  • Theorem 6
  • Lemma 4
  • ...and 4 more