Constrained Online Two-stage Stochastic Optimization: Algorithm with (and without) Predictions
Piao Hu, Jiashuo Jiang, Guodong Lyu, Hao Su
TL;DR
This work addresses online two-stage stochastic optimization with long-term constraints when the model-parameter distributions are unknown and possibly non-stationary. It develops two algorithmic families: Informative Adversarial Learning (IAL), which uses machine-learned predictions to achieve a regret of $O(W_T+\sqrt{T})$, where $W_T$ is the total prediction inaccuracy, and Doubly Adversarial Learning (DAL), which achieves sublinear regret without predictions in a stationary-plus-corruption setting. The core idea is to couple the primal decisions with adversarial learning of the dual variables, so that both the regret and the constraint violation are bounded in terms of the regret of the embedded online learners (e.g., OGD or Hedge). The framework is validated through numerical experiments and extends to non-convex objectives, covering constraints, and prediction-free regimes, which makes it relevant to applications such as supply chains and service-level management where long-term constraints are critical.
Abstract
We consider an online two-stage stochastic optimization problem with long-term constraints over a finite horizon of $T$ periods. At each period, we take the first-stage action, observe a realization of the model parameter, and then take the second-stage action from a feasible set that depends on both the first-stage decision and the model parameter. We aim to minimize the cumulative objective value while guaranteeing that the long-term average of the second-stage decisions belongs to a given set. We develop online algorithms for this two-stage problem that are built from adversarial learning algorithms, and we show that the regret bound of our algorithm reduces to the regret bounds of the embedded adversarial learning algorithms. Based on this framework, we obtain new results under various settings. When the model parameters are drawn from unknown non-stationary distributions and we are given machine-learned predictions of the distributions, we develop a new algorithm from our framework with a regret of $O(W_T+\sqrt{T})$, where $W_T$ measures the total inaccuracy of the machine-learned predictions. We then develop another algorithm that works when no machine-learned predictions are given and analyze its performance.
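To make the interaction protocol concrete, the following is a minimal sketch of a generic primal-dual loop on a toy instance. It is not the authors' IAL or DAL algorithm. Everything in it is an assumption made for illustration: the function name `run_primal_dual`, the scalar decisions, the cost coefficients `c` and `p`, the covering-style target `beta` (average second-stage decision at least `beta`), the decision grid, and the use of a point prediction `theta_hat` in the first stage. The only element taken from the text is the structure: a first-stage action, an observed parameter, a second-stage action from a set depending on both, and a dual variable updated by an online learner (here projected online gradient descent).

```python
import random


def run_primal_dual(thetas, predictions, beta=0.3, c=0.2, p=0.1,
                    eta=None, lam_max=10.0):
    """Toy primal-dual loop (hypothetical instance, not the paper's algorithm).

    Per period: pick first-stage x in [0, 1], observe theta, pick second-stage
    y in [0, min(x, theta)]. Period cost is c*x + p*y. The long-term
    constraint is (1/T) * sum_t y_t >= beta.
    """
    T = len(thetas)
    eta = eta if eta is not None else 1.0 / T ** 0.5  # OGD step size ~ 1/sqrt(T)
    grid = [i / 10 for i in range(11)]                # candidate first-stage actions
    lam = 0.0                                         # dual variable for the constraint
    total_cost, total_y = 0.0, 0.0

    def second_stage(x, theta, lam):
        # Minimizes p*y - lam*y over y in [0, min(x, theta)]: a bang-bang rule.
        return min(x, theta) if lam > p else 0.0

    for theta, theta_hat in zip(thetas, predictions):
        # First stage: minimize the Lagrangian evaluated at the predicted parameter.
        def lagrangian(x):
            y_hat = second_stage(x, theta_hat, lam)
            return c * x + p * y_hat + lam * (beta - y_hat)

        x = min(grid, key=lagrangian)

        # Second stage: react to the realized parameter.
        y = second_stage(x, theta, lam)
        total_cost += c * x + p * y
        total_y += y

        # Dual update: projected online gradient step on the constraint slack.
        lam = min(lam_max, max(0.0, lam + eta * (beta - y)))

    return total_cost, total_y / T, lam


if __name__ == "__main__":
    random.seed(0)
    thetas = [random.random() for _ in range(400)]
    # Noisy point predictions of each period's parameter (assumed for the demo).
    preds = [min(1.0, max(0.0, th + random.uniform(-0.1, 0.1))) for th in thetas]
    cost, avg_y, lam = run_primal_dual(thetas, preds)
    print(f"cost={cost:.2f} avg_y={avg_y:.3f} final_lam={lam:.3f}")
```

In this sketch the dual variable acts as a price on the constraint: it rises while the average second-stage decision is below `beta` and falls otherwise. As long as the upper projection at `lam_max` is never active, the cumulative shortfall is at most the final dual value divided by `eta`, which mirrors, in a very simplified form, how the constraint violation can be controlled through the embedded online learner.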
