Constrained Online Two-stage Stochastic Optimization: Near Optimal Algorithms via Adversarial Learning
Jiashuo Jiang
TL;DR
This work tackles online two-stage stochastic optimization with long-term constraints over a finite horizon. It introduces a primal-dual online framework, the Doubly Adversarial Learning (DAL) algorithm, which maps constraint satisfaction to adversarial learning through a zero-sum game between a first-stage decision and a dual distribution over the constraints, and which achieves sublinear regret in stationary environments. The framework is also robust to adversarial corruptions. For non-stationary environments with predictions, the Informative Adversarial Learning (IAL) algorithm achieves regret $\tilde{O}(\sqrt{T})+O(W_T)$, where $W_T$ captures the inaccuracy of the predictions. The framework unifies online learning and two-stage stochastic optimization, and extends to covering constraints with appropriate dual scaling. Numerical experiments on resource allocation with packing and covering constraints show strong performance and resilience to non-stationarity, consistent with the theoretical regret bounds and feasibility guarantees.
Abstract
We consider online two-stage stochastic optimization with long-term constraints over a finite horizon of $T$ periods. At each period, we take the first-stage action, observe a model parameter realization, and then take the second-stage action from a feasible set that depends on both the first-stage decision and the model parameter. We aim to minimize the cumulative objective value while guaranteeing that the long-term average second-stage decision belongs to a given set. We develop online algorithms for this two-stage problem from adversarial learning algorithms, and the regret bound of our algorithm can be reduced to the regret bounds of the embedded adversarial learning algorithms. Based on this framework, we obtain new results under various settings. When the model parameters at each period are drawn from identical distributions, we derive a \textit{state-of-the-art} $O(\sqrt{T})$ regret that improves previous bounds under special cases. Our algorithm is also robust to adversarial corruptions of the model parameter realizations. When the model parameters are drawn from unknown non-stationary distributions and we are given machine-learned predictions of the distributions, we develop a new algorithm from our framework with a regret of $O(W_T+\sqrt{T})$, where $W_T$ measures the total inaccuracy of the machine-learned predictions.
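
To make the per-period protocol concrete, the following is a minimal toy sketch, not the paper's algorithm. It assumes a single packing-type resource constraint, a small grid of first-stage choices, and Hedge (multiplicative weights) as the embedded adversarial learner for both the first-stage player and the dual player. The names `hedge_update` and `run_toy` and all numeric values (`unit_cost`, `budget`, `price_cap`, the grid) are hypothetical and chosen only for illustration.

```python
import math
import random


def hedge_update(weights, losses, eta):
    """One Hedge (multiplicative-weights) step: shift mass away from high-loss entries."""
    scaled = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


def run_toy(T=500, seed=0):
    rng = random.Random(seed)
    grid = [0.0, 0.5, 1.0]   # hypothetical first-stage choices: capacity to reserve
    unit_cost = 0.3          # hypothetical cost per unit of reserved capacity
    budget = 0.4             # hypothetical long-term average resource budget
    price_cap = 2.0          # hypothetical scaling of the dual weight into a price
    eta = math.sqrt(math.log(len(grid)) / T)

    primal = [1.0 / len(grid)] * len(grid)  # distribution over first-stage choices
    dual = [0.5, 0.5]                       # weights on [resource constraint, null constraint]
    total_cost, total_use = 0.0, 0.0

    for _ in range(T):
        price = price_cap * dual[0]                # current dual price of the resource
        c = rng.choices(grid, weights=primal)[0]   # first-stage action
        theta = rng.random()                       # observed parameter: reward per unit
        # Second-stage action from the feasible set [0, c]: use capacity only if
        # the reward exceeds the dual price (minimizes the per-period Lagrangian).
        x = c if theta > price else 0.0
        total_cost += unit_cost * c - theta * x
        total_use += x

        # Full-information loss of every first-stage candidate under the observed theta.
        primal_losses = [unit_cost * g - max(theta - price, 0.0) * g for g in grid]
        primal = hedge_update(primal, primal_losses, eta)
        # The dual player gains when usage exceeds the budget, so its loss is budget - x.
        dual = hedge_update(dual, [budget - x, 0.0], eta)

    return total_cost / T, total_use / T, primal, dual
```

Calling `run_toy(T=500, seed=0)` returns the average cost, the average resource usage, and the final first-stage and dual distributions. Comparing the average usage with `budget` gives a rough sense of how the dual price steers the long-term average in this toy setting; it carries none of the paper's guarantees.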
