A Two-Layer Framework for Joint Online Configuration Selection and Admission Control
Owen Shen, Haoran Xu, Yinyu Ye, Peter Glynn, Patrick Jaillet
TL;DR
This work addresses joint online configuration selection and admission control under budget constraints by formulating a two-layer decision process. It introduces a switching-aware fluid oracle that upper-bounds every feasible online policy and is evaluated through a max-min saddle-point formulation, and uses this characterization to design SP-UCB--OLP, which achieves $\tilde{O}(\sqrt{KT})$ regret over a horizon of $T$ periods with $K$ configurations. The algorithm learns a mixture over configurations and a global bid price $\mathbf{p}$: in each round it solves an optimistic saddle-point problem, then applies a threshold admission rule based on the bid-price-weighted resource consumption $\mathbf{p}^\top\mathbf{a}$ of the incoming request to decide acceptance. Empirical results on synthetic data and Alibaba traces demonstrate the necessity of the switching-aware benchmark (complementarity gap), the effectiveness of minimal exploration, and the practical scalability of the approach for real-world budgeted admission control scenarios.
Abstract
We study the problem of online configuration selection with admission control, which arises in LLM serving, GPU scheduling, and revenue management. Over a planning horizon of $T$ periods, we consider a two-layer framework for the decisions made within each period. In the first layer, the decision maker selects one of $K$ configurations (e.g., quantization, parallelism, fare class), which induces a distribution over the reward-resource pair of the incoming request. In the second layer, the decision maker observes the request and then decides whether to accept it. Benchmarking this framework requires care. We introduce a \textbf{switching-aware fluid oracle} that accounts for the value of mixing configurations over time and provably upper-bounds any online policy. We derive a max-min formulation for evaluating the benchmark, and we characterize the saddle points of the max-min problem via primal-dual optimality conditions linking equilibrium, feasibility, and complementarity. This characterization guides the design of the \textbf{SP-UCB--OLP} algorithm, which solves an optimistic saddle-point problem and achieves $\tilde{O}(\sqrt{KT})$ regret.
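The two-layer interaction described above can be sketched in a few lines of Python. This is only an illustrative simulation under stated assumptions, not the paper's SP-UCB--OLP algorithm: the mixture and bid price are held fixed rather than learned from an optimistic saddle-point problem, the acceptance test "reward exceeds the bid-price-weighted resource use" is one common form of threshold rule and is assumed here, and the names `admit`, `simulate`, and `toy_request` are hypothetical.

```python
import random


def admit(reward, usage, price):
    """Threshold rule (assumed form): accept when the reward exceeds
    the bid-price-weighted resource consumption, i.e. r > p^T a."""
    cost = sum(p * a for p, a in zip(price, usage))
    return reward > cost


def simulate(T, budget, mixture, price, sample_request, seed=0):
    """Run T periods of the two-layer process with a FIXED mixture and
    bid price (placeholders for the quantities the algorithm would learn)."""
    rng = random.Random(seed)
    remaining = list(budget)
    configs = list(range(len(mixture)))
    total_reward, accepted = 0.0, 0
    for _ in range(T):
        # Layer 1: choose a configuration by sampling from the mixture.
        k = rng.choices(configs, weights=mixture)[0]
        # The chosen configuration induces the request's (reward, usage) pair.
        reward, usage = sample_request(k, rng)
        # Layer 2: observe the request, then accept or reject it.
        fits = all(a <= b for a, b in zip(usage, remaining))
        if fits and admit(reward, usage, price):
            remaining = [b - a for a, b in zip(usage, remaining)]
            total_reward += reward
            accepted += 1
    return total_reward, remaining, accepted


def toy_request(k, rng):
    """Hypothetical request model: configuration 0 is cheap with low reward,
    configuration 1 is costly with higher reward (single resource)."""
    if k == 0:
        return rng.uniform(0.0, 1.0), [0.5]
    return rng.uniform(0.5, 2.0), [1.5]


if __name__ == "__main__":
    reward, remaining, accepted = simulate(
        T=200, budget=[50.0], mixture=[0.5, 0.5], price=[0.6],
        sample_request=toy_request,
    )
    print(f"accepted={accepted}, reward={reward:.2f}, remaining={remaining}")
```

Varying `mixture` in this toy setup gives a rough sense of why mixing configurations over time can matter under a shared budget, which is the effect the switching-aware benchmark is designed to capture.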
