A Two-Layer Framework for Joint Online Configuration Selection and Admission Control

Owen Shen, Haoran Xu, Yinyu Ye, Peter Glynn, Patrick Jaillet

TL;DR

This work addresses joint online configuration selection and admission control under budget constraints by formulating a two-layer decision process. It introduces a switching-aware fluid oracle that upper-bounds all feasible online policies, characterizes it through a max-min saddle-point formulation, and uses this to design SP-UCB--OLP, which achieves $\tilde{O}(\sqrt{KT})$ regret over a horizon of $T$ periods with $K$ configurations. The algorithm learns a mixture over configurations and a global bid price $\mathbf{p}$: in each round it solves an optimistic saddle-point problem and applies a threshold admission rule based on $\mathbf{p}^\top\mathbf{a}$ to decide acceptance. Empirical results on synthetic data and Alibaba traces demonstrate the necessity of the switching-aware benchmark (complementarity gap), the effectiveness of minimal exploration, and the practical scalability of the approach for real-world budgeted admission control scenarios.
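
As a hedged illustration of the threshold rule mentioned above (this summary does not give the exact rule or its tie-breaking), a bid-price rule of the kind common in online linear programming accepts a request when its reward exceeds its priced resource consumption. The symbols $x_t$ (accept indicator), $r_t$ (reward), $\mathbf{a}_t$ (resource consumption), and $\mathbf{p}_t$ (current bid price) are assumed here for illustration:

$$x_t = \mathbb{1}\{\, r_t > \mathbf{p}_t^\top \mathbf{a}_t \,\}$$

For example, with $\mathbf{p}_t = (0.5, 0.2)$ and $\mathbf{a}_t = (1, 2)$, the priced consumption is $0.5 \cdot 1 + 0.2 \cdot 2 = 0.9$, so under this assumed rule a request with reward $1.0$ would be accepted and one with reward $0.8$ rejected.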

Abstract

We study the problem of online configuration selection with admission control, which arises in LLM serving, GPU scheduling, and revenue management. Over a planning horizon of $T$ periods, we consider a two-layer framework for the decisions made within each time period. In the first layer, the decision maker selects one of $K$ configurations (e.g., quantization, parallelism, fare class), which induces a distribution over the reward-resource pair of the incoming request. In the second layer, the decision maker observes the request and then decides whether to accept it. Benchmarking this framework requires care. We introduce a switching-aware fluid oracle that accounts for the value of mixing configurations over time and provably upper-bounds any online policy. We derive a max-min formulation for evaluating the benchmark, and we characterize the saddle points of the max-min problem via primal-dual optimality conditions linking equilibrium, feasibility, and complementarity. This guides the design of the SP-UCB--OLP algorithm, which solves an optimistic saddle-point problem and achieves $\tilde{O}(\sqrt{KT})$ regret.
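
The two-layer structure described above can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not the paper's SP-UCB--OLP algorithm: the mixture and bid price are held fixed rather than learned, the strict inequality in the admission test is an assumed tie convention, and the names `run_two_layer` and `toy_request` and the toy reward distributions are hypothetical.

```python
import random

def run_two_layer(T, budget, mixture, price, sample_request, seed=0):
    """Simulate T periods with a fixed configuration mixture and bid price.

    mixture: K probabilities over configurations (layer 1).
    price: d nonnegative bid prices, one per resource (layer 2).
    sample_request(k, rng) -> (reward, usage), usage being a list of d floats.
    """
    rng = random.Random(seed)
    remaining = list(budget)
    total_reward = 0.0
    for _ in range(T):
        # Layer 1: choose a configuration before the request is revealed.
        k = rng.choices(range(len(mixture)), weights=mixture)[0]
        # The chosen configuration determines the reward-resource distribution.
        reward, usage = sample_request(k, rng)
        # Layer 2: accept if the reward beats the priced resource
        # consumption and the remaining budget can cover the request.
        cost = sum(p * a for p, a in zip(price, usage))
        fits = all(a <= b for a, b in zip(usage, remaining))
        if reward > cost and fits:
            total_reward += reward
            remaining = [b - a for b, a in zip(remaining, usage)]
    return total_reward, remaining

def toy_request(k, rng):
    # Hypothetical two-configuration, single-resource instance.
    if k == 0:
        return rng.uniform(0.0, 1.0), [1.0]  # cheaper, lower reward
    return rng.uniform(0.5, 1.5), [2.0]      # costlier, higher reward

reward, remaining = run_two_layer(
    T=1000, budget=[300.0], mixture=[0.5, 0.5],
    price=[0.4], sample_request=toy_request,
)
print(reward, remaining)
```

Raising `price` makes the admission rule more selective, which is the lever a learned bid price would use to pace budget consumption over the horizon.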

Paper Structure (91 sections, 33 theorems, 174 equations, 3 figures, 4 tables, 2 algorithms)


Key Result

Theorem 5

Under Assumptions assump:bounded and assump:budget-scaling, let $b_{\min}$ be the smallest component of $\mathbf{b}$ (in particular, $b_{\min} > 0$) and let $P_{\max} = 2R_{\max}/b_{\min}$. With price domain $\mathcal{P} = [0, P_{\max}]^d$,

Figures (3)

  • Figure 1: Two-layer decision structure with data revelation.
  • Figure 2: Regret scaling with $\alpha = 1.5$ (S0, 50 seeds).
  • Figure 3: Alibaba traces: competitive ratio across 50 seeds ($T=5{,}000$, $\rho=1.0$). Greedy ($\alpha=0$) exhibits high variance due to regime lock-in; minimal exploration ($\alpha=0.01$) dramatically stabilizes performance.

Theorems & Definitions (71)

  • Example 1: Complementary Resources
  • Theorem 5: Primal--Dual Form of the Mixed Fluid Oracle
  • Theorem 6: Primal--dual optimality (saddle/KKT conditions)
  • Theorem 7: Characterization of all saddle points
  • Theorem 8: Switching-Aware Oracle Upper Bound
  • Proof: Proof sketch
  • Remark 1
  • Theorem 9: Main Theorem: Regret vs. Switching-Aware Fluid Oracle
  • Remark 2
  • Remark 3: Tie-Handling Convention
  • ...and 61 more