A Two-Layer Framework for Joint Online Configuration Selection and Admission Control

Owen Shen; Haoran Xu; Yinyu Ye; Peter Glynn; Patrick Jaillet

TL;DR

This work addresses joint online configuration selection and admission control under budget constraints by formulating a two-layer decision process. It introduces a switching-aware fluid oracle that upper-bounds all feasible online policies and is characterized through a max-min saddle-point problem, and uses this characterization to design SP-UCB--OLP, which achieves $\tilde{O}(\sqrt{KT})$ regret over a horizon of $T$ periods with $K$ configurations. The algorithm learns a mixture over configurations and a global bid price: at each round it solves an optimistic saddle-point problem, then applies a threshold admission rule that compares each request's reward with its bid-price cost $\mathbf{p}^\top\mathbf{a}$ to decide acceptance. Empirical results on synthetic data and Alibaba traces demonstrate the necessity of the switching-aware benchmark (complementarity gap), the effectiveness of minimal exploration, and the practical scalability of the approach for real-world budgeted admission control scenarios.

Abstract

We study the problem of online configuration selection with admission control, which arises in LLM serving, GPU scheduling, and revenue management. Over a planning horizon of $T$ periods, we consider a two-layer framework for the decisions made within each period. In the first layer, the decision maker selects one of $K$ configurations (e.g., quantization, parallelism, fare class), which induces a distribution over the reward-resource pair of the incoming request. In the second layer, the decision maker observes the request and then decides whether to accept it. Benchmarking this framework requires care. We introduce a \textbf{switching-aware fluid oracle} that accounts for the value of mixing configurations over time and provably upper-bounds any online policy. We derive a max-min formulation for evaluating the benchmark, and we characterize the saddle points of the max-min problem via primal-dual optimality conditions linking equilibrium, feasibility, and complementarity. This guides the design of the \textbf{SP-UCB--OLP} algorithm, which solves an optimistic saddle-point problem and achieves $\tilde{O}(\sqrt{KT})$ regret.
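The two-layer structure described in the abstract can be illustrated with a minimal Python sketch of a single period. This is not the paper's SP-UCB--OLP algorithm: the mixture `config_weights` and the bid price `price` are taken as given rather than learned, and the function names, the per-configuration `samplers`, the tie-handling (accept when reward equals the priced cost), and the budget-feasibility check are all assumptions made for illustration.

```python
import random


def admit(reward, usage, price):
    """Bid-price threshold rule (assumed form): accept when the reward
    covers the priced resource usage, i.e. reward >= price . usage."""
    cost = sum(p * a for p, a in zip(price, usage))
    return reward >= cost


def run_period(config_weights, samplers, price, budget, rng):
    """Simulate one period of the two-layer process.

    Layer 1: draw a configuration index from the mixture `config_weights`.
    Layer 2: observe the (reward, usage) pair induced by that configuration,
             then accept or reject it with the threshold rule.
    Returns (chosen configuration, collected reward, remaining budget).
    """
    k = rng.choices(range(len(config_weights)), weights=config_weights)[0]
    reward, usage = samplers[k](rng)  # hypothetical per-configuration sampler
    # Assumption: a request that would overdraw any resource is rejected.
    feasible = all(a <= b for a, b in zip(usage, budget))
    if feasible and admit(reward, usage, price):
        remaining = [b - a for a, b in zip(usage, budget)]
        return k, reward, remaining
    return k, 0.0, list(budget)
```

In a full algorithm, both the mixture and the price would be updated between periods from the observed data; this sketch shows only the order in which decisions are made and information is revealed.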

Paper Structure (91 sections, 33 theorems, 174 equations, 3 figures, 4 tables, 2 algorithms)

Introduction
Main Contributions.
Related Work
Bandits with knapsacks (BwK).
Online linear programming (OLP) and threshold-based decision rule.
Saddle-point of max-min problem.
Our positioning.
Model
Two-layer decision model.
Assumptions
Switching-Aware Fluid Oracle
Fixed configuration oracle leads to negative regret
Primal Mixed Fluid Relaxation
Dual Form and Envelope Structure
Envelope and threshold consumption.
...and 76 more sections

Key Result

Theorem 5

Under the boundedness and budget-scaling assumptions (assump:bounded and assump:budget-scaling), let $b_{\min}$ be the smallest component of $\mathbf{b}$ (in particular, $b_{\min} > 0$) and let $P_{\max} = 2R_{\max}/b_{\min}$. With price domain $\mathcal{P} = [0, P_{\max}]^d$,
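As a small illustration of the price domain in the theorem statement, the sketch below computes the cap $P_{\max} = 2R_{\max}/b_{\min}$ and clips a candidate price vector to the box $[0, P_{\max}]^d$. The function names are hypothetical, and the coordinate-wise clipping step is an assumption about how such a domain might be enforced, not something stated in the source.

```python
def price_cap(r_max, budget):
    """Return P_max = 2 * R_max / b_min, where b_min is the smallest
    component of the budget vector (required to be strictly positive)."""
    b_min = min(budget)
    if b_min <= 0:
        raise ValueError("every budget component must be strictly positive")
    return 2.0 * r_max / b_min


def project_price(price, cap):
    """Clip each coordinate of a price vector to the interval [0, cap]."""
    return [min(max(p, 0.0), cap) for p in price]
```

For example, with $R_{\max} = 1$ and $\mathbf{b} = (0.5, 2)$ the cap is $4$, so every coordinate of the price is kept within $[0, 4]$.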

Figures (3)

Figure 1: Two-layer decision structure with data revelation.
Figure 2: Regret scaling with $\alpha = 1.5$ (S0, 50 seeds).
Figure 3: Alibaba traces: competitive ratio across 50 seeds ($T=5{,}000$, $\rho=1.0$). Greedy ($\alpha=0$) exhibits high variance due to regime lock-in; minimal exploration ($\alpha=0.01$) dramatically stabilizes performance.

Theorems & Definitions (71)

Example 1: Complementary Resources
Theorem 5: Primal--Dual Form of the Mixed Fluid Oracle
Theorem 6: Primal--dual optimality (saddle/KKT conditions)
Theorem 7: Characterization of all saddle points
Theorem 8: Switching-Aware Oracle Upper Bound
Proof sketch
Remark 1
Theorem 9: Main Theorem: Regret vs. Switching-Aware Fluid Oracle
Remark 2
Remark 3: Tie-Handling Convention
...and 61 more
