Bandit-Based Rate Adaptation for a Single-Server Queue

Mevan Wijewardena; Kamiar Asgari; Michael J. Neely

Abstract

This paper considers the problem of obtaining bounded time-average expected queue sizes in a single-queue system with a partial-feedback structure. Time is slotted; in each slot $t$ the transmitter chooses a rate from a continuous interval. Transmission succeeds if and only if the chosen rate does not exceed the channel capacity in that slot, where channel capacities and arrivals are i.i.d. draws from fixed but unknown distributions. The transmitter observes only binary acknowledgments (ACK/NACK) indicating success or failure. Let $\varepsilon$ denote a sufficiently small lower bound on the slack between the arrival rate and the capacity region. We propose a phased algorithm that progressively refines a discretization of the uncountably infinite rate space and, without knowledge of $\varepsilon$, achieves a time-average expected queue size that is bounded uniformly over the horizon. We also prove a converse result showing that for any rate-selection algorithm, regardless of whether $\varepsilon$ is known, there exists an environment in which the worst-case time-average expected queue size admits a lower bound in terms of $\varepsilon$. Thus, while a gap remains in the setting without knowledge of $\varepsilon$, we show that if $\varepsilon$ is known, a simple single-stage UCB-type policy with a fixed discretization of the rate space achieves a queue-size bound matching the converse up to logarithmic factors.
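
To make the setting concrete, the following Python sketch simulates a single queue served by a UCB-style rate selector on a fixed grid, using only ACK/NACK feedback. It is a toy under stated assumptions and not the algorithm analyzed in the paper: the Uniform(0, 1) capacity distribution, the Bernoulli unit arrivals, the grid size, the confidence bonus, and the function name `simulate_ucb_queue` are all hypothetical choices made for illustration.

```python
import math
import random


def simulate_ucb_queue(horizon=20000, num_rates=16, arrival_prob=0.1, seed=0):
    """Simulate one queue with a UCB rate selector and ACK/NACK feedback.

    Assumptions for illustration only (not from the paper): rates lie in
    (0, 1], capacity is Uniform(0, 1), and one unit of work arrives with
    probability arrival_prob in each slot.
    """
    rng = random.Random(seed)
    # Fixed discretization of the rate interval into num_rates grid points.
    rates = [(k + 1) / num_rates for k in range(num_rates)]
    pulls = [0] * num_rates   # number of slots in which each rate was tried
    acks = [0] * num_rates    # number of ACKs received for each rate
    queue = 0.0
    queue_sum = 0.0

    for t in range(1, horizon + 1):
        def index(k):
            # Optimistic estimate of expected service: rate * (ACK prob + bonus).
            if pulls[k] == 0:
                return float("inf")  # try every grid rate once
            bonus = math.sqrt(2.0 * math.log(t) / pulls[k])
            return rates[k] * min(1.0, acks[k] / pulls[k] + bonus)

        k = max(range(num_rates), key=index)
        capacity = rng.random()
        ack = rates[k] <= capacity  # success iff rate does not exceed capacity
        pulls[k] += 1
        acks[k] += int(ack)

        served = rates[k] if ack else 0.0
        arrival = 1.0 if rng.random() < arrival_prob else 0.0
        # Standard single-server queue update: serve first, then admit arrivals.
        queue = max(queue - served, 0.0) + arrival
        queue_sum += queue

    return queue_sum / horizon, pulls, rates


if __name__ == "__main__":
    avg_queue, pulls, rates = simulate_ucb_queue()
    most_used = rates[max(range(len(rates)), key=lambda k: pulls[k])]
    print(f"time-average queue size: {avg_queue:.2f}")
    print(f"most frequently chosen rate: {most_used:.4f}")
```

Under these assumptions the expected service of rate r is r(1 - r), which peaks at 0.25, so an arrival probability of 0.1 leaves a slack of roughly 0.15. Raising `arrival_prob` toward 0.25 shrinks the slack and should make the time-average queue size grow, which is the regime the bounds in the abstract are concerned with.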