Bandit-Based Rate Adaptation for a Single-Server Queue
Authors
Mevan Wijewardena, Kamiar Asgari, Michael J. Neely
Abstract
This paper considers the problem of obtaining bounded time-average expected queue sizes in a single-queue system with a partial-feedback structure. Time is slotted; in slot $t$ the transmitter chooses a rate $r_t$ from a continuous interval. Transmission succeeds if and only if $r_t \le c_t$, where the channel capacities $\{c_t\}$ and the arrivals $\{a_t\}$ are i.i.d. draws from fixed but unknown distributions. The transmitter observes only binary acknowledgments (ACK/NACK) indicating success or failure. Let $\delta > 0$ denote a sufficiently small lower bound on the slack between the arrival rate and the capacity region. We propose a phased algorithm that progressively refines a discretization of the uncountably infinite rate space and, without knowledge of $\delta$, achieves a bound on the time-average expected queue size that holds uniformly over the horizon. We also prove a converse result showing that for any rate-selection algorithm, regardless of whether $\delta$ is known, there exists an environment in which the worst-case time-average expected queue size is lower-bounded by an order that depends on $\delta$. Thus, while a gap remains in the setting without knowledge of $\delta$, we show that if $\delta$ is known, a simple single-stage UCB-type policy with a fixed discretization of the rate space achieves a time-average expected queue size matching the converse up to logarithmic factors.
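The setting above can be made concrete with a small simulation. The following is a minimal sketch, not the authors' phased algorithm or their UCB-type policy. It assumes capacities drawn uniformly from [0, 1], arrivals drawn uniformly from [0, 0.4], and a fixed grid of candidate rates. A standard UCB1 index is applied to the per-slot throughput r·1{ACK}. The function name `ucb_rate_control` and all distributions and parameters are hypothetical.

```python
import math
import random


def ucb_rate_control(horizon, num_rates=10, arrival_mean=0.2, seed=0):
    """Simulate one queue whose transmitter picks rates with UCB1 (illustrative only).

    Hypothetical setting (assumptions, not taken from the paper):
      * capacities c_t ~ Uniform[0, 1), i.i.d. and unknown to the policy;
      * arrivals a_t ~ Uniform[0, 2 * arrival_mean], i.i.d.;
      * candidate rates form the fixed grid {1/K, 2/K, ..., 1}.
    Returns (time-average queue size, most frequently chosen rate).
    """
    rng = random.Random(seed)
    rates = [(k + 1) / num_rates for k in range(num_rates)]
    counts = [0] * num_rates         # number of times each rate was tried
    reward_sums = [0.0] * num_rates  # total observed throughput r * 1{ACK}

    queue = 0.0
    queue_total = 0.0
    for t in range(1, horizon + 1):
        if t <= num_rates:
            k = t - 1  # initialization: try every rate once
        else:
            # UCB1 index on throughput; rewards lie in [0, 1] because rates <= 1.
            k = max(
                range(num_rates),
                key=lambda i: reward_sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        r = rates[k]
        c = rng.random()  # channel capacity, never revealed to the policy
        ack = r <= c      # only this binary ACK/NACK is observed
        counts[k] += 1
        reward_sums[k] += r if ack else 0.0

        # Queue update: serve up to r on success, then add new arrivals.
        served = r if ack else 0.0
        a = rng.uniform(0.0, 2.0 * arrival_mean)
        queue = max(queue - served, 0.0) + a
        queue_total += queue

    best = rates[max(range(num_rates), key=lambda i: counts[i])]
    return queue_total / horizon, best
```

Under these assumptions, the expected throughput of rate r is r(1 − r). That quantity peaks at r = 1/2 with value 1/4, so an arrival mean of 0.2 leaves a slack of 0.05. On a 10-point grid, the policy should spend most slots near r = 0.5; for example, `ucb_rate_control(20000)` returns a finite average backlog together with the most frequently chosen rate. A fixed grid cannot represent rates between its points, which is the limitation that a progressively refined discretization is meant to address.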