Drift Plus Optimistic Penalty: A Learning Framework for Stochastic Network Optimization with Improved Regret Bounds

Sathwik Chadaga, Eytan Modiano

TL;DR

This work addresses joint routing and scheduling in multi-hop queueing networks with unknown edge costs and partial feedback. It combines Lyapunov drift-plus-penalty optimization with optimistic cost estimates to form the Drift Plus Optimistic Penalty (DPOP) policy, which keeps the network stable while minimizing cost. A static optimization problem provides a lower bound on the achievable cost, and the authors prove a regret bound of $O(\sqrt{T}\log T)$ relative to the best policy with full knowledge of arrivals and costs, showing that the costs can be learned with sub-linear regret while maintaining throughput. Simulations on single- and multi-commodity networks confirm the sub-linear regret and a bounded queue backlog, with the policy learning near-optimal routes and edge costs and an oracle policy serving as the performance reference. The results are relevant to online resource allocation in networks whose costs are unknown or dynamic, offering a principled way to balance throughput against transmission cost.
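
To make the "drift plus optimistic penalty" idea concrete, here is a minimal sketch of one time-slot decision. It assumes a single-commodity network, edges that can be activated independently (no interference constraints), costs in [0, 1], and a UCB-style lower confidence bound with radius $\sqrt{2\log t / n}$ as the optimistic cost estimate. The names `EdgeStats`, `optimistic_cost`, `dpop_rates` and the trade-off weight `V` are hypothetical, and the paper's exact estimator and scheduling constraints may differ. Under these assumptions, each edge runs at full rate when its backpressure $Q_i - Q_j$ exceeds `V` times the optimistic cost, which is the per-edge minimizer of the usual drift-plus-penalty bound.

```python
import math
from dataclasses import dataclass


@dataclass
class EdgeStats:
    """Running statistics of the noisy costs observed on one edge."""
    count: int = 0
    mean: float = 0.0

    def update(self, observed_cost: float) -> None:
        # Incremental sample-mean update; called only when the edge was used.
        self.count += 1
        self.mean += (observed_cost - self.mean) / self.count


def optimistic_cost(stats: EdgeStats, t: int) -> float:
    """Lower confidence bound on the mean edge cost (optimism when minimizing cost).

    Assumes costs lie in [0, 1], so the bound is clipped at 0 and an edge that
    has never been observed gets the most optimistic value, 0.
    """
    if stats.count == 0:
        return 0.0
    radius = math.sqrt(2.0 * math.log(max(t, 2)) / stats.count)
    return max(0.0, stats.mean - radius)


def dpop_rates(queues, mu_max, stats, t, V):
    """One slot of a hypothetical drift-plus-optimistic-penalty rule.

    queues: dict node -> backlog Q_i (the destination's backlog is 0)
    mu_max: dict (i, j) -> maximum rate of edge (i, j)
    stats:  dict (i, j) -> EdgeStats for that edge
    V:      weight on cost relative to queue backlog
    Returns dict (i, j) -> rate mu_ij chosen for this slot.
    """
    rates = {}
    for (i, j), cap in mu_max.items():
        # Backpressure minus V times the optimistic (low) cost estimate.
        weight = queues[i] - queues[j] - V * optimistic_cost(stats[(i, j)], t)
        rates[(i, j)] = cap if weight > 0 else 0
    return rates
```

After applying the rates, the controller would observe a noisy cost on each edge it used and call `update` on that edge's `EdgeStats`; unused edges receive no feedback, matching the partial-feedback setting. Larger `V` favors cheaper routes at the price of larger queues.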

Abstract

We consider the problem of joint routing and scheduling in queueing networks, where the edge transmission costs are unknown. At each time-slot, the network controller receives noisy observations of transmission costs only for those edges it selects for transmission. The network controller's objective is to make routing and scheduling decisions so that the total expected cost is minimized. This problem exhibits an exploration-exploitation trade-off; however, previous bandit-style solutions cannot be directly applied to this problem due to the queueing dynamics. In order to ensure network stability, the network controller needs to optimize throughput and cost simultaneously. We show that the best achievable cost is lower bounded by the solution to a static optimization problem, and develop a network control policy using techniques from Lyapunov drift-plus-penalty optimization and multi-armed bandits. We show that the policy achieves a sub-linear regret of order $O(\sqrt{T}\log T)$, as compared to the best policy that has complete knowledge of arrivals and costs. Finally, we evaluate the proposed policy using simulations and show that its regret is indeed sub-linear.
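
One way to track regret empirically is sketched below. It assumes the benchmark is a per-slot cost `c_star` (for example, the optimal value of the static lower-bound problem) and that the policy's per-slot expected cost is estimated by averaging over independent simulation runs; the paper's exact regret definition may differ, and the function names are hypothetical. Sub-linear regret means the average regret $R(T)/T$ tends to 0 as $T$ grows.

```python
import math
from itertools import accumulate


def cumulative_regret(per_slot_costs, c_star):
    """R(T) = sum_{t=1}^{T} cost(t) - T * c_star, returned for every horizon T.

    per_slot_costs: the policy's expected transmission cost in each slot
                    (e.g. averaged over independent simulation runs)
    c_star:         per-slot cost of the benchmark (assumed here to be the
                    optimal value of the static lower-bound problem)
    """
    return [total - T * c_star
            for T, total in enumerate(accumulate(per_slot_costs), start=1)]


def average_regret(per_slot_costs, c_star):
    """R(T) / T for every horizon T; sub-linear regret means this tends to 0."""
    regrets = cumulative_regret(per_slot_costs, c_star)
    return [r / T for T, r in enumerate(regrets, start=1)]


def bound_shape(T):
    """Growth rate sqrt(T) * log(T) of the stated O(sqrt(T) log T) bound, constant omitted."""
    return math.sqrt(T) * math.log(T)
```

For example, per-slot costs `[3.0, 2.5, 2.2, 2.1, 2.0]` against `c_star = 2.0` give cumulative regrets of approximately `[1.0, 1.5, 1.7, 1.8, 1.8]`, so the average regret falls to 0.36 by the fifth slot. Plotting `cumulative_regret` next to a scaled `bound_shape` is a quick visual check of the growth rate.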

Paper Structure

This paper contains 23 sections, 8 theorems, 77 equations, 14 figures, and 1 algorithm.

Key Result

Theorem 1

(Static Lower Bound) There exists a finite constant $C_L$ that is only a function of the network topology and transmission costs, such that, for $C_B \geq C_L$, we have

Figures (14)

  • Figure 1: Single-commodity network showing $(\mu_{ij}^{max}, c_{ij})$.
  • Figure 2: Transmission cost $\sum_{(i,j)\in \mathcal{E}}\mathbb{E}[\mu_{ij}^{\pi}(t)]\,c_{ij}$.
  • Figure 3: Total queue backlog $\sum_{i\in \mathcal{N}} \mathbb{E}[Q_i^{\pi}(t)]$.
  • Figure 4: Edge utilization $\frac{1}{t}\sum_{\tau=1}^{t} \mathbb{E}[\mu_{ij}^{\pi}(\tau)]/\mu_{ij}^{max}$.
  • Figure 5: Regret for $\lambda = 2$.
  • ...and 9 more figures
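
The quantities plotted in Figures 2 to 4 can be computed from simulated trajectories roughly as follows. This is a sketch assuming edges are `(i, j)` tuples, per-slot rates and backlogs are stored as dictionaries, and expectations are approximated by averaging per-slot values over independent runs; all function names are hypothetical. For Figure 2, `costs` holds the true mean costs $c_{ij}$, which the simulator knows even though the controller does not.

```python
def transmission_cost(mu, costs):
    """Figure 2, one slot of one run: sum over edges of mu_ij(t) * c_ij.

    mu:    dict edge -> rate used in this slot
    costs: dict edge -> true mean cost c_ij
    """
    return sum(mu[e] * costs[e] for e in mu)


def total_backlog(queues):
    """Figure 3, one slot of one run: sum of Q_i(t) over all nodes."""
    return sum(queues.values())


def edge_utilization(mu_history, edge, mu_max):
    """Figure 4, one run: (1/t) * sum_{tau<=t} mu_ij(tau) / mu_ij^max for each t."""
    out, running = [], 0.0
    for t, mu in enumerate(mu_history, start=1):
        running += mu[edge] / mu_max[edge]
        out.append(running / t)
    return out


def average_over_runs(values_per_run):
    """Monte Carlo estimate of the expectation: average each slot across runs."""
    n = len(values_per_run)
    return [sum(slot) / n for slot in zip(*values_per_run)]
```

Because expectation and time-averaging are both linear, averaging `edge_utilization` across runs gives the same estimate as time-averaging the run-averaged rates.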

Theorems & Definitions (8)

  • Theorem 1
  • Lemma 1
  • Corollary 1
  • Theorem 2
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Lemma 5: Drift Lemma [neely_convex_opt]
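
Drift lemmas of this kind typically rest on a standard one-slot inequality: for $Q(t+1) = \max(Q(t) - b(t), 0) + a(t)$ with nonnegative terms, $Q(t+1)^2 - Q(t)^2 \le a(t)^2 + b(t)^2 + 2Q(t)\,(a(t) - b(t))$. The sketch below evaluates both sides with the quadratic Lyapunov function $L(Q) = \tfrac12\sum_i Q_i^2$ plus a weighted cost penalty. It shows the generic textbook bound, not necessarily the exact statement of Lemma 5; in a multi-hop network, $a_i$ would include both exogenous arrivals and transmissions into node $i$, and the helper names are hypothetical.

```python
def queue_update(q, served, arrivals):
    """One-slot queue dynamics: Q(t+1) = max(Q(t) - b(t), 0) + a(t)."""
    return max(q - served, 0.0) + arrivals


def lyapunov(queues):
    """Quadratic Lyapunov function L(Q) = 0.5 * sum_i Q_i^2."""
    return 0.5 * sum(q * q for q in queues)


def drift_plus_penalty(queues, served, arrivals, cost, V):
    """Actual one-slot drift L(Q(t+1)) - L(Q(t)) plus the penalty V * cost."""
    nxt = [queue_update(q, b, a) for q, b, a in zip(queues, served, arrivals)]
    return lyapunov(nxt) - lyapunov(queues) + V * cost


def drift_plus_penalty_bound(queues, served, arrivals, cost, V):
    """Upper bound B + sum_i Q_i (a_i - b_i) + V * cost,
    with B = 0.5 * sum_i (a_i^2 + b_i^2); assumes all inputs are nonnegative."""
    B = 0.5 * sum(a * a + b * b for a, b in zip(arrivals, served))
    return B + sum(q * (a - b) for q, a, b in zip(queues, arrivals, served)) + V * cost
```

For example, with backlogs `[4.0, 0.5]`, service `[3.0, 2.0]`, arrivals `[1.0, 1.5]`, `cost = 2.0` and `V = 10`, the actual drift plus penalty is 15.0 while the bound is 19.875. Drift-plus-penalty policies choose actions slot by slot to minimize the right-hand side, since it is linear in the decisions.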