Online Sequential Decision-Making with Unknown Delays

Ping Wu, Heyan Huang, Zhengyang Liu

TL;DR

The paper addresses online sequential decision-making under unknown delays within the online convex optimization (OCO) framework. It introduces three algorithm families, Follow the Delayed Regularized Leader (FTDRL), Delayed Mirror Descent (DMD), and Simplified Delayed Mirror Descent (SDMD), which handle full-information, gradient, and gradient-value feedback respectively. All three support universal norms through an appropriate choice of regularizer and allow approximate minimization. The theoretical results establish sublinear regret under general convexity and logarithmic regret under relative strong convexity for each family, with bounds that depend explicitly on the total delay $D_T$, the maximum delay $d$, and norm-dependent constants. The methods are shown to be competitive with or superior to existing delayed-OCO approaches, and the framework covers practical scenarios where feedback types vary and exact solutions are computationally expensive. Overall, the work extends delayed online learning to universal norms and multiple feedback modalities, with implications for real-time decision-making under uncertain communication delays.
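
For readers unfamiliar with the quantities named above, the following states the standard OCO regret and the two delay quantities. The symbols $f_t$, $x_t$, $\mathcal{K}$, and $d_t$ are assumed notation for this note and may differ from the paper's own; only $D_T$ and $d$ appear in this summary.

$$
R_T \;=\; \sum_{t=1}^{T} f_t(x_t) \;-\; \min_{x \in \mathcal{K}} \sum_{t=1}^{T} f_t(x),
\qquad
D_T \;=\; \sum_{t=1}^{T} d_t,
\qquad
d \;=\; \max_{t \in [T]} d_t .
$$

Here $x_t \in \mathcal{K}$ is the decision played in round $t$, $f_t$ is the convex loss for that round, and $d_t$ is the delay (in rounds) before the feedback for round $t$ arrives. "Sublinear regret" means $R_T / T \to 0$ as $T$ grows.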

Abstract

In the field of online sequential decision-making, we address the problem with delays utilizing the framework of online convex optimization (OCO), where the feedback of a decision can arrive with an unknown delay. Unlike previous research that is limited to Euclidean norm and gradient information, we propose three families of delayed algorithms based on approximate solutions to handle different types of received feedback. Our proposed algorithms are versatile and applicable to universal norms. Specifically, we introduce a family of Follow the Delayed Regularized Leader algorithms for feedback with full information on the loss function, a family of Delayed Mirror Descent algorithms for feedback with gradient information on the loss function and a family of Simplified Delayed Mirror Descent algorithms for feedback with the value information of the loss function's gradients at corresponding decision points. For each type of algorithm, we provide corresponding regret bounds under cases of general convexity and relative strong convexity, respectively. We also demonstrate the efficiency of each algorithm under different norms through concrete examples. Furthermore, our theoretical results are consistent with the current best bounds when degenerated to standard settings.
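
To make the delayed-feedback protocol concrete, here is a minimal sketch of delayed projected online gradient descent, which is what a mirror descent step reduces to under the Euclidean regularizer. It is not the paper's FTDRL, DMD, or SDMD algorithm: the function names, the step size `eta`, the L2-ball feasible set, and the quadratic losses are all assumptions chosen for illustration. The learner only receives the gradient value at the point it actually played, and only after that round's delay has elapsed.

```python
import numpy as np

def project_l2_ball(x, radius=1.0):
    """Euclidean projection onto the L2 ball of the given radius."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def delayed_gradient_descent(grad_fns, delays, dim, eta=0.1, radius=1.0):
    """Illustrative delayed online gradient descent (hypothetical helper).

    grad_fns[t](x) returns the gradient of the round-t loss at x.
    The gradient for round t becomes available at the end of round
    t + delays[t]; the learner does not need to know delays in advance.
    """
    T = len(grad_fns)
    x = np.zeros(dim)
    decisions = []
    pending = {}  # arrival round -> gradients that arrive in that round
    for t in range(T):
        decisions.append(x.copy())
        # The environment evaluates the gradient at the played point x_t ...
        g = grad_fns[t](x)
        # ... but reveals it only after the delay has elapsed.
        pending.setdefault(t + delays[t], []).append(g)
        arrived = pending.pop(t, [])
        if arrived:
            # Apply all feedback received this round in a single step.
            x = project_l2_ball(x - eta * np.sum(arrived, axis=0), radius)
    return np.array(decisions)

# Usage: quadratic losses f_t(x) = 0.5 * ||x - target||^2, every delay = 2.
target = np.array([0.5, -0.3])
T = 200
grad_fns = [lambda x: x - target for _ in range(T)]
delays = [2] * T
decisions = delayed_gradient_descent(grad_fns, delays, dim=2, eta=0.1)
```

With a constant delay of 2, the first three decisions stay at the origin because no feedback has arrived yet; afterwards the iterates move toward `target`. Swapping the Euclidean projection step for a Bregman-divergence step would give a mirror descent variant for other norms, which is the direction the abstract describes.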

Paper Structure

This paper contains 50 sections, 16 theorems, 135 equations, 4 figures, 6 algorithms.

Key Result

Theorem 1

Under Assumptions assumption:RG and assumpton:strongly_convex, if the maximum approximation error is $\rho_t=\frac{\eta G_\star^2}{8\sigma}$ for all $t\in [T]$, then Algorithm al:fl_gc satisfies

Figures (4)

  • Figure 1: Comparison with Baselines in Classification Task
  • Figure 2: Comparison with Baselines in Regression Task
  • Figure 3: Impact on Different Delayed Periods
  • Figure 4: Impact on Different Approximate Errors

Theorems & Definitions (28)

  • Definition 1
  • Definition 2
  • Definition 3
  • Example 1
  • Example 2
  • Example 3
  • Theorem 1
  • Corollary 1
  • Remark
  • Theorem 2
  • ...and 18 more