Online Sequential Decision-Making with Unknown Delays

Ping Wu; Heyan Huang; Zhengyang Liu

TL;DR

The paper studies online sequential decision-making with unknown feedback delays within the online convex optimization (OCO) framework. It introduces three algorithm families: Follow the Delayed Regularized Leader (FTDRL) for full-information feedback, Delayed Mirror Descent (DMD) for gradient feedback, and Simplified Delayed Mirror Descent (SDMD) for feedback consisting of gradient values at the corresponding decision points. All three support universal norms via appropriate regularizers and allow approximate minimization. Theoretical results establish sublinear regret under general convexity and logarithmic regret under relative strong convexity for each family, with bounds that depend explicitly on the total delay $D_T$, the maximum delay $d$, and norm-dependent constants. The methods are demonstrated to be competitive with or superior to existing delayed-OCO approaches, and the framework covers practical scenarios where feedback types vary and exact solutions are computationally expensive. Overall, the work extends delayed online learning to universal norms and multiple feedback types, with implications for real-time decision-making under uncertain communication delays.

Abstract

We study online sequential decision-making with delays in the framework of online convex optimization (OCO), where the feedback on a decision can arrive with an unknown delay. Unlike previous research, which is limited to the Euclidean norm and gradient information, we propose three families of delayed algorithms based on approximate solutions to handle different types of received feedback. The proposed algorithms are versatile and applicable to universal norms. Specifically, we introduce a family of Follow the Delayed Regularized Leader algorithms for feedback with full information on the loss function, a family of Delayed Mirror Descent algorithms for feedback with gradient information on the loss function, and a family of Simplified Delayed Mirror Descent algorithms for feedback with the values of the loss function's gradients at the corresponding decision points. For each type of algorithm, we provide regret bounds under general convexity and under relative strong convexity. We also demonstrate the efficiency of each algorithm under different norms through concrete examples. Furthermore, our theoretical results are consistent with the current best bounds when reduced to the standard settings.
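
To make the delayed-feedback setting concrete, the following is a minimal sketch of projected online gradient descent with delayed gradients. It illustrates only the Euclidean, gradient-feedback special case that the abstract attributes to previous research; it is not the paper's FTDRL, DMD, or SDMD. The function name `delayed_ogd`, the step size `eta`, the ball radius `radius`, and the quadratic toy losses are all assumptions made for illustration.

```python
import numpy as np


def delayed_ogd(grad_fns, delays, dim, eta=0.1, radius=1.0):
    """Projected online gradient descent with delayed gradient feedback.

    grad_fns[t](x) returns the gradient of the round-t loss at x.
    delays[t] >= 0 is the number of rounds before that gradient is revealed.
    All names and defaults here are illustrative assumptions.
    """
    T = len(grad_fns)
    x = np.zeros(dim)
    decisions = []
    pending = {}  # arrival round -> gradients revealed in that round
    for t in range(T):
        decisions.append(x.copy())  # commit to decision x_t
        # The gradient at x_t is only revealed at round t + delays[t].
        g = grad_fns[t](x)
        pending.setdefault(t + delays[t], []).append(g)
        # Apply every gradient whose feedback arrives in this round.
        for g_arrived in pending.pop(t, []):
            x = x - eta * g_arrived
            norm = np.linalg.norm(x)
            if norm > radius:  # Euclidean projection onto the ball
                x = x * (radius / norm)
    return decisions


# Toy usage: quadratic losses f_t(x) = 0.5 * ||x - target||^2.
T = 200
target = np.array([0.5, -0.25])
grad_fns = [lambda x: x - target for _ in range(T)]
delays = [t % 4 for t in range(T)]  # deterministic delays in {0, 1, 2, 3}
decisions = delayed_ogd(grad_fns, delays, dim=2)
```

Because the learner does not know the delays in advance, the sketch simply applies whatever gradients happen to arrive in each round. Feedback scheduled to arrive after the final round is never used.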

Paper Structure (50 sections, 16 theorems, 135 equations, 4 figures, 6 algorithms)

Introduction
Organization.
Related Work
The Standard OCO
The Delayed OCO
Formal Notations
Follow the Delayed Regularized Leader
Sublinear Regret for General Convexity
Logarithmic Regret for Relative Strong Convexity
Delayed Mirror Descent
Sublinear Regret for General Convexity
Logarithmic Regret for Relative Strong Convexity
Simplified Delayed Mirror Descent
Sublinear Regret for General Convexity
Logarithmic Regret for Relative Strong Convexity
...and 35 more sections

Key Result

Theorem 1

Under Assumptions assumption:RG and assumpton:strongly_convex, let the maximum approximate error be $\rho_t=\frac{\eta G_\star^2}{8\sigma}$ for all $t\in [T]$. Then Algorithm al:fl_gc satisfies

Figures (4)

Figure 1: Comparison with Baselines in Classification Task
Figure 2: Comparison with Baselines in Regression Task
Figure 3: Impact on Different Delayed Periods
Figure 4: Impact on Different Approximate Errors

Theorems & Definitions (28)

Definition 1
Definition 2
Definition 3
Example 1
Example 2
Example 3
Theorem 1
Corollary 1
Remark
Theorem 2
...and 18 more
