Timely Best Arm Identification in Restless Shared Networks

Mengqiu Zhou; Vincent Y. F. Tan; Meng Zhang

Abstract

Real-time status updating applications increasingly rely on networks of devices and edge nodes to maintain data freshness, as quantified by the age of information (AoI) metric. Given that edge computing nodes exhibit uncertain and time-varying dynamics, it is essential to identify the optimal edge node with high confidence and sample efficiency, even without prior knowledge of these dynamics, to ensure timely updates. To address this challenge, we introduce the first best arm identification (BAI) problem aimed at minimizing the long-term average AoI under a fixed confidence setting, framed within the context of a restless multi-armed bandit (RMAB) model. In this model, each arm evolves independently according to an unknown Markov chain over time, regardless of whether it is selected. To capture the temporal trajectories of AoI in the presence of unknown restless dynamics, we develop an age-aware LUCB algorithm that incorporates Markovian sampling. Additionally, we establish an instance-dependent upper bound on the sample complexity, which captures the difficulty of the problem as a function of the underlying Markov mixing behavior. Moreover, we derive an information-theoretic lower bound to characterize the fundamental challenges of the problem. We show that the sample complexity is influenced by the temporal correlation of the Markov dynamics, aligning with the intuition offered by the upper bound. Our numerical results show that, compared to existing benchmarks, the proposed scheme significantly reduces sampling costs, particularly under more stringent confidence levels.

Paper Structure (29 sections, 8 theorems, 62 equations, 4 figures, 2 algorithms)

Introduction
Background and Motivations
Contributions
Related Work
Age of Information
Restless Multi-Armed Bandit
System Model
Restless Edge Node Model
Time-Average Age
Problem Formulation
Best Arm Identification
Definition of Age-Optimal Best Arm
Best Arm Identification Policy
Age-Optimal BAI Algorithm
Markov Regeneration Sampling Strategy
...and 14 more sections

Key Result

Lemma 1

Let $P$ be an irreducible transition probability matrix on the finite state space $\mathcal{S}$ satisfying Assumption assump:P. Then, for any $\theta_a$, the matrix $P_{\theta_a}$ is irreducible and positive recurrent.

Figures (4)

Figure 1: The system model. Each edge node is modeled as an arm in a restless multi-armed bandit framework, with unknown congestion dynamics.
Figure 2: Age of information $\Delta(t)$ evolution in time under zero-wait policy.
Figure 3: The structure of the Markov regeneration sampling strategy.
Figure 4: Numerical evaluations of the performance comparison with (a) the confidence level $\delta$, (b) the Markov mixing behavior, and (c) the instances.
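
The sawtooth age trajectory described in the caption of Figure 2 can be illustrated with a small simulation. The following is a minimal sketch, not the paper's model or algorithm: it assumes that an edge node's congestion state follows a finite Markov chain that advances once per delivered update, that the delivery delay is a fixed function of the current state, and that a new update is generated the instant the previous one is delivered (zero-wait). The function name, the transition matrix, and the delay values are all hypothetical.

```python
import random


def simulate_zero_wait_age(P, service_time, n_updates, seed=0):
    """Estimate the time-average age of information for one edge node.

    P            : row-stochastic transition matrix of the congestion state
                   (hypothetical values, not taken from the paper).
    service_time : service_time[s] is the delivery delay of an update that
                   starts while the node is in state s (an assumption).
    n_updates    : number of updates to simulate (must be at least 2).
    """
    rng = random.Random(seed)
    states = range(len(P))
    state = 0
    area = 0.0       # integral of the age curve
    elapsed = 0.0    # total time covered by that integral
    prev_delay = None
    for _ in range(n_updates):
        y = service_time[state]
        if prev_delay is not None:
            # Between two deliveries the age climbs linearly from prev_delay
            # to prev_delay + y over an interval of length y.
            area += prev_delay * y + 0.5 * y * y
            elapsed += y
        prev_delay = y
        # Simplifying assumption: the chain takes one step per update.
        state = rng.choices(states, weights=P[state])[0]
    return area / elapsed


# Hypothetical two-state node: state 0 is uncongested, state 1 is congested.
P = [[0.9, 0.1], [0.2, 0.8]]
print(simulate_zero_wait_age(P, [1.0, 4.0], n_updates=10_000))
```

With a constant delay $d$ in every state, the sketch returns $1.5d$, which matches the area of one sawtooth tooth ($d \cdot d + d^2/2$) divided by its length $d$. Running it for several hypothetical nodes and comparing the returned averages gives a rough sense of which node is age-optimal when the dynamics are known; the identification problem in the paper concerns the case where they are not.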

Theorems & Definitions (10)

Lemma 1
Lemma 2
Definition 1: Pseudo Spectral Gap
Proposition 1
Lemma 3
Corollary 1
Theorem 1: Sample Complexity
Theorem 2: Lower Bound
Lemma 4
Remark 1
