AoI-based Scheduling of Correlated Sources for Timely Inference
Md Kamran Chowdhury Shisher, Vishrant Tripathi, Mung Chiang, Christopher G. Brinton
TL;DR
This work tackles timely inference from multiple correlated remote sources under limited communication by modeling Age of Information (AoI)-based scheduling as a restless multi-armed bandit (RMAB) with a non-separable penalty. It introduces an information-theoretic bound that replaces the joint penalty with per-source approximations. This enables a tractable gain-index policy, Maximum Gain First (MGF), with performance guarantees when the penalty functions are known, and an online learning variant (Online-MGF) that handles unknown penalties via bandit feedback. Theoretical results quantify the approximation error and bound the optimality gap, while simulations show that the policies scale with the number of sources and perform strongly relative to baselines, even without full model knowledge. The approach offers a practical, scalable framework for timely inference in dense sensor networks and motivates extensions to signal-aware and distributed scheduling.
Abstract
We investigate a real-time remote inference system where multiple correlated sources transmit observations over a communication channel to a receiver. The receiver utilizes these observations to infer multiple time-varying targets. Due to limited communication resources, the delivered observations may not be fresh. To quantify data freshness, we employ the Age of Information (AoI) metric. To minimize the inference error, we aim to design a signal-agnostic scheduling policy that leverages AoI without requiring knowledge of the actual target values or the source observations. This scheduling problem is a restless multi-armed bandit (RMAB) problem with a non-separable penalty function. Unlike traditional RMABs, the correlation among sources introduces a unique challenge: the penalty function of each source depends on the AoI of other correlated sources, preventing the problem from decomposing into multiple independent Markov Decision Processes (MDPs), a key step in applying traditional RMAB solutions. To address this, we propose a novel approach that approximates the penalty function for each source and establishes an analytical bound on the approximation error. We then develop scheduling policies for two scenarios: (i) full knowledge of the penalty functions and (ii) no knowledge of the penalty functions. For the case of known penalty functions, we present an upper bound on the optimality gap that highlights the impact of the correlation parameter and the system size. For the case of unknown penalty functions and signal distributions, we develop an online learning approach that utilizes bandit feedback to learn an online Maximum Gain First policy. Simulation results demonstrate the effectiveness of our proposed policies in minimizing inference error and achieving scalability in the number of sources.
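As a rough illustration of the AoI-driven, signal-agnostic scheduling described above, the following Python sketch tracks per-source AoI and schedules the sources with the largest gain in each slot. It is not the paper's MGF policy: the gain used here (the one-step penalty increase avoided by scheduling a source), the linear per-source penalty functions, and the names `step_aoi` and `max_gain_first` are all assumptions chosen for illustration. The sketch also assumes that a scheduled source's observation is delivered within one slot, so its AoI resets to 1, and it ignores the cross-source correlation that the per-source approximation is meant to handle.

```python
from typing import Callable, List, Sequence, Set


def step_aoi(ages: Sequence[int], scheduled: Set[int]) -> List[int]:
    """Advance AoI by one slot.

    Assumption: a scheduled source is delivered within the slot, so its
    AoI resets to 1; every other source ages by 1.
    """
    return [1 if i in scheduled else a + 1 for i, a in enumerate(ages)]


def max_gain_first(
    ages: Sequence[int],
    penalties: Sequence[Callable[[int], float]],
    budget: int,
) -> Set[int]:
    """Pick the `budget` sources with the largest illustrative gain.

    Assumption: the gain of source i is the penalty it would incur next
    slot if left alone, p_i(age + 1), minus the penalty after a fresh
    delivery, p_i(1). This is a myopic stand-in, not the paper's index.
    """
    gains = [p(a + 1) - p(1) for a, p in zip(ages, penalties)]
    ranked = sorted(range(len(ages)), key=lambda i: gains[i], reverse=True)
    return set(ranked[:budget])


# Hypothetical per-source penalty functions (inference error vs. AoI).
penalties = [
    lambda a: 1.0 * a,
    lambda a: 2.0 * a,
    lambda a: 0.5 * a,
]

ages = [1, 1, 1]
for _ in range(5):
    chosen = max_gain_first(ages, penalties, budget=1)
    ages = step_aoi(ages, chosen)
```

Because the rule only reads the current ages and the penalty functions, it never needs the target values or the source observations, which is the sense in which such a policy is signal-agnostic.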
