Parametric-Sensitivity Aware Retransmission for Efficient AI Downloading
You Zhou, Qunsong Zeng, Kaibin Huang
TL;DR
A parametric-sensitivity-aware retransmission (PASAR) framework that allocates radio resources to parameter packets according to their parametric sensitivity, i.e., their importance to model inference accuracy, and that substantially outperforms classical hybrid automatic repeat request (HARQ) schemes in communication efficiency and latency.
Abstract
Edge artificial intelligence (AI) applications in next-generation mobile networks demand efficient AI-model downloading techniques to support real-time, on-device inference. However, transmitting high-dimensional AI models over wireless channels remains challenging due to limited communication resources. To address this issue, we propose a parametric-sensitivity-aware retransmission (PASAR) framework that manages the radio-resource usage of different parameter packets according to their importance to model inference accuracy, known as parametric sensitivity. Empirical analysis reveals a highly right-skewed sensitivity distribution, indicating that only a small fraction of parameters significantly affects model performance. Leveraging this insight, we design a novel online retransmission protocol, the PASAR protocol, that adaptively terminates packet transmission based on real-time bit error rate (BER) measurements and the associated parametric sensitivity. The protocol employs an adaptive, round-wise stopping criterion, enabling heterogeneous, packet-level retransmissions that preserve model functionality while reducing overall latency. Extensive experiments across diverse deep neural network architectures and real-world datasets demonstrate that PASAR substantially outperforms classical hybrid automatic repeat request (HARQ) schemes in terms of communication efficiency and latency.
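
The abstract does not state the stopping criterion itself, so the following is only a minimal illustrative sketch of a sensitivity-aware, round-wise stopping rule. It assumes a hypothetical rule in which a packet stops being retransmitted once the product of its sensitivity and the measured BER falls below a fixed budget. The names `Packet`, `rounds_needed`, `toy_ber`, and `budget`, as well as the tenfold-per-round BER model, are assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Packet:
    packet_id: int
    sensitivity: float  # hypothetical importance score; higher means more critical


def rounds_needed(
    packet: Packet,
    ber_after_round: Callable[[int], float],
    budget: float,
    max_rounds: int,
) -> int:
    """Return how many transmission rounds the packet uses before stopping.

    Assumed rule (not the paper's): stop as soon as
    sensitivity * measured BER <= budget, or when max_rounds is reached.
    """
    for round_index in range(1, max_rounds + 1):
        ber = ber_after_round(round_index)
        if packet.sensitivity * ber <= budget:
            return round_index
    return max_rounds


def toy_ber(round_index: int) -> float:
    # Assumed toy channel: the residual BER drops tenfold with every round.
    return 10.0 ** (-round_index)


if __name__ == "__main__":
    packets = [Packet(0, 0.05), Packet(1, 50.0)]
    for p in packets:
        used = rounds_needed(p, toy_ber, budget=1e-2, max_rounds=8)
        print(f"packet {p.packet_id}: sensitivity={p.sensitivity}, rounds={used}")
```

Under these assumptions, the low-sensitivity packet stops after a single round while the high-sensitivity packet keeps retransmitting for several more, which is the kind of heterogeneous, packet-level behavior the abstract describes.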
