
Parametric-Sensitivity Aware Retransmission for Efficient AI Downloading

You Zhou, Qunsong Zeng, Kaibin Huang

TL;DR

A parametric-sensitivity-aware retransmission (PASAR) framework that manages the radio-resource usage of different parameter packets according to their importance to model inference accuracy (their parametric sensitivity), substantially outperforming classical hybrid automatic repeat request (HARQ) schemes in communication efficiency and latency.

Abstract

Edge artificial intelligence (AI) applications in next-generation mobile networks demand efficient AI-model downloading techniques to support real-time, on-device inference. However, transmitting high-dimensional AI models over wireless channels remains challenging due to limited communication resources. To address this issue, we propose a parametric-sensitivity-aware retransmission (PASAR) framework that manages the radio-resource usage of different parameter packets according to their importance to model inference accuracy, known as parametric sensitivity. Empirical analysis reveals a highly right-skewed sensitivity distribution, indicating that only a small fraction of parameters significantly affect model performance. Leveraging this insight, we design a novel online retransmission protocol, the PASAR protocol, which adaptively terminates packet transmission based on real-time bit error rate (BER) measurements and the associated parametric sensitivity. The protocol employs an adaptive, round-wise stopping criterion, enabling heterogeneous, packet-level retransmissions that preserve model functionality while reducing overall latency. Extensive experiments across diverse deep neural network architectures and real-world datasets demonstrate that PASAR substantially outperforms classical hybrid automatic repeat request (HARQ) schemes in terms of communication efficiency and latency.
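To make the idea of a sensitivity-aware, round-wise stopping criterion concrete, here is a minimal sketch of a hypothetical rule. It is not the PASAR criterion from the paper, whose exact form is not given here. It assumes BPSK over AWGN with ideal chase combining, so the BER after $r$ copies at linear SNR $\gamma$ is $\frac{1}{2}\operatorname{erfc}(\sqrt{r\gamma})$. It further assumes that packet $j$'s contribution to the expected loss scales as $\alpha S_j P_{b,j}$, with $S_j$ the packet's total sensitivity and $\alpha=\frac{4^n-1}{6}$. Under this rule, a packet stops being retransmitted once that contribution falls below an equal share of a loss budget. The function names, the equal budget split, and the round cap are illustrative choices.

```python
import math


def bpsk_ber(snr_linear: float, rounds: int) -> float:
    """BER of BPSK over AWGN after ideal chase combining of `rounds` copies
    (assumed model: effective SNR = rounds * snr_linear)."""
    return 0.5 * math.erfc(math.sqrt(rounds * snr_linear))


def sensitivity_aware_rounds(sens, snr_linear, loss_budget, n_bits=8, max_rounds=16):
    """Hypothetical round-wise stopping rule (not the paper's exact criterion).

    Packet j keeps being retransmitted until alpha * sens[j] * BER_j drops to
    an equal share of `loss_budget`, or until `max_rounds` copies were sent.
    Returns the number of transmissions used by each packet.
    """
    alpha = (4 ** n_bits - 1) / 6            # constant from Lemma 1
    share = loss_budget / len(sens)          # equal per-packet share (assumption)
    rounds = [1] * len(sens)                 # every packet is sent at least once
    active = set(range(len(sens)))
    while active:                            # one pass = one retransmission round
        for j in list(active):
            ber = bpsk_ber(snr_linear, rounds[j])
            if alpha * sens[j] * ber <= share or rounds[j] >= max_rounds:
                active.discard(j)            # stop: packet is good enough
            else:
                rounds[j] += 1               # request one more copy
    return rounds
```

With `sens = [1e-2, 1e-4, 1e-6]`, `snr_linear = 1.0` (0 dB), and `loss_budget = 0.03`, the most sensitive packet receives the most retransmissions and the least sensitive one is sent only once. This mirrors the heterogeneous, packet-level behavior described above, whereas a sensitivity-agnostic scheme would treat all three packets alike.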

Paper Structure (26 sections, 3 theorems, 24 equations, 9 figures, 2 algorithms)

Key Result

Lemma 1

Consider a model with $J$ packets, where the parameters in each packet are encoded as $n$-bit signed integers and transmitted over a wireless channel. Each packet $j$ may experience a different BER, denoted by $P_{b,j}$. The lemma then gives the expected sensitivity-aware downloading loss of the model (the expression itself is not reproduced in this summary), in which $\alpha=\frac{4^n-1}{6}$ is a constant determined by the fixed $n$-bit quantization.
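The summary does not say how $\alpha$ arises. Here is one way a constant of exactly this form can appear, under our own assumptions rather than the paper's proof. Suppose bits flip independently with probability $p$, values are two's-complement integers drawn uniformly from the $n$-bit range, and the loss is approximated to second order so that the squared perturbation $\delta$ is weighted by $\frac{1}{2}$. Flipping bit $k$ changes the value by $\pm 2^k$, and under uniform values the cross terms vanish on average. Therefore $\mathbb{E}[\delta^2] = p\sum_{k=0}^{n-1}4^k = p\,\frac{4^n-1}{3}$, and $\frac{1}{2}\mathbb{E}[\delta^2] = \alpha p$. The exact check below enumerates every bit pattern and every error pattern. The helper names are hypothetical.

```python
def alpha(n: int) -> float:
    """Constant alpha = (4^n - 1) / 6 from Lemma 1 for n-bit quantization."""
    return (4 ** n - 1) / 6


def to_signed(u: int, n: int) -> int:
    """Interpret the n-bit pattern u as a two's-complement signed integer."""
    return u - (1 << n) if u & (1 << (n - 1)) else u


def expected_sq_error(n: int, p: float) -> float:
    """Exact E[(x' - x)^2] when x is uniform over n-bit signed integers and
    each bit of x flips independently with probability p (x' is received)."""
    total = 0.0
    for u in range(1 << n):                  # every transmitted bit pattern
        for mask in range(1 << n):           # every error pattern
            k = bin(mask).count("1")         # number of flipped bits
            prob = p ** k * (1 - p) ** (n - k)
            err = to_signed(u ^ mask, n) - to_signed(u, n)
            total += prob * err ** 2
    return total / (1 << n)                  # average over uniform x
```

For example, with $n = 4$ and $p = 0.1$, half of `expected_sq_error(4, 0.1)` matches `alpha(4) * 0.1` = 4.25 up to floating-point rounding.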

Figures (9)

  • Figure 1: The AI-model downloading system with retransmission.
  • Figure 2: Distribution of the parametric sensitivity in two DNN models. Each parametric sensitivity is assigned to a histogram bin, and the bin counts are normalized to form a probability density function. To obtain a smooth approximation of the distribution, every 20 consecutive bins are grouped, and the average bin center and corresponding average density are computed for each group. The red curve connects these averaged points to represent the underlying sensitivity distribution.
  • Figure 3: Effect of injecting bit errors at BER = 0.1 into high- versus low-sensitivity parameter subsets (top-500 vs. bottom-500) on the model's average loss and inference accuracy.
  • Figure 4: Online retransmission control of the PASAR protocol.
  • Figure 5: AI downloading latency versus SNR on MNIST using LeNet.
  • ...and 4 more figures
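The smoothing procedure in the Figure 2 caption can be sketched in plain Python. Only two steps come from the caption: normalizing the histogram to a PDF, then averaging every 20 consecutive bins (bin centers and densities). The bin count of 2000, the synthetic log-normal stand-in for real sensitivity values, and the function name are assumptions.

```python
import random


def smoothed_density(values, bins=2000, group=20):
    """Histogram normalized to a PDF, then every `group` consecutive bins are
    averaged (center and density) to give points for a smooth curve."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("values must not all be equal")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        k = min(int((v - lo) / width), bins - 1)   # last edge is inclusive
        counts[k] += 1
    density = [c / (len(values) * width) for c in counts]   # integrates to 1
    centers = [lo + (k + 0.5) * width for k in range(bins)]
    avg_c, avg_d = [], []
    for g in range(bins // group):                 # drop any incomplete tail group
        sl = slice(g * group, (g + 1) * group)
        avg_c.append(sum(centers[sl]) / group)
        avg_d.append(sum(density[sl]) / group)
    return avg_c, avg_d


# Synthetic right-skewed stand-in for real sensitivity values (assumption).
rng = random.Random(0)
sample = [rng.lognormvariate(0.0, 1.0) for _ in range(10_000)]
centers, densities = smoothed_density(sample)
```

Plotting `densities` against `centers` gives the kind of red curve the caption describes. Averaging groups of bins trades resolution for a less noisy estimate of the long right tail.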

Theorems & Definitions (7)

  • Lemma 1: Sensitivity-Aware Downloading Loss
  • Proof of Lemma 1
  • Lemma 2: Skewness Measure
  • Remark 1: Skewness of Parametric Sensitivity
  • Lemma 3: Greedy Property of the Threshold Design
  • Proof of Lemma 3
  • Remark 2: Comparison with Channel-Aware Retransmission
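The exact skewness measure of Lemma 2 is not given here; its label suggests a measure built from the median and the mean. As a stand-in, the following sketch uses Pearson's second (median) skewness coefficient, $3(\text{mean} - \text{median})/\sigma$, which is positive for right-skewed data such as the sensitivity distribution discussed in Remark 1. The synthetic log-normal sample is an assumption, not data from the paper.

```python
import random
import statistics


def median_skewness(values):
    """Pearson's second skewness coefficient: 3 * (mean - median) / std.
    Positive values indicate a right-skewed (long right tail) distribution."""
    mean = statistics.fmean(values)
    med = statistics.median(values)
    std = statistics.pstdev(values)
    return 3 * (mean - med) / std


# Right-skewed synthetic stand-in for parametric sensitivities (assumption).
rng = random.Random(0)
sample = [rng.lognormvariate(0.0, 1.0) for _ in range(10_000)]
skew = median_skewness(sample)
```

A mean well above the median is the signature of a distribution in which a few large sensitivities dominate, which is the property the retransmission design exploits.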