Parametric-Sensitivity Aware Retransmission for Efficient AI Downloading

You Zhou; Qunsong Zeng; Kaibin Huang

TL;DR

The paper proposes a parametric-sensitivity-aware retransmission (PASAR) framework that manages the radio-resource usage of parameter packets according to their importance to model inference accuracy, termed parametric sensitivity. PASAR substantially outperforms classical hybrid automatic repeat request (HARQ) schemes in communication efficiency and latency.

Abstract

Edge artificial intelligence (AI) applications in next-generation mobile networks demand efficient AI-model downloading techniques to support real-time, on-device inference. However, transmitting high-dimensional AI models over wireless channels remains challenging due to limited communication resources. To address this issue, we propose a parametric-sensitivity-aware retransmission (PASAR) framework that manages the radio-resource usage of different parameter packets according to their importance to model inference accuracy, known as parametric sensitivity. Empirical analysis reveals a highly right-skewed sensitivity distribution, indicating that only a small fraction of parameters significantly affect model performance. Leveraging this insight, we design a novel online retransmission protocol, the PASAR protocol, that adaptively terminates packet transmission based on real-time bit error rate (BER) measurements and the associated parametric sensitivity. The protocol employs an adaptive, round-wise stopping criterion, enabling heterogeneous, packet-level retransmissions that preserve model functionality while reducing overall latency. Extensive experiments across diverse deep neural network architectures and real-world datasets demonstrate that PASAR substantially outperforms classical hybrid automatic repeat request (HARQ) schemes in terms of communication efficiency and latency.
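The round-wise stopping idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' PASAR protocol: the `Packet` class, the rule "stop once BER times sensitivity falls below a threshold", and the per-round BER list are all assumptions made for illustration. The sketch only shows how a sensitivity-weighted criterion leads to a different number of transmission rounds per packet.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    sensitivity: float   # aggregate parametric sensitivity (assumed to be known)
    rounds_used: int = 0
    done: bool = False


def should_stop(ber: float, sensitivity: float, threshold: float) -> bool:
    """Hypothetical rule: stop when the sensitivity-weighted BER is small enough."""
    return ber * sensitivity <= threshold


def run_retransmissions(packets, ber_per_round, thresholds):
    """Simulate round-wise retransmission control.

    ber_per_round[r] is the (assumed) measured BER after r + 1 transmissions,
    and thresholds[r] is the (assumed) stopping threshold used in round r.
    Returns the number of rounds each packet consumed.
    """
    for r, (ber, threshold) in enumerate(zip(ber_per_round, thresholds)):
        for packet in packets:
            if packet.done:
                continue
            packet.rounds_used = r + 1
            if should_stop(ber, packet.sensitivity, threshold):
                packet.done = True
    return [p.rounds_used for p in packets]


if __name__ == "__main__":
    packets = [Packet(100.0), Packet(1.0), Packet(0.01)]
    bers = [1e-1, 1e-2, 1e-3, 1e-4]      # BER assumed to improve with each round
    rounds = run_retransmissions(packets, bers, [0.05] * 4)
    print(rounds)  # high-sensitivity packets use more rounds: [4, 2, 1]
```

In this toy setting the most sensitive packet is retransmitted until the last round, while the least sensitive packet stops after its first transmission, which is the heterogeneous, packet-level behavior the abstract describes.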

Paper Structure (26 sections, 3 theorems, 24 equations, 9 figures, 2 algorithms)

Introduction
System and Metrics
AI Downloading System
Packetization
Parameter Transmission
Performance Metrics
Downloading Loss
Downloading Latency
Sensitivity-Aware Downloading Loss
Downloading Loss Analysis
Sensitivity Skewness and Downloading Loss
Overview of PASAR Protocol
Retransmission Control Problem
MCKP Approach
PASAR Protocol
...and 11 more sections

Key Result

Lemma 1

Consider a model with $J$ packets, where the parameters in each packet are encoded as $n$-bit signed integers and transmitted over a wireless channel. Each packet experiences a potentially different BER, denoted by $P_{b,j}$. Accordingly, the expected sensitivity-aware downloading loss of the model is given in closed form in terms of the per-packet BERs $P_{b,j}$ and a constant $\alpha=\frac{4^n-1}{6}$ that is fixed under $n$-bit quantization.
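One plausible way to see where the constant $\alpha$ comes from is sketched below. This is not the paper's proof. It assumes two's-complement encoding, independent bit flips with probability $P_{b,j}$, parameter bits that are equally likely to be 0 or 1 (so cross terms between different bit positions average to zero), and a second-order Taylor expansion in which a parameter with a hypothetical curvature weight $h_i$ contributes $\tfrac{1}{2}h_i\,\mathbb{E}[\Delta_i^2]$ to the expected loss. Here $\Delta_i$ is the integer-domain error of parameter $i$, and any quantization step size is ignored.

$$
\mathbb{E}[\Delta_i^2] = P_{b,j}\sum_{k=0}^{n-1}\left(2^k\right)^2 = P_{b,j}\,\frac{4^n-1}{3},
\qquad
\tfrac{1}{2}\,h_i\,\mathbb{E}[\Delta_i^2] = \frac{4^n-1}{6}\,P_{b,j}\,h_i = \alpha\,P_{b,j}\,h_i .
$$

Under these assumptions, flipping bit $k$ changes the integer value by $\pm 2^k$, so the geometric sum over bit positions yields the factor $\frac{4^n-1}{3}$, and the factor $\tfrac{1}{2}$ from the Taylor expansion turns it into $\frac{4^n-1}{6}$.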

Figures (9)

Figure 1: The AI-model downloading system with retransmission.
Figure 2: Distribution of the parametric sensitivity in two DNN models. Each parametric sensitivity is assigned to a histogram bin, and the bin counts are normalized to form a probability density function. To obtain a smooth approximation of the distribution, every 20 consecutive bins are grouped, and the average bin center and corresponding average density are computed for each group. The red curve connects these averaged points to represent the underlying sensitivity distribution.
Figure 3: The effect of injecting BER = 0.1 into high- versus low-sensitivity parameter subsets (top-500 vs. bottom-500) on both model average loss and inference accuracy.
Figure 4: Online retransmission control of the PASAR protocol.
Figure 5: AI downloading latency versus SNR on MNIST using LeNet.
...and 4 more figures
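The BER-injection experiment in Figure 3 and the constant $\alpha=\frac{4^n-1}{6}$ in Lemma 1 can be explored with a small Monte Carlo sketch. It is an illustration under stated assumptions, not the authors' experimental code: parameters are taken to be $n$-bit two's-complement integers drawn uniformly at random, each bit flips independently with probability `ber`, and the helper names `flip_bits` and `mean_half_squared_error` are hypothetical.

```python
import random


def flip_bits(value: int, n_bits: int, ber: float, rng: random.Random) -> int:
    """Flip each bit of an n-bit two's-complement integer independently
    with probability `ber` and return the resulting signed integer."""
    mask = (1 << n_bits) - 1
    word = value & mask                      # two's-complement bit pattern
    for k in range(n_bits):
        if rng.random() < ber:
            word ^= 1 << k
    # Map the bit pattern back to a signed integer.
    if word >= (1 << (n_bits - 1)):
        return word - (1 << n_bits)
    return word


def mean_half_squared_error(n_bits: int, ber: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of E[0.5 * (corrupted - original)^2]
    for uniformly random n-bit signed integers."""
    rng = random.Random(seed)
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    total = 0.0
    for _ in range(trials):
        v = rng.randint(lo, hi)
        total += 0.5 * (flip_bits(v, n_bits, ber, rng) - v) ** 2
    return total / trials


if __name__ == "__main__":
    n, ber = 8, 0.01
    alpha = (4 ** n - 1) / 6
    estimate = mean_half_squared_error(n, ber, trials=200_000)
    print(f"alpha * BER = {alpha * ber:.3f}, Monte Carlo estimate = {estimate:.3f}")
```

Under these assumptions the estimate should land close to $\alpha \cdot \text{BER}$, which suggests reading $\alpha$ as the expected half squared error per unit BER for an $n$-bit integer parameter.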

Theorems & Definitions (7)

Lemma 1: Sensitivity-Aware Downloading Loss
proof
Lemma 2: Skewness Measure
Remark 1: Skewness of Parametric Sensitivity
Lemma 3: Greedy Property of the Threshold Design
proof
Remark 2: Comparison with Channel-Aware Retransmission
