Channel-Adaptive Edge AI: Maximizing Inference Throughput by Adapting Computational Complexity to Channel States

Jierui Zhang; Jianhao Huang; Kaibin Huang

TL;DR

This work addresses the challenge of developing a tractable analytical model for E2E inference accuracy and leveraging it to design a channel-adaptive AI algorithm that maximizes inference throughput, referred to as the edge processing rate (EPR), under latency and accuracy constraints.

Abstract

\emph{Integrated communication and computation} (IC$^2$) has emerged as a new paradigm for enabling efficient edge inference in sixth-generation (6G) networks. However, the design of IC$^2$ technologies is hindered by the lack of a tractable theoretical framework for characterizing \emph{end-to-end} (E2E) inference performance. This metric is difficult to characterize because it must account for channel distortion as well as the architecture and computational complexity of the artificial intelligence (AI) model. In this work, we address this challenge by developing a tractable analytical model for E2E inference accuracy and leveraging it to design a \emph{channel-adaptive AI} algorithm that maximizes inference throughput, referred to as the edge processing rate (EPR), under latency and accuracy constraints. Specifically, we consider an edge inference system in which a server deploys a backbone model with early exit, which enables flexible computational complexity, to perform inference on data features transmitted by a mobile device. The proposed accuracy model characterizes high-dimensional feature distributions in the angular domain using a Mixture of von Mises (MvM) distribution. This leads to a closed-form expression for inference accuracy as a function of quantization bit-width and model traversal depth, which represent channel distortion and computational complexity, respectively. Building upon this accuracy model, we formulate and solve the EPR maximization problem under joint latency and accuracy constraints, leading to a channel-adaptive AI algorithm that achieves full IC$^2$ integration. The proposed algorithm jointly adapts transmit-side feature compression and receive-side model complexity according to channel conditions to maximize overall efficiency and inference throughput. Experimental results demonstrate its superior performance compared with fixed-complexity counterparts.
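
The joint adaptation of bit-width and traversal depth described in the abstract can be illustrated with a toy exhaustive search. This is only a sketch under stated assumptions, not the paper's algorithm: the Shannon-rate link model, the latency model (payload time plus a fixed time per traversed block), the use of 1/latency as a stand-in for throughput, the function names `shannon_rate`, `select_config`, and `toy_accuracy`, and every numeric default are hypothetical. In particular, `toy_accuracy` is a made-up surrogate and not the closed-form MvM accuracy expression, which is not reproduced on this page.

```python
import math
from typing import Callable, Iterable, Optional, Tuple


def shannon_rate(bandwidth_hz: float, snr_linear: float) -> float:
    """Link rate in bit/s (assumed link model: Shannon capacity)."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)


def select_config(
    snr_linear: float,
    accuracy_fn: Callable[[int, int], float],
    feature_dim: int = 512,            # hypothetical number of feature entries
    bandwidth_hz: float = 1e6,         # hypothetical bandwidth
    time_per_block_s: float = 1e-3,    # hypothetical compute time per block
    bit_widths: Iterable[int] = range(1, 9),
    depths: Iterable[int] = range(1, 13),
    min_accuracy: float = 0.9,
    max_latency_s: float = 0.05,
) -> Optional[Tuple[int, int, float]]:
    """Return (bit_width, depth, throughput) with the highest throughput
    among feasible pairs, or None if no pair meets both constraints."""
    rate = shannon_rate(bandwidth_hz, snr_linear)
    depths = list(depths)  # reused for every bit-width
    best = None
    for q in bit_widths:
        for depth in depths:
            # Assumed latency: transmission time plus server compute time.
            latency = feature_dim * q / rate + depth * time_per_block_s
            if latency > max_latency_s:
                continue
            if accuracy_fn(q, depth) < min_accuracy:
                continue
            throughput = 1.0 / latency  # stand-in for the EPR, tasks per second
            if best is None or throughput > best[2]:
                best = (q, depth, throughput)
    return best


def toy_accuracy(q: int, depth: int) -> float:
    """Made-up surrogate that increases with both bit-width and depth."""
    return (1.0 - 0.5 ** q) * (1.0 - 0.7 ** depth)


if __name__ == "__main__":
    for snr in (0.1, 100.0):  # linear SNR, roughly -10 dB and 20 dB
        print(snr, select_config(snr, toy_accuracy))
```

With these made-up numbers, the search picks a narrower bit-width and a deeper traversal when the SNR is low, and a wider bit-width with a shallower traversal when the SNR is high. Any real use would replace `toy_accuracy` and the latency model with the paper's expressions.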

Paper Structure (41 sections, 6 theorems, 36 equations, 14 figures, 2 tables, 1 algorithm)

This paper contains 41 sections, 6 theorems, 36 equations, 14 figures, 2 tables, 1 algorithm.

Introduction
Models and Metrics
Inference Model
Adaptive Exit
Intermediate Classifiers
Backbone and Classifier Training
Communication Model
Performance Metrics
Tractable Modeling of Inference Accuracy
Modeling the Distribution of Angular Features
Model Parameter w.r.t. Traversal Depth and Channel Distortion
Modeling Distortion w.r.t. Bit-width
$\kappa_{\Delta,\ell}$ w.r.t. Traversal Depth
$\kappa_{\Delta,\ell}$ w.r.t. Traversal Depth and Channel Distortion
Inference Accuracy Analysis
...and 26 more sections

Key Result

Proposition 1

Assume the data sample to be transmitted belongs to class $j$. Given the traversal depth, $\ell$, at the server and quantization distortion with variance $\sigma^2_\Delta$, the channel-distorted angular feature, $\tilde{\theta}_\ell|j$, follows a vM distribution with centroid $\mu_j$ and concentration parameter … where $A(\cdot)\stackrel{\triangle}{=}\frac{I_1(\cdot)}{I_0(\cdot)}$ and $A^{-1}(\cdot)$ is its inverse …
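
The ratio $A(\cdot)=I_1(\cdot)/I_0(\cdot)$ in Proposition 1 and its inverse have no elementary closed form, so they are usually evaluated numerically. The following is a minimal standard-library sketch, not taken from the paper. It assumes that $I_0$ and $I_1$ are the modified Bessel functions of the first kind, as is standard for the von Mises distribution, and it uses their integral representation $I_n(\kappa)=\frac{1}{\pi}\int_0^\pi e^{\kappa\cos t}\cos(nt)\,dt$. The function names and the quadrature and bisection settings are hypothetical choices.

```python
import math


def bessel_ratio(kappa: float, n_points: int = 2000) -> float:
    """A(kappa) = I1(kappa) / I0(kappa) by trapezoidal quadrature.

    The factor exp(kappa) common to numerator and denominator is divided
    out, so the integrand stays in [0, 1]. Intended for 0 <= kappa <= ~1e4.
    """
    if kappa < 0.0:
        raise ValueError("kappa must be non-negative")
    num = 0.0
    den = 0.0
    for i in range(n_points + 1):
        t = math.pi * i / n_points
        w = 0.5 if i in (0, n_points) else 1.0  # trapezoid end weights
        e = math.exp(kappa * (math.cos(t) - 1.0))
        num += w * e * math.cos(t)
        den += w * e
    return num / den


def bessel_ratio_inverse(r: float, tol: float = 1e-9) -> float:
    """Solve A(kappa) = r by bisection; A increases from 0 towards 1."""
    if not 0.0 <= r < 1.0:
        raise ValueError("r must lie in [0, 1)")
    lo, hi = 0.0, 1.0
    # Grow the bracket until A(hi) >= r.
    while bessel_ratio(hi) < r:
        lo, hi = hi, 2.0 * hi
        if hi > 1e4:
            raise ValueError("r is too close to 1 for this simple quadrature")
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if bessel_ratio(mid) < r:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    kappa = 5.0
    r = bessel_ratio(kappa)
    print(r, bessel_ratio_inverse(r))
```

For a von Mises variable with centroid $\mu$ and concentration $\kappa$, $A(\kappa)$ equals the mean resultant length $\mathbb{E}[\cos(\theta-\mu)]$, which is why the inverse is the natural tool for mapping a measured spread back to a concentration. A library routine with exponentially scaled Bessel functions would be preferable for very large $\kappa$.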

Figures (14)

Figure 1: The relationship between inference accuracy, traversal depth, and quantization bit-width, shown for two bit-width settings. The model is ResNet-152 with intermediate classifiers, evaluated on the CIFAR-10 dataset.
Figure 2: The system model of the channel-adaptive edge inference. The bit-width $q$ and traversal depth $\ell$ are controlled w.r.t. SNR.
Figure 3: Comparison between the standard Softmax classifier (trained with cross entropy) and the proposed angular classifier.
Figure 4: The distributions of the 2D and angular features. In the angular domain, the class centroids are distributed evenly and the concentration levels across different classes are similar.
Figure 5: $\kappa_{\Delta,\ell}$ w.r.t. traversal depth and bit-width.
...and 9 more figures

Theorems & Definitions (15)

Definition 1: Edge Processing Rate (EPR)
Remark 1: Comparison with traditional metrics
Proposition 1
proof
Remark 2: Effects of channel distortion
Theorem 1
proof
Corollary 1: Monotonicity of Accuracy Function
proof
Corollary 2
...and 5 more
