Adaptive Semantic Token Communication for Transformer-based Edge Inference

Alessio Devoto, Jary Pomponi, Mattia Merluzzi, Paolo Di Lorenzo, Simone Scardapane

TL;DR

This work tackles semantic, goal-oriented edge inference under dynamic bandwidth and channel conditions by introducing a transformer-based DJSCC framework that selectively transmits semantic tokens. It combines adaptive token selection with complex-valued token representations and per-token compression, enabling a single robust model across varying channel qualities. A Lyapunov stochastic optimization-based controller dynamically tunes the token budget $\alpha$ and per-token dimensionality $r$ to balance inference accuracy and communication cost in time-varying networks, supported by an interpretable token-discard mechanism. Experimental results on Imagenette show consistent gains over baselines in accuracy at fixed bandwidth, enhanced robustness to noise, and clear interpretability of the token selection process, highlighting practical potential for AI-native semantic communication in edge intelligence.

Abstract

This paper presents an adaptive framework for edge inference based on a dynamically configurable, transformer-powered deep joint source-channel coding (DJSCC) architecture. Motivated by a practical scenario in which a resource-constrained edge device engages in goal-oriented semantic communication, such as selectively transmitting essential features for object detection to an edge server, our approach enables efficient task-aware data transmission under varying bandwidth and channel conditions. To achieve this, input data is tokenized into compact, high-level semantic representations, refined by a transformer, and transmitted over noisy wireless channels. As part of the DJSCC pipeline, we employ a semantic token selection mechanism that adaptively compresses informative features into a user-specified number of tokens per sample. These tokens are then further compressed through the JSCC module, enabling a flexible token communication strategy that adjusts both the number of transmitted tokens and their embedding dimensions. We incorporate a resource allocation algorithm based on Lyapunov stochastic optimization to enhance robustness under dynamic network conditions, effectively balancing compression efficiency and task performance. Experimental results demonstrate that our system consistently outperforms existing baselines, highlighting its potential as a strong foundation for AI-native semantic communication in edge intelligence applications.
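To make the two knobs described above concrete, the following is a minimal sketch of how a token budget and a per-token compression ratio could jointly set the amount of data sent per sample. It is not the paper's method: the paper's selection mechanism is learned, whereas this sketch swaps in a plain top-k selection over hypothetical importance scores. The function names, the `alpha` and `r` parameters as fractions in (0, 1], and the rule that two real values are packed into one complex channel symbol are all assumptions made for illustration.

```python
import math


def select_tokens(scores, alpha):
    """Return indices of the ceil(alpha * N) highest-scoring tokens.

    Assumption: `scores` are hypothetical per-token importance values;
    the kept indices are returned in their original order.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    n_keep = max(1, math.ceil(alpha * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


def channel_symbols(n_tokens, dim, alpha, r):
    """Count complex symbols per sample under the stated assumptions.

    Each kept token is compressed to int(r * dim) real values, and two
    real values are packed into one complex symbol (an assumption).
    """
    n_keep = max(1, math.ceil(alpha * n_tokens))
    reals_per_token = max(2, int(r * dim))
    return n_keep * math.ceil(reals_per_token / 2)


kept = select_tokens([0.1, 0.9, 0.5, 0.7], alpha=0.5)
cost = channel_symbols(n_tokens=196, dim=192, alpha=0.5, r=0.5)
```

Under these assumptions, halving either `alpha` or `r` roughly halves the symbol count, which is the kind of trade-off a resource allocation controller would tune against task accuracy.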

Paper Structure

This paper contains 15 sections, 22 equations, 9 figures, 2 tables, 2 algorithms.

Figures (9)

  • Figure 1: System scenario for semantic-oriented edge inference.
  • Figure 2: Schema of the proposed token-based DJSCC.
  • Figure 3: Details of the two components of our proposed token discarding approach. Left: the creation of the budget token. Right: how the tokens are actively discarded. More detail in Section \ref{sec:method}.
  • Figure 4: As the images flow through the model (from left to right), some tokens, highlighted in red, are discarded.
  • Figure 5: Overview of the proposed token-based transmission pipeline. The features of a generic token are processed by the encoder $\mathcal{C}^r_E$, which has a compression ratio of $r=0.5$, to produce the complex symbols, which are transmitted through a channel. The decoder $\mathcal{C}^r_D$ receives the symbols, concatenates imaginary and real parts, and recreates the features.
  • ...and 4 more figures
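
The Figure 5 caption describes real-valued token features being turned into complex symbols, sent through a channel, and split back into real and imaginary parts at the receiver. The sketch below illustrates only that symbol mapping and a noisy channel; the learned encoder $\mathcal{C}^r_E$ and decoder $\mathcal{C}^r_D$ are omitted. Several details are assumptions rather than facts from the source: the packing order (first half of the vector as real parts, second half as imaginary parts), the additive white Gaussian noise channel model, and all function names.

```python
import math
import random


def to_complex_symbols(features):
    """Pack an even-length real vector into complex symbols.

    Assumed layout: first half -> real parts, second half -> imaginary parts.
    """
    if len(features) % 2 != 0:
        raise ValueError("feature length must be even")
    half = len(features) // 2
    return [complex(features[i], features[half + i]) for i in range(half)]


def awgn(symbols, snr_db, rng):
    """Add circular complex Gaussian noise at the given SNR (in dB).

    The SNR is taken relative to the average power of `symbols`.
    """
    power = sum(abs(s) ** 2 for s in symbols) / len(symbols)
    noise_power = power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power / 2)  # per real/imaginary component
    return [s + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for s in symbols]


def from_complex_symbols(symbols):
    """Concatenate real parts, then imaginary parts, into one real vector."""
    return [s.real for s in symbols] + [s.imag for s in symbols]


compressed = [1.0, -2.0, 0.5, 3.0]  # stand-in for an encoder output
sent = to_complex_symbols(compressed)
received = awgn(sent, snr_db=60.0, rng=random.Random(0))
recovered = from_complex_symbols(received)
```

Without noise, `from_complex_symbols` inverts `to_complex_symbols` exactly; with noise, the recovered vector deviates from the original by an amount that shrinks as `snr_db` grows.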