Adaptive Semantic Token Selection for AI-native Goal-oriented Communications
Alessio Devoto, Simone Petruzzi, Jary Pomponi, Paolo Di Lorenzo, Simone Scardapane
TL;DR
This work addresses dynamic bandwidth and computation constraints in AI-native goal-oriented communications by combining a transformer-based deep JSCC pipeline with a trainable, per-input token selection mechanism. A budget token, threshold gates, and per-layer halting scores enable adaptive token dropping under a budget $\alpha \in [0,1]$. The budget is enforced during training by a penalty on $(T(x) - \alpha T)^2$, with budgets sampled at random across training steps. The result is a single model that maintains high task accuracy across a range of latency and bandwidth constraints and produces interpretable token-discard masks that reflect semantic content. Empirical results on Imagenette with a DeiT backbone show robust performance under noisy channels and a clearly interpretable token-selection process, highlighting practical benefits for flexible AI-native communication systems.
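To make the budget objective concrete, here is a minimal Python sketch under stated assumptions: `gates` are hypothetical per-token keep probabilities for a single input, their sum stands in for $T(x)$, and $T$ is taken to be the number of input tokens, so the target count is $\alpha T$. A model with per-layer halting scores would likely accumulate counts across layers; this sketch uses a single layer for clarity. The names `budget_penalty`, `training_loss`, `sample_budget` and the weight `lam` are illustrative, not taken from the paper.

```python
import random
from typing import List


def budget_penalty(gates: List[float], alpha: float) -> float:
    """Squared gap between the soft token count and the target alpha * T.

    Assumption: `gates` are per-token keep probabilities in [0, 1] for one
    input (single layer), so their sum plays the role of T(x).
    """
    total_tokens = len(gates)      # T: number of input tokens
    kept = sum(gates)              # soft T(x): expected number of kept tokens
    return (kept - alpha * total_tokens) ** 2


def training_loss(task_loss: float, gates: List[float], alpha: float,
                  lam: float = 1.0) -> float:
    """Task loss plus the budget penalty, weighted by the hypothetical `lam`."""
    return task_loss + lam * budget_penalty(gates, alpha)


def sample_budget(rng: random.Random) -> float:
    """Draw a fresh budget in [0, 1] for each training step."""
    return rng.uniform(0.0, 1.0)


if __name__ == "__main__":
    gates = [1.0, 0.75, 0.25, 0.0]   # soft count of 2 out of T = 4 tokens
    print(budget_penalty(gates, 0.5))    # target 2.0, so no penalty
    print(budget_penalty(gates, 0.25))   # target 1.0, so penalty (2 - 1)^2
    rng = random.Random(0)
    print(sample_budget(rng))
```

For the gates above, the soft count is 2, so the penalty is 0 at $\alpha = 0.5$ and 1 at $\alpha = 0.25$. Drawing a new $\alpha$ at every step is what allows one set of weights to serve every budget at deployment time.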
Abstract
In this paper, we propose a novel design for AI-native goal-oriented communications, exploiting transformer neural networks under dynamic inference constraints on bandwidth and computation. Transformers have become the standard architecture for pretraining large-scale vision and text models, and preliminary results have also shown promising performance in deep joint source-channel coding (JSCC). Here, we consider a dynamic model in which communication happens over a channel with variable latency and bandwidth constraints. Leveraging recent work on conditional computation, we exploit the structure of the transformer blocks and the multihead attention operator to design a trainable semantic token selection mechanism that learns to select relevant tokens (e.g., image patches) from the input signal. Selection is performed dynamically, on a per-input basis, at a rate that the user supplies as an additional input. We show that our model improves over state-of-the-art token selection mechanisms, achieving high accuracy over a wide range of latency and bandwidth constraints without the need to deploy multiple architectures tailored to each constraint. Finally, the proposed token selection mechanism helps extract powerful semantics that are easy to understand and explain, paving the way for interpretable-by-design models for the next generation of AI-native communication systems.
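The per-input selection described above, with the rate supplied as an extra input, can be illustrated by the following Python sketch. It assumes the budget reaches the model as an additional token (here, hypothetically, a vector filled with $\alpha$ rather than a learned embedding) and that a gate has already produced one score per patch. `add_budget_token`, `select_tokens` and the fixed `threshold` are hypothetical names chosen for illustration, not the authors' implementation.

```python
from typing import List, Tuple

Token = List[float]


def add_budget_token(patches: List[Token], alpha: float) -> List[Token]:
    """Append a hypothetical budget token so the model can condition on alpha.

    Assumption: the budget token is a constant vector filled with alpha; a
    trained model would more likely use a learned embedding of alpha.
    """
    if not patches:
        raise ValueError("need at least one patch token")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    dim = len(patches[0])
    return patches + [[alpha] * dim]


def select_tokens(patches: List[Token], scores: List[float],
                  threshold: float) -> Tuple[List[Token], List[bool]]:
    """Keep the patches whose gate score exceeds the threshold.

    Returns the kept patches (what would be transmitted) and a boolean mask
    (True = kept), i.e. a token-discard map that can be visualised.
    """
    mask = [s > threshold for s in scores]
    kept = [p for p, m in zip(patches, mask) if m]
    return kept, mask


if __name__ == "__main__":
    patches = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
    sequence = add_budget_token(patches, alpha=0.5)   # 4 patches + 1 budget token
    # Scores are assumed to come from a gate that has seen the budget token.
    kept_a, mask_a = select_tokens(patches, [0.9, 0.2, 0.7, 0.1], 0.5)
    kept_b, mask_b = select_tokens(patches, [0.9, 0.8, 0.7, 0.6], 0.5)
    print(len(sequence), mask_a, len(kept_a), len(kept_b))
```

With a threshold of 0.5, the scores `[0.9, 0.2, 0.7, 0.1]` keep two of the four patches while `[0.9, 0.8, 0.7, 0.6]` keep all four, so the number of transmitted tokens varies from input to input; the boolean mask is the kind of token-discard map the TL;DR describes as interpretable.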
