Vision Transformer-based Semantic Communications With Importance-Aware Quantization
Joohyuk Park, Yongjeong Oh, Yongjune Kim, Yo-Seb Jeon
TL;DR
This work tackles training-free semantic communications for wireless image transmission by using a pretrained Vision Transformer (ViT) to quantify patch importance via mean attention scores. It introduces importance-aware quantization (IAQ), which assigns per-patch bit depths by solving a weighted quantization error minimization problem with two efficient solvers: an optimal incremental allocation method and a low-complexity water-filling method. The framework is extended to realistic digital channels by modeling transmission as parallel binary symmetric channels (BSCs) and adjusting the distortion analysis accordingly. Experiments on single-view and multi-view datasets (e.g., CIFAR-100, MIRO, MVP_N) show that IAQ outperforms conventional quantization methods in both error-free and noisy digital settings, offering a scalable, training-free path for semantic communications in IoT and edge environments.
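As a rough illustration of how mean attention scores could become per-patch weights, the sketch below assumes the ViT attention maps are stacked into an array of shape (layers, heads, tokens, tokens) with the [CLS] token at index 0, and averages the attention each patch token receives. The exact averaging used in the paper is not specified here, and the increasing map from score to weight (a hypothetical power law with exponent `alpha`) is an assumption, as are the function names.

```python
import numpy as np


def patch_importance(attn: np.ndarray) -> np.ndarray:
    """Mean attention received by each image patch (illustrative sketch).

    attn: shape (layers, heads, tokens, tokens); rows are queries, columns are
    keys, and token 0 is assumed to be the [CLS] token.
    """
    # Average over layers, heads, and query tokens: attention received per key token.
    received = attn.mean(axis=(0, 1, 2))
    # Drop the [CLS] column and keep one score per patch token.
    return received[1:]


def importance_weights(scores: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    # Hypothetical increasing map w = (s / max s) ** alpha; softmax attention
    # is strictly positive, so max(s) > 0 for real ViT attention maps.
    return (scores / scores.max()) ** alpha
```

For a toy single-layer, single-head map in which every query attends with [0.2, 0.5, 0.3], the two patch scores are [0.5, 0.3] and the weights with `alpha = 1` are [1.0, 0.6]. Any other increasing function of the score would fit the same role.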
Abstract
Semantic communications provide significant performance gains over traditional communications by transmitting task-relevant semantic features through wireless channels. However, most existing studies rely on end-to-end (E2E) training of neural encoders and decoders to ensure effective transmission of these semantic features. To enable semantic communications without E2E training, this paper presents a vision transformer (ViT)-based semantic communication system with importance-aware quantization (IAQ) for wireless image transmission. The core idea of the presented system is to use the attention scores of a pretrained ViT model to quantify the importance of each image patch. Building on this idea, our IAQ framework assigns different numbers of quantization bits to image patches according to their importance. This is achieved by formulating a weighted quantization error minimization problem in which each patch's weight is an increasing function of its attention score. An optimal incremental allocation method and a low-complexity water-filling method are then devised to solve the formulated problem. Our framework is further extended to realistic digital communication systems by modifying the bit allocation problem and the corresponding allocation methods based on an equivalent binary symmetric channel (BSC) model. Simulations on single-view and multi-view image classification tasks show that our IAQ framework outperforms conventional image compression methods in both error-free and realistic communication scenarios.
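The weighted bit allocation problem can be sketched under a standard high-resolution assumption: patch i quantized with b_i bits incurs mean squared error v_i · 2^(-2 b_i), so the objective is sum_i w_i v_i 2^(-2 b_i) subject to sum_i b_i ≤ B. The per-patch scale `v`, the per-patch cap `b_max`, and the function names are hypothetical, and the paper's exact distortion model and its BSC-adjusted variant are not reproduced here. Under this model each extra bit cuts a patch's weighted error by a factor of 4, so marginal gains never increase and giving one bit at a time to the patch with the largest gain yields an optimal integer allocation. The water-filling variant solves the continuous relaxation by bisection on the water level and then rounds down.

```python
import heapq

import numpy as np


def incremental_allocation(w, v, budget, b_max=8):
    """Greedy allocation minimizing sum_i w_i * v_i * 2**(-2 * b_i).

    Assumed model: patch i with b_i bits has MSE v_i * 2**(-2 * b_i).
    Giving patch i one more bit removes 3/4 of its current weighted error.
    """
    n = len(w)
    bits = [0] * n
    # Max-heap (via negation) of the error reduction from one more bit.
    heap = [(-0.75 * w[i] * v[i], i) for i in range(n)]
    heapq.heapify(heap)
    for _ in range(budget):
        if not heap:
            break  # every patch has reached b_max
        neg_gain, i = heapq.heappop(heap)
        bits[i] += 1
        if bits[i] < b_max:
            # The next bit for this patch is worth a quarter of the last one.
            heapq.heappush(heap, (neg_gain / 4.0, i))
    return bits


def water_filling(w, v, budget, b_max=8, iters=100):
    """Continuous relaxation solved by reverse water-filling (sketch).

    Lagrangian solution: b_i = clip(0.5 * log2(w_i * v_i / theta), 0, b_max),
    with the water level theta found by bisection on t = log2(theta).
    """
    a = np.asarray(w, dtype=float) * np.asarray(v, dtype=float)  # requires a > 0
    log_a = np.log2(a)

    def bits_at(t):
        return np.clip(0.5 * (log_a - t), 0.0, b_max)

    # At lo every patch is capped at b_max; at hi every patch gets 0 bits.
    lo, hi = log_a.min() - 2 * b_max, log_a.max()
    if bits_at(lo).sum() <= budget:
        return [b_max] * len(a)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if bits_at(mid).sum() > budget:
            lo = mid
        else:
            hi = mid
    # Round down (with a small tolerance) so the total never exceeds the budget.
    return np.floor(bits_at(hi) + 1e-9).astype(int).tolist()
```

For example, with weights [4, 1, 1], unit scales, and a 4-bit budget, both functions return [2, 1, 1]. Rounding down in `water_filling` can leave a few bits unused; handing those out with the greedy rule is one possible refinement, not necessarily the paper's choice.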
