
Efficient Semantic Communication Through Transformer-Aided Compression

Matin Mortaheb, Mohammad A. Amir Khojastepour, Sennur Ulukus

TL;DR

This paper tackles efficient semantic transmission over time-varying wireless channels by introducing a patch-based, multi-resolution encoding scheme guided by Vision Transformer (ViT) attention. It formalizes goal-oriented semantics with distortion metrics and proposes a two-step rate-partitioning approach, in which an attention-guided selector assigns each patch an encoding level under the channel rate $r$, followed by end-to-end training of the per-resolution encoders and decoders. The main contributions are (i) a multi-resolution encoding framework that uses ViT attention to identify semantically important patches, (ii) an attention-aggregation mechanism that produces a per-patch resolution map constrained by the available bitrate, and (iii) experiments on TinyImageNet showing improved semantic fidelity and accuracy under bandwidth limitations. The results suggest practical gains for transmitting multi-resolution visual data over bandwidth-constrained links, with potential relevance to holographic, haptic, and other goal-oriented communication applications.
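The attention-aggregation step can be illustrated with attention rollout. Rollout is one common way to turn a ViT's per-layer attention into a single importance score per patch. This technique is swapped in for illustration and is not necessarily the aggregation block used in the paper. The following are all assumptions:

- the function names;
- the 0.5 residual weighting;
- the toy attention matrix;
- the convention that each input matrix is already averaged over heads, with the CLS token at index 0.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(m)] for i in range(n)]


def add_residual(attn: Matrix) -> Matrix:
    """Mix in the identity (residual path) with an assumed 0.5 weight, then renormalize rows."""
    n = len(attn)
    mixed = [[0.5 * attn[i][j] + (0.5 if i == j else 0.0) for j in range(n)] for i in range(n)]
    normalized = []
    for row in mixed:
        total = sum(row)
        normalized.append([v / total for v in row])
    return normalized


def attention_rollout(layers: List[Matrix]) -> List[float]:
    """Return one normalized importance score per patch (hypothetical helper).

    `layers` holds one head-averaged attention matrix per transformer layer,
    each of shape (1 + P) x (1 + P), with the CLS token at index 0.
    """
    rollout = add_residual(layers[0])
    for attn in layers[1:]:
        # Later layers multiply on the left: R_l = A_l @ R_{l-1}.
        rollout = matmul(add_residual(attn), rollout)
    cls_row = rollout[0][1:]  # attention flowing from CLS to each patch token
    total = sum(cls_row)
    return [v / total for v in cls_row]


# Toy example: CLS plus 3 patches, with the same (made-up) attention in two layers.
example_layer = [
    [0.1, 0.6, 0.2, 0.1],
    [0.25, 0.25, 0.25, 0.25],
    [0.25, 0.25, 0.25, 0.25],
    [0.25, 0.25, 0.25, 0.25],
]
scores = attention_rollout([example_layer, example_layer])
print([round(s, 3) for s in scores])
```

Scores of this kind can then drive the choice of resolution for each patch: high-scoring patches are candidates for finer encoding.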

Abstract

Transformers, known for their attention mechanisms, have proven highly effective at focusing on the critical elements of complex data. This property can be exploited to cope with time-varying channels in wireless communication systems. In this work, we introduce a channel-aware adaptive framework for semantic communication in which different regions of an image are encoded and compressed according to their semantic content. Using vision transformers, we interpret the attention mask as a measure of each patch's semantic content and dynamically assign patches to different compression rates as a function of the instantaneous channel bandwidth. Our method improves communication efficiency by adapting the encoding resolution to the relevance of the content, so that critical information is preserved even in highly constrained environments. We evaluate the proposed adaptive transmission framework on the TinyImageNet dataset, measuring both reconstruction quality and accuracy. The results show that our approach maintains high semantic fidelity while using bandwidth efficiently, providing an effective solution for transmitting multi-resolution data under limited bandwidth.
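Assigning patches to compression rates as a function of the available bandwidth can be sketched with a simple greedy rule. Every patch starts at the cheapest level, and the remaining bit budget upgrades patches in descending order of importance. This is only a stand-in for the paper's two-step rate partitioning. The following are hypothetical:

- the per-level bit costs;
- the budget values;
- the importance scores;
- the function name `assign_levels`.

```python
from typing import List, Sequence


def assign_levels(scores: Sequence[float], costs: Sequence[int], budget: int) -> List[int]:
    """Greedy per-patch resolution map under a total bit budget (hypothetical helper).

    costs[l] is an assumed number of bits to send one patch at level l, sorted
    ascending so that level 0 is the cheapest (lowest resolution).
    """
    n = len(scores)
    levels = [0] * n
    spent = n * costs[0]
    if spent > budget:
        raise ValueError("budget cannot cover the lowest resolution for every patch")
    # Visit patches from most to least important.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    for i in order:
        # Upgrade this patch to the highest level the remaining budget allows.
        for level in range(len(costs) - 1, 0, -1):
            extra = costs[level] - costs[levels[i]]
            if spent + extra <= budget:
                levels[i] = level
                spent += extra
                break
    return levels


# Made-up importance scores for 4 patches and 3 resolution levels.
scores = [0.1, 0.5, 0.3, 0.05]
costs = [8, 32, 128]
maps = {budget: assign_levels(scores, costs, budget) for budget in (64, 200)}
print(maps)
```

Lowering the budget from 200 to 64 bits leaves only the most important patch above the lowest level. This mirrors, in a simplified form, how the framework is described as adapting per-patch resolution to the instantaneous channel bandwidth.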

Paper Structure

This paper contains 9 sections, 4 equations, 8 figures, and 2 algorithms.

Figures (8)

  • Figure 1: Channel-aware multi-resolution semantic communication framework.
  • Figure 2: Attention-guided resolution selector block framework.
  • Figure 3: Attention aggregation block.
  • Figure 4: Encoder and Decoder structure for different resolutions.
  • Figure 5: Reconstruction result for three medium resolutions.
  • ...and 3 more figures