Transmit What You Need: Task-Adaptive Semantic Communications for Visual Information
Jeonghun Park, Sung Whan Yoon
TL;DR
This work tackles bandwidth-limited wireless transmission of visual data by proposing task-adaptive semantic communications for computer vision. It introduces a three-part framework: a semantic extractor that derives rich semantics (objects, layouts, relations, semantic maps, scene graphs, and feature maps), a semantic filtering stage that prunes redundant scene-graph (SG) information, and a task-adaptive semantic selection module that transmits only the semantics required by the target CV task. Using diffusion-based decoders to reconstruct images from the transmitted semantics, and with extensive experiments across Visual Genome, COCO, Cityscapes, Open Images V6, and ImageNet-1K, the study demonstrates substantial throughput gains (e.g., >45x) with minimal degradation in task performance, and shows real-time viability in 5G-like channels. The key contributions include a novel SG filtering algorithm guided by conditional probabilities and language-model redundancy assessment, a systematic analysis of task-specific semantic needs, and validation across simple to complex tasks such as classification, segmentation, image retrieval, and image generation. The results establish practical guidelines for selecting semantics by task and reveal the potential of task-adaptive semantic communications to enable real-time CV services with dramatically reduced data rates and latency, supported by open-source code.
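To make the idea of conditional-probability-guided scene-graph filtering more concrete, the following is a minimal sketch of one plausible reading: a relation is treated as redundant when it is highly predictable from its subject and object alone, so the receiver could infer it without it being sent. This is not the authors' algorithm, which also involves a language-model redundancy assessment not modeled here. The function names, the toy corpus, and the 0.8 threshold are all hypothetical.

```python
from collections import Counter

def fit_relation_stats(triples):
    """Count (subject, object) pairs and full triples in a training corpus."""
    pair_counts = Counter()
    triple_counts = Counter()
    for subj, rel, obj in triples:
        pair_counts[(subj, obj)] += 1
        triple_counts[(subj, rel, obj)] += 1
    return pair_counts, triple_counts

def conditional_prob(triple, pair_counts, triple_counts):
    """Empirical P(relation | subject, object); 0.0 for unseen pairs."""
    subj, _rel, obj = triple
    pair = pair_counts[(subj, obj)]
    return triple_counts[triple] / pair if pair else 0.0

def filter_scene_graph(graph, pair_counts, triple_counts, threshold=0.8):
    """Keep only triples whose relation is NOT highly predictable.

    Assumption: a triple with P(rel | subj, obj) >= threshold is redundant.
    """
    return [
        t for t in graph
        if conditional_prob(t, pair_counts, triple_counts) < threshold
    ]

# Toy corpus (hypothetical): "wearing" dominates for (man, shirt).
corpus = (
    [("man", "wearing", "shirt")] * 9
    + [("man", "holding", "shirt")]
    + [("man", "riding", "horse"), ("man", "feeding", "horse")]
)
pair_counts, triple_counts = fit_relation_stats(corpus)

graph = [("man", "wearing", "shirt"), ("man", "feeding", "horse")]
filtered = filter_scene_graph(graph, pair_counts, triple_counts)
print(filtered)
```

In this toy setup the predictable "wearing" relation is dropped while the less predictable "feeding" relation is kept, which shrinks the payload without losing the informative part of the graph.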
Abstract
Recently, semantic communications have drawn great attention as a groundbreaking concept for going beyond the capacity limits set by Shannon's theory. Specifically, semantic communications are likely to become crucial in realizing visual tasks that demand massive network traffic. Although highly distinctive forms of visual semantics exist for computer vision tasks, no thorough investigation has yet been reported of which visual semantics can be transmitted in time and which are required for completing different visual tasks. To this end, we first scrutinize the achievable throughput when transmitting existing visual semantics over limited wireless communication bandwidth. We then demonstrate the resulting performance of various visual tasks for each visual semantic. Based on this empirical testing, we suggest that task-adaptive selection of visual semantics is crucial for real-time semantic communications for visual tasks: we transmit basic semantics (e.g., objects in the given image) for simple visual tasks, such as classification, and richer semantics (e.g., scene graphs) for complex tasks, such as image regeneration. To further improve transmission efficiency, we suggest a filtering method that drops redundant information in the scene graph, so that only the semantics essential for completing the given task are sent. We confirm the efficacy of our task-adaptive semantic communication approach through extensive simulations in wireless channels, showing more than 45 times larger throughput than naive transmission of the original data. The source code for reproducing our work is available at: https://github.com/jhpark2024/jhpark.github.io
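The task-adaptive selection described above can be illustrated with a minimal sketch. Only the two task-to-semantic pairings stated in the abstract are encoded (objects for classification, scene graphs for image generation). The payload sizes are hypothetical placeholders rather than measurements from the paper, the raw-image size assumes an uncompressed 256x256 RGB image, and falling back to raw transmission for unlisted tasks is an assumption of this sketch.

```python
# Pairings taken from the abstract; everything else here is illustrative.
TASK_TO_SEMANTIC = {
    "classification": "objects",
    "image_generation": "scene_graph",
}

# Hypothetical payload sizes in bits (NOT results from the paper).
PAYLOAD_BITS = {
    "objects": 512,
    "scene_graph": 8_192,
    "raw_image": 256 * 256 * 3 * 8,  # uncompressed 256x256 RGB, assumed
}

def select_semantic(task: str) -> str:
    """Pick the semantic to transmit; unknown tasks fall back to raw data."""
    return TASK_TO_SEMANTIC.get(task, "raw_image")

def images_per_second(link_rate_bps: float, payload_bits: int) -> float:
    """Throughput in images per second for a fixed link rate."""
    return link_rate_bps / payload_bits

def throughput_gain(task: str) -> float:
    """Throughput of the selected semantic relative to raw transmission.

    The link rate cancels out, so the gain is just the payload ratio.
    """
    selected = PAYLOAD_BITS[select_semantic(task)]
    return PAYLOAD_BITS["raw_image"] / selected

if __name__ == "__main__":
    for task in ("classification", "image_generation", "unlisted_task"):
        print(task, select_semantic(task), throughput_gain(task))
```

Because the link rate cancels in the ratio, the gain under these assumptions depends only on how compact the selected semantic is relative to the raw image, which is why simpler tasks admit larger gains.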
