Importance-Aware Image Segmentation-based Semantic Communication for Autonomous Driving

Jie Lv, Haonan Tong, Qiang Pan, Zhilong Zhang, Xinxin He, Tao Luo, Changchuan Yin

TL;DR

This paper tackles semantic communication for autonomous driving by proposing VIS-SemCom, a Swin Transformer-based semantic codec that transmits multi-scale segmentation features rather than raw images. It emphasizes important objects through an importance-aware loss and an online hard sample mining strategy, enabling robust segmentation under constrained V2X channels. Key contributions include a multi-scale feature extractor that assigns a different number of Swin Transformer blocks (STBs) to each feature resolution, a decoder/reconstructor for semantic maps, and a training framework combining weighted cross-entropy and IoU losses. Experiments on Cityscapes show a coding gain of about $6$ dB at $60\%$ mIoU and up to $70\%$ data reduction at $60\%$ mIoU, plus a $4\%$ IoU improvement for important objects, highlighting practical benefits for safe autonomous driving in bandwidth-limited V2X scenarios.

Abstract

This article studies the problem of image segmentation-based semantic communication in autonomous driving. In real traffic scenes, detecting key objects (e.g., vehicles, pedestrians, and obstacles) is more crucial to driving safety than detecting other objects. Therefore, we propose a vehicular image segmentation-oriented semantic communication system, termed VIS-SemCom, in which the image segmentation features of important objects are transmitted to reduce transmission redundancy. First, to accurately extract image semantics, we develop a semantic codec based on the Swin Transformer architecture, which expands the perceptual field and thus improves the segmentation accuracy. Next, we propose a multi-scale semantic extraction scheme that assigns a different number of Swin Transformer blocks to features of different resolutions, thus improving the segmentation accuracy of important objects. Furthermore, an importance-aware loss is invoked to emphasize the important objects, and an online hard sample mining (OHEM) strategy is proposed to handle small-sample issues in the dataset. Experimental results demonstrate that, compared to the traditional transmission scheme, the proposed VIS-SemCom can achieve a coding gain of nearly 6 dB at a 60% mean intersection over union (mIoU), reduce the transmitted data amount by up to 70% at a 60% mIoU, and improve the segmentation intersection over union (IoU) of important objects by 4%.
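The abstract names three ingredients that are easy to mix up: an importance-aware (class-weighted) loss, hard sample mining, and the mIoU metric. The sketch below illustrates each on four toy pixels. It is not the authors' implementation: the class weights, the `keep_ratio` parameter, and the function names are all hypothetical, and the paper's exact loss (which also includes an IoU term) is not reproduced here.

```python
import math

def weighted_cross_entropy(probs, labels, class_weights):
    """Per-pixel cross-entropy scaled by a per-class importance weight.

    probs: list of per-pixel class-probability lists.
    labels: ground-truth class index for each pixel.
    class_weights: hypothetical importance weight for each class.
    """
    eps = 1e-12  # guards against log(0)
    return [-class_weights[y] * math.log(p[y] + eps)
            for p, y in zip(probs, labels)]

def hard_sample_mean(losses, keep_ratio=0.5):
    """Average only the hardest (largest-loss) fraction of the pixels."""
    k = max(1, int(len(losses) * keep_ratio))
    hardest = sorted(losses, reverse=True)[:k]
    return sum(hardest) / k

def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Toy example: class 0 = road (weight 1.0), class 1 = pedestrian (weight 2.0).
probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.7, 0.3]]
labels = [0, 1, 1, 0]
weights = [1.0, 2.0]  # assumed values, for illustration only

losses = weighted_cross_entropy(probs, labels, weights)
loss = hard_sample_mean(losses, keep_ratio=0.5)

pred = [max(range(len(p)), key=p.__getitem__) for p in probs]  # argmax per pixel
miou = mean_iou(pred, labels, num_classes=2)
```

In this toy case the two pedestrian pixels carry the largest weighted losses, so they are the ones kept by the hard-sample step. That is the intended effect of combining importance weighting with hard sample mining.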

Paper Structure

This paper contains 16 sections, 12 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: The VIS-SemCom system model.
  • Figure 2: The framework of the proposed VIS-SemCom system consisting of an encoder and a decoder.
  • Figure 3: (a) The architecture of two consecutive STBs; (b) Illustration of the window partition and shifted window partition (e.g., H = 8, W = 8, and M = 4).
  • Figure 4: The visualization results of image segmentation by the traditional scheme and the VIS-SemCom system with vehicle velocity 120 km/h and the compression ratio $R = 48$ ($\mathrm{SNR}_{\text{test}}$ = 19 dB, 22 dB).
  • Figure 5: The mIoU performance of the proposed VIS-SemCom scheme and traditional schemes with different sets. The vehicle velocity is 50 km/h.
  • ...and 3 more figures
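
The window partition and shifted window partition named in the Figure 3(b) caption can be sketched for the quoted sizes (H = 8, W = 8, M = 4). This is a minimal illustration, assuming the standard Swin Transformer convention of a cyclic shift by M/2 before re-partitioning. The function names are hypothetical, and each grid cell simply stores its original (row, column) position so the effect of the shift is visible.

```python
def window_partition(grid, m):
    """Split an H x W grid into non-overlapping m x m windows (row-major order)."""
    h, w = len(grid), len(grid[0])
    assert h % m == 0 and w % m == 0, "H and W must be multiples of the window size"
    windows = []
    for top in range(0, h, m):
        for left in range(0, w, m):
            windows.append([row[left:left + m] for row in grid[top:top + m]])
    return windows

def cyclic_shift(grid, s):
    """Roll the grid up and to the left by s positions, wrapping around."""
    rows = grid[s:] + grid[:s]
    return [row[s:] + row[:s] for row in rows]

H, W, M = 8, 8, 4
# Each cell records its original (row, column) coordinate.
grid = [[(r, c) for c in range(W)] for r in range(H)]

regular = window_partition(grid, M)                        # 4 windows of 4 x 4
shifted = window_partition(cyclic_shift(grid, M // 2), M)  # windows after shift
```

After the shift, the first window starts at original position (2, 2), and the last window gathers cells from all four corners of the original grid. This is how shifted windows let information cross the boundaries of the regular windows.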