Distributed Semantic Segmentation with Efficient Joint Source and Task Decoding

Danish Nazir; Timo Bartels; Jan Piewek; Thorsten Bagdonat; Tim Fingscheidt

TL;DR

This work reduces cloud-side computation in distributed semantic segmentation through joint source and task decoding (JD), which fuses bottleneck feature decoding and the segmentation task into a single network. A variational bottleneck with a hyperprior compresses the features generated on the edge device, and JD maps the quantized latent representation directly to segmentation masks while keeping the system end-to-end trainable. Training-time over-parameterization is applied to the JD ASPP block to boost performance at high bitrates without increasing inference complexity. The method achieves state-of-the-art rate-distortion performance on COCO and Cityscapes while using only about 9.8% to 11.59% of the cloud DNN parameters of the previous SOTA, enabling scalable, privacy-conscious edge-cloud deployments for semantic segmentation.
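The claim that training-time over-parameterization adds no inference cost can be illustrated with linear structural re-parameterization. The sketch below is not the paper's exact scheme (that is shown in its Figure 4); it assumes, purely for illustration, that a block is trained as `K` parallel linear branches whose outputs are summed, and that a 1x1 convolution can be viewed per pixel as a matrix-vector product. The names `matvec`, `branches_forward`, and `merge_branches` are hypothetical.

```python
import random


def matvec(w, x):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def branches_forward(weights, x):
    """Training-time form (assumed): K parallel linear branches, outputs summed."""
    out = [0.0] * len(weights[0])
    for w in weights:
        out = [a + b for a, b in zip(out, matvec(w, x))]
    return out


def merge_branches(weights):
    """Inference-time form: fold the K branches into a single weight matrix."""
    rows, cols = len(weights[0]), len(weights[0][0])
    return [[sum(w[i][j] for w in weights) for j in range(cols)]
            for i in range(rows)]


random.seed(0)
K, c_in, c_out = 3, 4, 2  # illustrative sizes, not taken from the paper
weights = [[[random.uniform(-1, 1) for _ in range(c_in)]
            for _ in range(c_out)]
           for _ in range(K)]
x = [random.uniform(-1, 1) for _ in range(c_in)]

y_train = branches_forward(weights, x)        # K branches evaluated
y_infer = matvec(merge_branches(weights), x)  # one merged branch evaluated
max_diff = max(abs(a - b) for a, b in zip(y_train, y_infer))
```

Because the branches are linear, the merged matrix reproduces the summed output up to floating-point rounding, so the extra training-time parameters do not have to be kept at inference.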

Abstract

Distributed computing in the context of deep neural networks (DNNs) implies the execution of one part of the network on edge devices and the other part typically on a large-scale cloud platform. Conventional methods propose to employ a serial concatenation of a learned image and source encoder, the latter projecting the image encoder output (bottleneck features) into a quantized representation for bitrate-efficient transmission. In the cloud, a respective source decoder reprojects the quantized representation to the original feature representation, serving as an input for the downstream task decoder performing, e.g., semantic segmentation. In this work, we propose joint source and task decoding, as it allows for a smaller network size in the cloud. This further enables the scalability of such services in large numbers without requiring extensive computational load on the cloud per channel. We demonstrate the effectiveness of our method by achieving a distributed semantic segmentation SOTA over a wide range of bitrates on the mean intersection over union metric, while using only $9.8 \%$ ... $11.59 \%$ of cloud DNN parameters used in the previous SOTA on the COCO and Cityscapes datasets.
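For readers unfamiliar with the evaluation metric named in the abstract, here is a minimal sketch of mean intersection over union (mIoU) on flat label sequences. The function name `mean_iou`, the `ignore_index=255` default (a common convention for unlabeled pixels), and the choice to skip classes that occur in neither prediction nor target are assumptions of this sketch, not details taken from the paper. Benchmark implementations typically accumulate intersections and unions over the whole dataset before dividing.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes that occur in the prediction or the target.

    pred, target: equal-length sequences of integer class labels.
    Positions where target == ignore_index are skipped (assumed convention).
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            # Correct pixel: counts once in both intersection and union.
            inter[t] += 1
            union[t] += 1
        else:
            # Wrong pixel: enlarges the union of the target class and,
            # if the predicted label is a valid class, of that class too.
            union[t] += 1
            if 0 <= p < num_classes:
                union[p] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example: four pixels, two classes, last pixel unlabeled.
score = mean_iou(pred=[0, 0, 1, 1], target=[0, 1, 1, 255], num_classes=2)
```

In the toy example each class has one correctly labeled pixel and a union of two pixels, so both per-class IoUs are 0.5 and the mean is 0.5.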

Paper Structure (20 sections, 3 equations, 8 figures, 2 tables)

Introduction
Related Works
Semantic Segmentation
Distributed Semantic Segmentation
General
Low-complexity encoder-decoder setup
Method
Bottleneck Feature Compression with Variational Models
Proposed Joint Source and Task Decoder ($\bf{JD}$)
Training-Time Over-Parameterization
Experimental Overview
Datasets
Experimental Design, Training and Metrics
Results and Discussion
Comparison With SOTA Methods
...and 5 more sections

Figures (8)

Figure 1: High-level comparison of our proposed approach with existing SOTA approaches in distributed semantic segmentation. Here, $\bf{SE}$ and $\bf{SD}$ represent the source encoder and decoder, respectively. Blocks $\bf{E}$ and $\bf{D}$ are the image encoder and task decoder, respectively. Further, $\bf{JSDE}$ is the joint source decoder and image encoder, while $\bf{JD}$ represents the proposed joint source and task decoder, in short: joint decoder.
Figure 2: Hyperprior architecture of source encoder $\bf{SE}$ and source decoder $\bf{SD}$, see [ahuja2023neural; balle2018variational].
Figure 3: Proposed architecture of the joint source and task decoder ($\mathbf{JD}$, see Fig. \ref{fig:compressed}). Training details of the blue convolutional blocks within the ASPP block are shown in Figure \ref{fig:overparam}.
Figure 4: Proposed over-parameterization of the $\bf{JD}$ ASPP subblocks in Figure \ref{fig:CD}.
Figure 5: Proposed $\bf{JD}$ approach ("Ours") against SOTA approaches on the mIoU metric for (a) $\mathcal{D}_{\mathrm{COCO}}^{\mathrm{val2017}}$ and (b) $\mathcal{D}_{\mathrm{CS}}^{\mathrm{val}}$ datasets. The values denoted by $^{\textcolor{red}{*}}$ are taken from the respective papers, and the identifiers in parentheses (1x) refer to the type of approach in Figure \ref{fig:conventional} ... \ref{fig:compressed}. On both the COCO and Cityscapes datasets, our proposed approach "Ours" achieves a better RD trade-off than the SOTA baselines over a wide range of bitrates. Note that "Ours" uses $K=1$ (COCO) and $K=3$ (Cityscapes) in the ASPP block.
...and 3 more figures
