Depth Edge Alignment Loss: DEALing with Depth in Weakly Supervised Semantic Segmentation

Patrick Schmidt; Vasileios Belagiannis; Lazaros Nalpantidis

TL;DR

Depth Edge Alignment Loss (DEAL) addresses the cost of pixel-level labeling in weakly supervised semantic segmentation (WSSS) by using depth information to align class activation map (CAM) boundaries with depth edges. CAM and depth edge maps $a$ and $d$, obtained with a Sobel filter, are mapped to edge activations $a' = \tanh\left(\mu + \log\frac{a}{1-a}\right)$ and $d' = \tanh\left(\mu + \log\frac{d}{1-d}\right)$ with $\mu = 2.5$. The depth-edge alignment loss is then $\mathcal{L}_{\mathrm{edge}} = -\frac{1}{HW}\sum_{ij}\frac{1}{\sum_k y_k}\sum_k y_k\, a'_{k,ij}\, d'_{ij}$, where $H \times W$ is the image size and $y_k$ are the image-level class labels. Adding DEAL on top of CAM-based WSSS methods (e.g., WeakTr and SEAM), optionally together with ISL/FSL, yields consistent mIoU improvements on VOC, COCO, and HOPE and remains robust to depth noise. The loss is model-agnostic and tolerates noisy real-world depth data, which makes it practical for robotic perception; integrating depth with vision-language models is suggested as future work.
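The loss above can be sketched in NumPy as follows. This is an illustrative reading of the formula, not the authors' implementation (see the linked repository for that). The helper names `logit_tanh`, `sobel_edges`, and `edge_alignment_loss` are hypothetical. Several details are assumptions because the formula does not fix them: the edge maps are taken to lie in (0, 1), values are clipped away from 0 and 1 so the logarithm stays finite, the Sobel magnitude is rescaled by its maximum, and at least one image-level label is assumed to be positive.

```python
import numpy as np

MU = 2.5  # offset mu from the TL;DR formula


def logit_tanh(x, mu=MU, eps=1e-6):
    """Map x in (0, 1) to tanh(mu + log(x / (1 - x))).

    Clipping to [eps, 1 - eps] is an assumption added so that log() stays finite.
    """
    x = np.clip(np.asarray(x, dtype=float), eps, 1.0 - eps)
    return np.tanh(mu + np.log(x / (1.0 - x)))


def sobel_edges(img):
    """Sobel gradient magnitude of a 2-D array, rescaled to [0, 1].

    Hypothetical preprocessing: edge padding and division by the maximum are
    illustrative choices, not taken from the paper.
    """
    kx = np.array([[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]])
    ky = kx.T  # vertical Sobel kernel
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    # 3x3 cross-correlation; flipping the kernel (true convolution) would only
    # change the sign of gx/gy, which the magnitude ignores.
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    mag = np.hypot(gx, gy)
    peak = mag.max()
    return mag / peak if peak > 0 else mag


def edge_alignment_loss(a, d, y, mu=MU):
    """L_edge = -1/(HW) * sum_ij [ (1 / sum_k y_k) * sum_k y_k a'_{k,ij} d'_{ij} ].

    a: (K, H, W) CAM edge maps, d: (H, W) depth edge map, y: (K,) binary labels.
    """
    y = np.asarray(y, dtype=float)
    n_pos = y.sum()
    if n_pos == 0:
        raise ValueError("at least one positive image-level label is assumed")
    a_act = logit_tanh(a, mu)  # a', shape (K, H, W)
    d_act = logit_tanh(d, mu)  # d', shape (H, W)
    # Label-weighted agreement between CAM and depth edge activations per pixel.
    per_pixel = np.einsum("k,khw,hw->hw", y, a_act, d_act) / n_pos
    return -per_pixel.mean()  # mean over the H*W pixels, negated


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    depth = rng.random((8, 8))
    cams = rng.random((3, 8, 8))  # one CAM per class
    a = np.stack([sobel_edges(c) for c in cams])
    d = sobel_edges(depth)
    print(edge_alignment_loss(a, d, y=[1, 0, 1]))
```

Because $|a'| \le 1$ and $|d'| \le 1$, each per-pixel term is an average over the present classes of values in $[-1, 1]$, so the loss lies in $[-1, 1]$. It is most negative when CAM and depth activations have the same sign (both edge or both non-edge) for every present class. Classes with $y_k = 0$ do not contribute.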

Abstract

Autonomous robotic systems applied to new domains require an abundance of expensive, dense pixel-level labels to train robust semantic segmentation models under full supervision. This study proposes a model-agnostic Depth Edge Alignment Loss to improve Weakly Supervised Semantic Segmentation models across different datasets. The methodology generates pixel-level semantic labels from image-level supervision, avoiding expensive annotation processes. While weak supervision is widely explored in traditional computer vision, our approach adds supervision from pixel-level depth information, a modality commonly available in robotic systems. We demonstrate that our approach improves segmentation performance across datasets and models and can be combined with other losses for further gains, with improvements of up to +5.439, +1.274, and +16.416 points in mean Intersection over Union (mIoU) on the PASCAL VOC validation set, the MS COCO validation set, and the HOPE static onboarding split, respectively. Our code is publicly available at https://github.com/DTU-PAS/DEAL.
