Energy-based Domain-Adaptive Segmentation with Depth Guidance

Jinjing Zhu, Zhedong Hu, Tae-Kyun Kim, Lin Wang

TL;DR

This work tackles domain shift in semantic segmentation when depth guidance is available but depth labels are unavailable in the target domain. It introduces SMART, an energy-based framework that learns task-adaptive semantic and depth features via Hopfield-energy-based discrepancy measurement (EB2F) and ensures reliable fusion through an energy-based assessment (RFA) that compares fusion-enabled and fusion-free predictions. By leveraging per-pixel energy scores and KL-based distillation, SMART robustly fuses depth guidance to improve segmentation across domains, outperforming prior methods on GTA5-to-Cityscapes and SYNTHIA-to-Cityscapes. The results demonstrate the potential of energy-based learning for depth-guided domain adaptation and highlight the method’s plug-and-play applicability to multi-task learning in robotics contexts.

Abstract

Recent work leverages self-supervised depth estimation as guidance in unsupervised domain adaptation (UDA) for semantic segmentation. Prior methods, however, overlook both the discrepancy between semantic and depth features and the reliability of feature fusion, which leads to suboptimal segmentation performance. To address these issues, we propose a novel UDA framework called SMART (croSs doMain semAntic segmentation based on eneRgy esTimation) that uses Energy-Based Models (EBMs) to obtain task-adaptive features and achieve reliable feature fusion for semantic segmentation with self-supervised depth estimates. Our framework incorporates two novel components: the Energy-Based Feature Fusion (EB2F) module and the energy-based Reliable Fusion Assessment (RFA) module. The EB2F module produces task-adaptive semantic and depth features by explicitly measuring and reducing their discrepancy with Hopfield energy, which improves feature fusion. The RFA module evaluates the reliability of the feature fusion with an energy score to make depth guidance more effective. Extensive experiments on two benchmarks, GTA5-to-Cityscapes and SYNTHIA-to-Cityscapes, demonstrate that our method achieves significant performance gains over prior work, validating the effectiveness of our energy-based learning approach.
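
To make the idea of measuring feature discrepancy with Hopfield energy more concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the log-sum-exp energy of modern Hopfield networks with the constant terms dropped, and it assumes that one feature vector (for example a semantic feature) acts as the query while the other task's feature vectors (for example depth features) act as the stored patterns. The names `hopfield_energy`, `query`, `patterns`, and `beta` are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def hopfield_energy(query, patterns, beta=1.0):
    """Modern Hopfield energy of `query` against stored `patterns`.

    E = -(1/beta) * log(sum_i exp(beta * <x_i, query>)) + 0.5 * <query, query>

    Constant terms that do not depend on `query` are omitted.
    Treating one task's feature as the query and the other task's
    features as patterns is an assumption made for illustration.
    """
    scores = [beta * dot(p, query) for p in patterns]
    m = max(scores)  # subtract the max for a numerically stable log-sum-exp
    lse = (m + math.log(sum(math.exp(s - m) for s in scores))) / beta
    return -lse + 0.5 * dot(query, query)


# Hypothetical features: a query aligned with a stored pattern yields
# lower energy (smaller discrepancy) than one pointing the opposite way.
aligned = hopfield_energy([1.0, 0.0], [[1.0, 0.0]])
opposed = hopfield_energy([1.0, 0.0], [[-1.0, 0.0]])
print(aligned < opposed)
```

Under these assumptions, a lower energy indicates that the query feature is better matched by the stored patterns, so minimizing it is one way to reduce the discrepancy between the two feature sets.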

Paper Structure (12 sections, 19 equations, 7 figures, 4 tables)


Figures (7)

  • Figure 1: Energy-Based Feature Fusion (EB2F) is proposed to decrease the discrepancy between semantic and depth features, and Reliable Fusion Assessment (RFA) enables the fusion to facilitate segmentation.
  • Figure 2: Architecture of the proposed SMART framework, consisting of the shared encoder, task nets, and task decoders.
  • Figure 3: Illustration of the proposed EB2F module.
  • Figure 4: Illustration of the proposed RFA. $P_1$ and $\tilde{P}_1$ are the predictions for the same pixel without and with the fusion module, respectively.
  • Figure 5: Qualitative results for GTA5-to-Cityscapes.
  • ...and 2 more figures
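
The comparison in Figure 4 between the fusion-free prediction $P_1$ and the fusion-enabled prediction $\tilde{P}_1$ can be sketched with a per-pixel energy score. This is an illustrative sketch only: it assumes the common free-energy score $E = -T \log \sum_c \exp(z_c / T)$ computed from class logits $z$, and it assumes that fusion counts as reliable at a pixel when the fused prediction has lower energy than the fusion-free one. The paper's exact criterion may differ, and the names `energy_score`, `fusion_is_reliable`, and `reliability_mask` are hypothetical.

```python
import math


def energy_score(logits, temperature=1.0):
    """Free-energy score of one pixel's class logits.

    E = -T * log(sum_c exp(z_c / T)); lower values correspond to
    more confident predictions under this (assumed) convention.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # stabilize the log-sum-exp
    return -temperature * (m + math.log(sum(math.exp(s - m) for s in scaled)))


def fusion_is_reliable(logits_plain, logits_fused, temperature=1.0):
    """Assumed rule: fusion helps if it lowers the pixel's energy."""
    return energy_score(logits_fused, temperature) < energy_score(logits_plain, temperature)


def reliability_mask(plain_map, fused_map):
    """Apply the rule to every pixel of two H x W maps of logit lists."""
    return [
        [fusion_is_reliable(p, f) for p, f in zip(plain_row, fused_row)]
        for plain_row, fused_row in zip(plain_map, fused_map)
    ]


# Hypothetical 1 x 2 image with 3 classes per pixel.
plain = [[[0.0, 0.0, 0.0], [4.0, 0.0, 0.0]]]
fused = [[[5.0, 0.0, 0.0], [0.0, 0.0, 0.0]]]
print(reliability_mask(plain, fused))
```

A mask like this could then gate where the fused prediction is trusted, for instance by restricting a distillation loss between the two predictions to the pixels marked reliable; that gating is again an assumption rather than a detail confirmed by the text above.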