Towards Dynamic and Small Objects Refinement for Unsupervised Domain Adaptative Nighttime Semantic Segmentation

Jingyi Pan, Sihang Li, Yucheng Chen, Jinjing Zhu, Lin Wang

TL;DR

This paper tackles nighttime semantic segmentation via unsupervised domain adaptation, addressing illumination-induced domain gaps and the poor transferability of dynamic and small objects. It introduces Dynamic and Small Object Refinement (DSR), which creates a mixed domain by image- and label-level mixup that emphasizes dynamic/small classes and leverages a long-tailed memory bank, and Feature Prototype Alignment (FPA), which uses cross-domain prototypes and contrastive losses with adaptive re-weighting to align source, mixed, and nighttime features. The approach achieves state-of-the-art results on Dark Zurich, Nighttime Driving, and ACDC-night, with clear gains for dynamic and small object categories such as poles, cars, and buses, while preserving all-day performance. Overall, the method offers a practical improvement for robust nighttime perception by focusing supervision on hard-to-transfer objects and reducing domain shifts through prototype-guided mixing and alignment.

Abstract

Nighttime semantic segmentation plays a crucial role in practical applications such as autonomous driving, where it is hampered by inadequate illumination and the absence of well-annotated datasets. Moreover, semantic segmentation models trained on daytime datasets often fail to generalize to nighttime conditions. Unsupervised domain adaptation (UDA) has shown the potential to address these challenges and has achieved remarkable results for nighttime semantic segmentation. However, existing methods are still limited by 1) their reliance on style transfer or relighting models, which struggle to generalize to complex nighttime environments, and 2) their neglect of dynamic and small objects such as vehicles and poles, which are difficult to learn directly from other domains. This paper proposes a novel UDA method that refines both the label and feature levels for dynamic and small objects in nighttime semantic segmentation. First, we propose a dynamic and small object refinement module that supplements the target nighttime domain with knowledge of dynamic and small objects from the source domain. These dynamic and small objects are normally context-inconsistent in under-exposed conditions. Then, we design a feature prototype alignment module that reduces the domain gap by applying contrastive learning between features and prototypes of the same class from different domains, while re-weighting the categories of dynamic and small objects. Extensive experiments on three benchmark datasets demonstrate that our method outperforms prior art by a large margin for nighttime segmentation. Project page: https://rorisis.github.io/DSRNSS/.
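
The feature prototype alignment idea described in the abstract (contrastive learning between pixel features and same-class prototypes from another domain, with re-weighting of selected classes) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the temperature `tau`, the fixed `class_weights` vector (the paper describes adaptive re-weighting), and the use of an InfoNCE-style softmax over prototypes are all assumptions made for illustration.

```python
import numpy as np

def class_prototypes(feats, labels, num_classes):
    """Mean feature per class. feats: (N, D), labels: (N,).
    Classes absent from `labels` keep a zero prototype and valid=False."""
    protos = np.zeros((num_classes, feats.shape[1]))
    valid = np.zeros(num_classes, dtype=bool)
    for c in range(num_classes):
        sel = labels == c
        if sel.any():
            protos[c] = feats[sel].mean(axis=0)
            valid[c] = True
    return protos, valid

def prototype_contrastive_loss(feats, labels, protos, valid, class_weights, tau=0.1):
    """Pull each feature toward the prototype of its own class (taken from
    another domain) and away from the other prototypes.
    `class_weights` (hypothetical) up-weights e.g. dynamic and small classes."""
    keep = valid[labels]                      # drop pixels whose class has no prototype
    if not keep.any():
        return 0.0
    f = feats[keep]
    y = labels[keep]
    # Cosine similarity between L2-normalized features and prototypes.
    f = f / (np.linalg.norm(f, axis=1, keepdims=True) + 1e-8)
    p = protos / (np.linalg.norm(protos, axis=1, keepdims=True) + 1e-8)
    logits = f @ p.T / tau                    # (N_kept, C)
    logits = np.where(valid[None, :], logits, -np.inf)  # ignore missing classes
    # Numerically stable log-softmax over the prototype axis.
    m = logits.max(axis=1, keepdims=True)
    log_prob = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    w = class_weights[y]
    return float(-(w * log_prob[np.arange(len(y)), y]).sum() / w.sum())
```

In this sketch, prototypes would be computed from one domain (for example, source features with ground-truth labels) and the loss evaluated on features from another domain (for example, nighttime features with pseudo-labels), so that same-class features are drawn together across domains.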

Paper Structure (22 sections, 15 equations, 7 figures, 6 tables)

Figures (7)

  • Figure 1: Left: Our UDA method addresses the difficulty of transferring knowledge about dynamic and small objects from other domains for nighttime segmentation. Right: Compared with DANNet (Wu et al., 2021), our method significantly improves performance on dynamic and small objects.
  • Figure 2: Overview of our proposed framework. $F_s$ and $F_t$ denote the student and teacher networks, respectively. 1) In the image-level mixup stage, we mix the dynamic and small classes of the source images into the target nighttime images according to the source ground truth. 2) In the FPA module, we calculate per-class prototypes ($\rho_m$, $\rho_s$, $\rho_n$) to regularize the pixel embedding spaces ($f_s$, $f_m$, $f_n$) with contrastive learning. 3) In the label-level mixup stage, the refined target predictions obtained from holistic refinement are mixed with the source ground truth, using the same classes as in the image-level mixup, to form a mixed pseudo-label that focuses refinement on dynamic and small objects.
  • Figure 3: An illustration of our DSR module. At the image level, two images are sampled from $S_d$ and $T_n$, respectively. We generate a composite mask from the source label $Y_s$ that covers randomly selected classes together with the dynamic and small classes. This mask is then used to mix the two images and, likewise, to mix $Y_s$ with $y^{\prime}_n$. Furthermore, the long-tailed memory bank mechanism introduces regions of rarely occurring classes into the mixed image and label.
  • Figure 4: Qualitative comparison between our approach and the state-of-the-art methods on the Dark Zurich-val set. Better viewed when zoomed in.
  • Figure 5: Qualitative comparison between our approach and existing state-of-the-art methods on the Nighttime Driving test set. Better viewed when zoomed in.
  • ...and 2 more figures
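
The image- and label-level mixup described in the captions of Figures 2 and 3 can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' code: the function name `class_mask_mixup`, the `priority_classes` set (standing in for the dynamic and small classes), and the `n_random` parameter are hypothetical, and the long-tailed memory bank is omitted.

```python
import numpy as np

def class_mask_mixup(src_img, src_label, tgt_img, tgt_pseudo,
                     priority_classes, n_random=1, rng=None):
    """Paste source pixels of selected classes onto a target (nighttime)
    image, and paste the matching source labels onto its pseudo-label.

    src_img, tgt_img: (H, W, 3) arrays; src_label, tgt_pseudo: (H, W) class ids.
    priority_classes: set of class ids that are always pasted when present
    (assumed here to be the dynamic and small classes).
    """
    rng = np.random.default_rng() if rng is None else rng
    present = np.unique(src_label).tolist()
    # Always include prioritized classes that occur in this source image.
    chosen = [c for c in present if c in priority_classes]
    # Add a few randomly selected other classes.
    others = [c for c in present if c not in priority_classes]
    if others and n_random > 0:
        k = min(n_random, len(others))
        chosen.extend(rng.choice(others, size=k, replace=False).tolist())
    mask = np.isin(src_label, chosen)                        # (H, W) boolean
    mixed_img = np.where(mask[..., None], src_img, tgt_img)  # image-level mixup
    mixed_label = np.where(mask, src_label, tgt_pseudo)      # label-level mixup
    return mixed_img, mixed_label, mask
```

Because the same mask is applied to both the images and the labels, every pasted pixel carries its source ground-truth class, while the remaining pixels keep the target pseudo-label.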