Exploring Reliable Matching with Phase Enhancement for Night-time Semantic Segmentation

Yuwen Pan, Rui Sun, Naisong Luo, Tianzhu Zhang, Yongdong Zhang

TL;DR

This paper tackles night-time semantic segmentation without forcing night images into day-time distributions. It introduces NightFormer, a two-branch architecture. The first branch is a pixel-level texture enhancement module that uses Fourier phase information and a hierarchical amplified decoder. The second is an object-level reliable matching module that uses learnable prototypes and reliable attention to bridge prototypes and pixels. The method is reported to perform favorably against state-of-the-art methods on NightCity, NightCity-fine, Cityscapes, and BDD100K-night, with notable improvements in regions with degraded texture and low contrast. By reducing texture loss and the mis-segmentation caused by deceptive low-light cues, the work advances end-to-end night-specific perception, with practical implications for autonomous driving and night-vision systems.

Abstract

Semantic segmentation of night-time images holds significant importance in computer vision, particularly for applications like night environment perception in autonomous driving systems. However, existing methods tend to parse night-time images from a day-time perspective, leaving the inherent challenges in low-light conditions (such as compromised texture and deceiving matching errors) unexplored. To address these issues, we propose a novel end-to-end optimized approach, named NightFormer, tailored for night-time semantic segmentation, avoiding the conventional practice of forcibly fitting night-time images into day-time distributions. Specifically, we design a pixel-level texture enhancement module to acquire texture-aware features hierarchically with phase enhancement and amplified attention, and an object-level reliable matching module to realize accurate association matching via reliable attention in low-light environments. Extensive experimental results on various challenging benchmarks including NightCity, BDD and Cityscapes demonstrate that our proposed method performs favorably against state-of-the-art night-time semantic segmentation methods.
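
The phase enhancement mentioned above builds on splitting an image's Fourier transform into amplitude and phase. The following is a minimal NumPy sketch of that decomposition only, not the NightFormer module itself. The toy input and the function names `split_amplitude_phase` and `recombine` are illustrative assumptions. It shows a standard property that may help explain why phase is attractive in low light: a global brightness change rescales the amplitude while leaving the phase untouched.

```python
import numpy as np

def split_amplitude_phase(image):
    """Return the amplitude and phase of the 2-D Fourier transform of an image."""
    spectrum = np.fft.fft2(image)
    return np.abs(spectrum), np.angle(spectrum)

def recombine(amplitude, phase):
    """Invert the decomposition: amplitude * exp(i * phase) -> spatial image."""
    return np.real(np.fft.ifft2(amplitude * np.exp(1j * phase)))

# Hypothetical toy input: a random low-intensity "night" patch.
rng = np.random.default_rng(0)
night = 0.1 * rng.random((32, 32))
darker = 0.5 * night  # same scene at half the brightness

amp, pha = split_amplitude_phase(night)
amp_dark, pha_dark = split_amplitude_phase(darker)

# Halving the brightness halves the amplitude but leaves the phase unchanged.
# Comparing exp(i * phase) avoids spurious wrap-around differences at +/- pi.
same_phase = np.allclose(np.exp(1j * pha), np.exp(1j * pha_dark))
half_amplitude = np.allclose(amp_dark, 0.5 * amp)

# Recombining amplitude and phase recovers the original image.
restored = recombine(amp, pha)
lossless = np.allclose(restored, night)

print(same_phase, half_amplitude, lossless)
```

How the paper injects phase information into its hierarchical decoder is not described in this excerpt, so the sketch stops at the decomposition.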

Paper Structure (20 sections, 13 equations, 6 figures, 6 tables)

Figures (6)

  • Figure 1: Illustration of our motivation. (a) Because night images suffer from poor lighting and blurred details, we use Fourier phase decomposition to amplify their texture information. (b) The conventional similarity paradigm directly computes the similarity between pixels and prototypes, which can cause mismatches when the surroundings are deceptive. (c) We propose reliable attention, which uses adaptively selected reliable points as a bridge for computing similarity instead of direct semantic-pixel matching, yielding more accurate correlation.
  • Figure 2: Framework of our proposed NightFormer. It includes a pixel-level texture enhancement module (Sec. \ref{method:enhancement}) to hierarchically aggregate phase texture into target information with amplified attention and an object-level reliable matching module (Sec. \ref{method:reliable}) to realize accurate matching between prototypes and pixels with reliable attention.
  • Figure 3: Comparison of qualitative results of our NightFormer and other methods on the NightCity \cite{tan2021night} dataset.
  • Figure 4: Ablation of $N$ and $K$ on NightCity \cite{tan2021night} and B-N \cite{deng2022nightlab}.
  • Figure 5: Demonstration of per-class IoU in Tab. \ref{tab:ablation-main} on the NightCity \cite{tan2021night} dataset.
  • ...and 1 more figure
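
The contrast drawn in Figure 1 (b) and (c) between direct pixel-prototype matching and matching through reliable points can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's reliable attention: the top-k selection rule, the mean aggregation, the parameter name `num_reliable`, and the random features are all hypothetical, since this excerpt does not specify how reliable points are chosen or combined.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def direct_matching(pixels, prototypes):
    """Direct paradigm: each pixel is scored against each prototype.

    pixels: (P, C) features, prototypes: (N, C) features -> (P, N) assignment.
    """
    return softmax(pixels @ prototypes.T, axis=1)

def bridged_matching(pixels, prototypes, num_reliable):
    """Bridged paradigm (assumed form): score pixels against selected pixels.

    For each prototype, pick the num_reliable pixels it matches best
    (hypothetical top-k rule), then score every pixel by its average
    similarity to those selected pixels rather than to the prototype.
    """
    scores = prototypes @ pixels.T                          # (N, P)
    top_idx = np.argsort(-scores, axis=1)[:, :num_reliable] # (N, k)
    reliable = pixels[top_idx]                              # (N, k, C)
    sim = np.einsum("pc,nkc->pnk", pixels, reliable)        # (P, N, k)
    return softmax(sim.mean(axis=2), axis=1)                # (P, N)

# Hypothetical toy features: 50 pixels, 8 channels, 4 prototypes.
rng = np.random.default_rng(0)
pixels = rng.standard_normal((50, 8))
prototypes = rng.standard_normal((4, 8))

direct = direct_matching(pixels, prototypes)
bridged = bridged_matching(pixels, prototypes, num_reliable=5)
print(direct.shape, bridged.shape)
```

The intent of the bridged variant is that pixel-to-pixel similarity stays within one feature space, so a prototype that is poorly aligned with dark-scene features influences the result only through which pixels it selects.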