Re-coding for Uncertainties: Edge-awareness Semantic Concordance for Resilient Event-RGB Segmentation

Nan Bao, Yifan Zhao, Lin Zhu, Jia Li

TL;DR

This paper tackles semantic segmentation when RGB data are corrupted by extreme conditions by fusing event data with RGB through a shared edge-based latent space. It introduces Edge-awareness Semantic Concordance (ESC), composed of Edge-awareness Latent Re-coding (ELR), Re-coded Consolidation (RC), and Uncertainty Optimization (UO), guided by a pre-trained edge dictionary learned via VQ-VAE. The approach realigns heterogeneous modalities into a unified edge-informed semantic space, consolidates edge cues, and optimizes fusion under per-pixel uncertainties, achieving state-of-the-art results on synthetic and real extreme-condition datasets and demonstrating strong resilience to occlusion. The work provides public code and novel datasets (DERS-XS, DERS-XR, DSEC-Xtrm) to benchmark event-RGB segmentation under challenging scenarios, with potential impact on robust perception for autonomous systems.

Abstract

Semantic segmentation has achieved great success under ideal conditions. However, under extreme conditions (e.g., insufficient light, fierce camera motion), most existing methods suffer from significant loss of RGB information, which severely degrades segmentation results. Several studies exploit the high-speed, high-dynamic-range event modality as a complement, but event and RGB data are naturally heterogeneous, which leads to feature-level mismatch and inferior optimization in existing multi-modality methods. Unlike these studies, we delve into the edge secret of both modalities for resilient fusion and propose a novel Edge-awareness Semantic Concordance framework to unify the heterogeneous multi-modality features with latent edge cues. In this framework, we first propose Edge-awareness Latent Re-coding, which obtains uncertainty indicators while realigning event-RGB features into a unified semantic space guided by the re-coded distribution, and transfers event-RGB distributions into re-coded features by using a pre-established edge dictionary as clues. We then propose Re-coded Consolidation and Uncertainty Optimization, which use the re-coded edge features and uncertainty indicators to solve the heterogeneous event-RGB fusion issues under extreme conditions. We establish two synthetic datasets and one real-world event-RGB semantic segmentation dataset for extreme-scenario comparisons. Experimental results show that our method outperforms the state of the art by 2.55% mIoU on our proposed DERS-XS and possesses superior resilience under spatial occlusion. Our code and datasets are publicly available at https://github.com/iCVTEAM/ESC.

Paper Structure

This paper contains 31 sections, 7 equations, 17 figures, 7 tables.

Figures (17)

  • Figure 1: Edge-awareness Semantic Concordance for event-RGB fusion. RGB suffers from severe information loss under extreme conditions, while events are sensitive to edges in motion, complementing the lost information. Heterogeneous properties of event and RGB lead to feature-level mismatch and inferior optimization of existing methods. Our ESC framework utilizes semantic edge as an intermediate commonality for a more resilient fusion.
  • Figure 2: Correlation between events and semantic edge. We randomly select 200 event sequences with dilated boundary maps from the ground-truth-labeled DERS-XS and the real-world DSEC-Semantic, and for each dataset we count the ratio of edge pixels to all pixels and the ratio of events at edge pixels to all events. For both datasets, as the edge area expands, the event ratio is always greater than the boundary ratio. This exhibits a strong correlation between events and semantic edge under different conditions.
  • Figure 3: Establishment of edge dictionary. We establish our edge dictionary based on a VQ-VAE architecture. Semantic edge is retrieved from the semantic mask ground truth and leveraged for learning its discrete latent representations as an edge dictionary, which serves as intermediate clues across heterogeneous event and RGB.
  • Figure 4: The overall architecture of our Edge-awareness Semantic Concordance (ESC). ESC contains a pre-established edge dictionary and three key modules, namely Edge-awareness Latent Re-coding (ELR), Re-coded Consolidation (RC), and Uncertainty Optimization (UO). Based on the pre-trained edge dictionary, ELR transfers edge embeddings into re-coded distribution and modality distribution into re-coded features. RC consolidates edge information with re-coded features. UO jointly optimizes modality edge features with uncertainties. Features from RC and UO are concatenated for final semantic mask prediction.
  • Figure 5: RC and UO. The two modules utilize an attention-based structure with learnable noise embeddings for a resilient fusion.
  • ...and 12 more figures
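
The two ratios compared in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes a per-pixel semantic label map and a per-pixel event-count map of the same shape, defines the boundary as any label change between 4-neighbours, and dilates with a square window. The function names (`semantic_boundary`, `dilate`, `edge_event_ratios`) are hypothetical.

```python
import numpy as np


def semantic_boundary(labels: np.ndarray) -> np.ndarray:
    """Mark pixels on either side of a label change between 4-neighbours (assumed definition)."""
    edge = np.zeros(labels.shape, dtype=bool)
    diff_x = labels[:, 1:] != labels[:, :-1]  # horizontal label changes
    diff_y = labels[1:, :] != labels[:-1, :]  # vertical label changes
    edge[:, 1:] |= diff_x
    edge[:, :-1] |= diff_x
    edge[1:, :] |= diff_y
    edge[:-1, :] |= diff_y
    return edge


def dilate(mask: np.ndarray, radius: int) -> np.ndarray:
    """Binary dilation by repeating a 3x3 square dilation `radius` times."""
    out = mask.copy()
    for _ in range(radius):
        p = np.pad(out, 1, mode="constant", constant_values=False)
        out = (
            p[1:-1, 1:-1]
            | p[:-2, 1:-1] | p[2:, 1:-1]
            | p[1:-1, :-2] | p[1:-1, 2:]
            | p[:-2, :-2] | p[:-2, 2:]
            | p[2:, :-2] | p[2:, 2:]
        )
    return out


def edge_event_ratios(labels: np.ndarray, event_counts: np.ndarray, radius: int):
    """Return (edge pixels / all pixels, events on edge pixels / all events)."""
    edge = dilate(semantic_boundary(labels), radius)
    edge_ratio = float(edge.mean())
    total_events = event_counts.sum()
    # Guard against sequences with no events at all.
    event_ratio = float(event_counts[edge].sum() / total_events) if total_events > 0 else 0.0
    return edge_ratio, event_ratio
```

Sweeping `radius` over increasing values and checking whether the second ratio stays above the first mirrors the comparison the caption describes; how the paper accumulates events into a count map is not stated here, so that step is left to the caller.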