HPL-ESS: Hybrid Pseudo-Labeling for Unsupervised Event-based Semantic Segmentation

Linglin Jing, Yiming Ding, Yunpeng Gao, Zhigang Wang, Xu Yan, Dong Wang, Gerald Schaefer, Hui Fang, Bin Zhao, Xuelong Li

TL;DR

This work tackles unsupervised event-based semantic segmentation, transferring knowledge from labeled images to unlabeled event streams. It introduces HPL-ESS, which combines a self-training unsupervised domain adaptation (UDA) branch with offline event-to-image reconstruction to produce hybrid pseudo labels, then adds a noisy-label learning strategy and a soft prototypical alignment (SPA) module for robustness. On DSEC-Semantic, the method outperforms prior state-of-the-art approaches by +5.88% accuracy and +10.32% mIoU and surpasses several supervised methods; gains are also reported on DDD17. Because only a small proportion (5%) of the reconstructions is used during training, the framework does not depend on noise-free reconstructions. Together, hybrid labeling, progressive label refinement, and SPA improve target-domain feature consistency and segmentation accuracy under high-speed motion and challenging lighting.

Abstract

Event-based semantic segmentation has gained popularity due to its capability to deal with scenarios under high-speed motion and extreme lighting conditions, which cannot be addressed by conventional RGB cameras. Since it is hard to annotate event data, previous approaches rely on event-to-image reconstruction to obtain pseudo labels for training. However, this will inevitably introduce noise, and learning from noisy pseudo labels, especially when generated from a single source, may reinforce the errors. This drawback is also called confirmation bias in pseudo-labeling. In this paper, we propose a novel hybrid pseudo-labeling framework for unsupervised event-based semantic segmentation, HPL-ESS, to alleviate the influence of noisy pseudo labels. In particular, we first employ a plain unsupervised domain adaptation framework as our baseline, which can generate a set of pseudo labels through self-training. Then, we incorporate offline event-to-image reconstruction into the framework, and obtain another set of pseudo labels by predicting segmentation maps on the reconstructed images. A noisy label learning strategy is designed to mix the two sets of pseudo labels and enhance the quality. Moreover, we propose a soft prototypical alignment module to further improve the consistency of target domain features. Extensive experiments show that our proposed method outperforms existing state-of-the-art methods by a large margin on the DSEC-Semantic dataset (+5.88% accuracy, +10.32% mIoU), which even surpasses several supervised methods.
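
To make the idea of mixing two sets of pseudo labels concrete, the following is a minimal sketch of one plausible fusion rule, not the noisy label learning strategy from the paper (whose details are not given in this section). The rule itself, the function name `mix_pseudo_labels`, the `conf_threshold` value, and the ignore index 255 are all assumptions made for illustration: pixels where both sources agree keep their label, disagreeing pixels take the more confident source if it clears the threshold, and everything else is ignored.

```python
import numpy as np

IGNORE_INDEX = 255  # assumed "ignore" label; a common segmentation convention


def mix_pseudo_labels(probs_self, probs_recon, conf_threshold=0.9):
    """Fuse two per-pixel class-probability maps of shape (C, H, W).

    probs_self:  class probabilities from the self-training branch (events)
    probs_recon: class probabilities predicted on the reconstructed image
    The fusion rule is an illustrative assumption, not the paper's method.
    """
    labels_self = probs_self.argmax(axis=0)
    labels_recon = probs_recon.argmax(axis=0)
    conf_self = probs_self.max(axis=0)
    conf_recon = probs_recon.max(axis=0)

    # Start with every pixel ignored, then fill in trusted labels.
    mixed = np.full(labels_self.shape, IGNORE_INDEX, dtype=np.int64)

    # Both sources agree: keep the shared label.
    agree = labels_self == labels_recon
    mixed[agree] = labels_self[agree]

    # Sources disagree: keep the more confident one if it clears the threshold.
    use_self = ~agree & (conf_self >= conf_recon) & (conf_self >= conf_threshold)
    use_recon = ~agree & (conf_recon > conf_self) & (conf_recon >= conf_threshold)
    mixed[use_self] = labels_self[use_self]
    mixed[use_recon] = labels_recon[use_recon]
    return mixed
```

Pixels left at `IGNORE_INDEX` would be excluded from the segmentation loss, which is one simple way to keep a single noisy source from reinforcing its own errors.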

Paper Structure (17 sections, 9 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: Comparison on the DSEC-Semantic dataset. Our method outperforms other UDA works by a large margin and even surpasses fully supervised methods.
  • Figure 2: Overview of the HPL-ESS architecture. During training, we introduce offline event-to-image reconstruction as input to our framework. To avoid overfitting noise, we use only a small proportion (5%) of the reconstructions. The network is trained by hybrid pseudo labels from reconstruction and self-prediction. Additionally, a soft prototypical alignment (SPA) module is designed to enhance the consistency of target domain features. In the inference phase, only events are used as input.
  • Figure 3: The concept of our SPA module on source domain, reconstructed images, and events.
  • Figure 4: Example results on the DDD17 dataset. The DDD17 ground truth lacks details for some objects.
  • Figure 5: Visualization results on the DSEC-Semantic dataset. From left to right: event frame, event-to-image reconstruction, the segmentation maps predicted by E2VID, ESS, and our proposed HPL-ESS, and the ground truth.
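
The captions of Figures 2 and 3 describe the SPA module only at a conceptual level, so the sketch below illustrates the general idea of soft prototype alignment rather than the paper's formulation. It assumes that a class prototype is the probability-weighted mean of pixel features and that alignment is measured as the mean squared distance between matching prototypes from two inputs (for example, reconstructed images and events). The function names, the `eps` constant, and the choice of distance are hypothetical.

```python
import numpy as np


def soft_prototypes(features, probs, eps=1e-8):
    """Compute one prototype per class as a soft-weighted feature mean.

    features: (N, D) array of N pixel features with D channels
    probs:    (N, C) array of soft class assignments per pixel
    Returns a (C, D) array of class prototypes.
    """
    weighted_sum = probs.T @ features   # (C, D): sum of features weighted by class probability
    mass = probs.sum(axis=0)[:, None]   # (C, 1): total probability mass per class
    return weighted_sum / (mass + eps)  # eps guards against classes with no mass


def prototype_alignment_loss(protos_a, protos_b):
    """Mean squared Euclidean distance between matching class prototypes."""
    return float(np.mean(np.sum((protos_a - protos_b) ** 2, axis=1)))
```

Using soft assignments instead of hard labels lets uncertain pixels contribute in proportion to their predicted probability, which is the usual motivation for "soft" prototypes when the labels themselves are noisy.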