Finding Meaning in Points: Weakly Supervised Semantic Segmentation for Event Cameras
Hoonhee Cho, Sung-Hoon Yoon, Hyeokjun Kweon, Kuk-Jin Yoon
TL;DR
EV-WSSS tackles dense pixel-wise semantic segmentation for event cameras under sparse supervision by introducing 1-class-1-click labels and applying asymmetric dual-student learning to forward $E^f$ and backward $E^b$ event streams. It combines this with feature-level prototype-based contrastive learning, using intra-, inter-, and cross-branch aggregation via prototype distillation to sharpen semantic representations without dense ground truth. The approach is validated on DDD17-Seg, DSEC-Semantic, and the newly released DSEC Night-Point, showing strong gains over baselines and robustness to incomplete or noisy annotations, as well as competitive performance in unsupervised domain adaptation (UDA) settings with weak target-domain labels. The work offers practical benefits for event-based segmentation in challenging conditions (e.g., nighttime) and contributes a new dataset and publicly available code.
Abstract
Event cameras excel in capturing high-contrast scenes and dynamic objects, offering a significant advantage over traditional frame-based cameras. Despite active research into leveraging event cameras for semantic segmentation, producing pixel-wise dense semantic annotations for such challenging scenarios remains labor-intensive. As a remedy, we present EV-WSSS: a novel weakly supervised approach for event-based semantic segmentation that utilizes sparse point annotations. To fully leverage the temporal characteristics of event data, the proposed framework performs asymmetric dual-student learning between 1) the original forward event data and 2) the longer reversed event data, which contain complementary information from the past and the future, respectively. In addition, to mitigate the challenges posed by sparse supervision, we propose feature-level contrastive learning based on class-wise prototypes, carefully aggregated at both the spatial-region and sample levels. We further exploit the potential of our dual-student learning model by exchanging prototypes between the two learning paths, thereby harnessing their complementary strengths. In extensive experiments on various datasets, including DSEC Night-Point with sparse point annotations newly provided by this paper, the proposed method achieves strong segmentation performance even without relying on pixel-level dense ground truths. The code and dataset are available at https://github.com/Chohoonhee/EV-WSSS.
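To make the idea of class-wise prototypes under point supervision concrete, the following is a minimal NumPy sketch and not the released implementation. It assumes per-pixel embeddings of shape (H, W, D), a point-label map in which unlabeled pixels carry a hypothetical `ignore_index` of 255, and a softmax over cosine similarities with an assumed temperature of 0.1. The function names, the ignore value, and the exact loss form are illustrative assumptions; the paper's aggregation across spatial regions, samples, and the two student branches is not reproduced here.

```python
import numpy as np


def class_prototypes(features, point_labels, num_classes):
    """Average the embeddings at labeled points, one prototype per class.

    features:     (H, W, D) float array of per-pixel embeddings.
    point_labels: (H, W) int array; unlabeled pixels hold a value outside
                  [0, num_classes), e.g. 255 (an assumed convention).
    Returns (prototypes, present): a (num_classes, D) array and a boolean
    mask marking classes that have at least one labeled point.
    """
    dim = features.shape[-1]
    prototypes = np.zeros((num_classes, dim), dtype=features.dtype)
    present = np.zeros(num_classes, dtype=bool)
    for c in range(num_classes):
        mask = point_labels == c
        if mask.any():
            prototypes[c] = features[mask].mean(axis=0)
            present[c] = True
    return prototypes, present


def prototype_contrastive_loss(features, point_labels, prototypes, present,
                               temperature=0.1, ignore_index=255):
    """Cross-entropy over cosine similarities between labeled-point
    embeddings and the prototypes of the classes that are present.

    Assumes every labeled class has a prototype (present[c] is True).
    """
    labeled = point_labels != ignore_index
    feats = features[labeled]            # (N, D)
    targets = point_labels[labeled]      # (N,)

    # L2-normalize so the dot product is a cosine similarity.
    feats = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    protos = prototypes[present]
    protos = protos / (np.linalg.norm(protos, axis=1, keepdims=True) + 1e-8)

    logits = feats @ protos.T / temperature          # (N, num_present)
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Map class ids to column indices among the present classes.
    column_of_class = np.cumsum(present) - 1
    target_cols = column_of_class[targets]
    return -log_prob[np.arange(len(target_cols)), target_cols].mean()
```

In a dual-student setting, one plausible use of such a loss is to pass the prototypes computed in one branch to the loss of the other branch, which is one way to read the prototype exchange described above; the precise scheme is defined in the paper and the released code.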
