MUSES: The Multi-Sensor Semantic Perception Dataset for Driving under Uncertainty

Tim Brödermann, David Bruggemann, Christos Sakaridis, Kevin Ta, Odysseas Liagouris, Jason Corkill, Luc Van Gool

TL;DR

MUSES introduces the first large-scale, multi-sensor semantic perception dataset for driving under uncertainty, pairing a frame camera with MEMS lidar, FMCW radar, an HD event camera, and IMU/GNSS. It provides synchronized multimodal data and a two-stage annotation protocol that yields high-quality 2D panoptic labels while capturing class- and instance-level uncertainty, enabling the novel uncertainty-aware panoptic segmentation task and the UPQ metric. The dataset includes 2500 labeled camera frames across diverse adverse conditions and 3 benchmarks (semantic, panoptic, and uncertainty-aware panoptic segmentation) with RGB-only and multimodal tracks, along with extensive analyses showing the value of non-camera modalities and the potential for improved sensor fusion. MUSES demonstrates strong cross-domain generalization and offers new research directions in multimodal fusion, uncertainty quantification, and robust perception under challenging weather and illumination. Overall, MUSES provides a robust, richly annotated testbed to advance reliable dense semantic perception for autonomous driving in uncertain conditions.

Abstract

Achieving level-5 driving automation in autonomous vehicles necessitates a robust semantic visual perception system capable of parsing data from different sensors across diverse conditions. However, existing semantic perception datasets often lack important non-camera modalities typically used in autonomous vehicles, or they do not exploit such modalities to aid and improve semantic annotations in challenging conditions. To address this, we introduce MUSES, the MUlti-SEnsor Semantic perception dataset for driving in adverse conditions under increased uncertainty. MUSES includes synchronized multimodal recordings with 2D panoptic annotations for 2500 images captured under diverse weather and illumination. The dataset integrates a frame camera, a lidar, a radar, an event camera, and an IMU/GNSS sensor. Our new two-stage panoptic annotation protocol captures both class-level and instance-level uncertainty in the ground truth and enables the novel task of uncertainty-aware panoptic segmentation we introduce, along with standard semantic and panoptic segmentation. MUSES proves both effective for training and challenging for evaluating models under diverse visual conditions, and it opens new avenues for research in multimodal and uncertainty-aware dense semantic perception. Our dataset and benchmark are publicly available at https://muses.vision.ee.ethz.ch.

Paper Structure (33 sections, 4 equations, 15 figures, 22 tables)

Figures (15)

  • Figure 1: Annotated scene of MUSES. First row left to right: frame camera, lidar, radar; second row left to right: event camera, panoptic annotation, and difficulty annotation.
  • Figure 2: 3D vs. 2D. This example shows that the lidar point cloud (left, projected onto RGB image) can deteriorate in adverse weather, yielding insufficient information for 3D annotation. By contrast, the distant cars are captured with 2D annotations (right).
  • Figure 3: Example of stage 1 and stage 2 panoptic annotations H1 and H2. The auxiliary data available in stage 2 allows better separation between the three car instances on the left, reducing the unknown_instance area from H1 to H2, but keeping a difficult_instance label (grey) in the difficulty map. Notice the additional class labels added to H2 for distant cars; the corresponding regions keep the difficult_class (white) label in the difficulty map.
  • Figure 4: Flowchart of uncertainty-aware panoptic annotation.
  • Figure 5: Visualization of two adverse-condition samples from MUSES. From left to right: RGB image; motion-compensated lidar points projected and overlaid with the image; events projected onto the image (assuming infinite distance); azimuth-range radar scan (with ranges above a threshold cropped out); corresponding normal-condition image; panoptic ground truth; difficulty map. Best viewed zoomed in.
  • ...and 10 more figures