MUSES: The Multi-Sensor Semantic Perception Dataset for Driving under Uncertainty
Tim Brödermann, David Bruggemann, Christos Sakaridis, Kevin Ta, Odysseas Liagouris, Jason Corkill, Luc Van Gool
TL;DR
MUSES introduces the first large-scale, multi-sensor semantic perception dataset for driving under uncertainty, pairing a frame camera with a MEMS lidar, an FMCW radar, an HD event camera, and an IMU/GNSS sensor. It provides synchronized multimodal data and a two-stage annotation protocol that yields high-quality 2D panoptic labels while capturing class-level and instance-level uncertainty, which enables the novel task of uncertainty-aware panoptic segmentation and the UPQ metric. The dataset includes 2500 labeled camera frames recorded under diverse adverse conditions and three benchmarks (semantic, panoptic, and uncertainty-aware panoptic segmentation) with RGB-only and multimodal tracks. Accompanying analyses show the value of the non-camera modalities and the potential for improved sensor fusion. Models trained on MUSES generalize well across domains, and the dataset opens new research directions in multimodal fusion, uncertainty quantification, and robust perception under challenging weather and illumination.
Abstract
Achieving level-5 driving automation in autonomous vehicles necessitates a robust semantic visual perception system capable of parsing data from different sensors across diverse conditions. However, existing semantic perception datasets often lack important non-camera modalities typically used in autonomous vehicles, or they do not exploit such modalities to aid and improve semantic annotations in challenging conditions. To address this, we introduce MUSES, the MUlti-SEnsor Semantic perception dataset for driving in adverse conditions under increased uncertainty. MUSES includes synchronized multimodal recordings with 2D panoptic annotations for 2500 images captured under diverse weather and illumination. The dataset integrates a frame camera, a lidar, a radar, an event camera, and an IMU/GNSS sensor. Our new two-stage panoptic annotation protocol captures both class-level and instance-level uncertainty in the ground truth and enables the novel task of uncertainty-aware panoptic segmentation we introduce, along with standard semantic and panoptic segmentation. MUSES proves both effective for training and challenging for evaluating models under diverse visual conditions, and it opens new avenues for research in multimodal and uncertainty-aware dense semantic perception. Our dataset and benchmark are publicly available at https://muses.vision.ee.ethz.ch.
