Benchmarking the Robustness of Panoptic Segmentation for Automated Driving
Yiting Wang, Haonan Zhao, Daniel Gummadi, Mehrdad Dianati, Kurt Debattista, Valentina Donzella
TL;DR
This work tackles the robustness of panoptic segmentation for automated driving under diverse camera degradations by introducing a unifying degradation-impact pipeline and a synthetic, balanced D-Cityscapes+ dataset with 19 degradation factors across 3 severity levels. It evaluates three state-of-the-art panoptic models (including CNN- and ViT-based backbones) using eight image-quality metrics and the panoptic quality (PQ) metric, detailing how degradation factors influence perception performance. Key findings show that Gaussian noise and droplets on the lens degrade PQ the most, ViT-based backbones offer superior robustness, and metrics such as CW-SSIM and LPIPS correlate strongly with panoptic performance, enabling predictive assessment and design guidance for assisted and automated driving (AAD) systems. The framework provides a practical, data-driven basis for robustness benchmarking and sensor-quality planning in automated driving.
Abstract
Precise situational awareness is required for the safe decision-making of assisted and automated driving (AAD) functions. Panoptic segmentation is a promising perception technique to identify and categorise objects, impending hazards, and driveable space at a pixel level. While segmentation quality is generally associated with the quality of the camera data, a comprehensive understanding and modelling of this relationship are paramount for AAD system designers. Motivated by this need, this work proposes a unifying pipeline to assess the robustness of panoptic segmentation models for AAD, correlating it with traditional image quality metrics. The first step of the proposed pipeline involves generating degraded camera data that reflects real-world noise factors. To this end, 19 noise factors have been identified and implemented with 3 severity levels. Of these factors, this work proposes novel models for unfavourable light and snow. After the degradation models are applied, three state-of-the-art panoptic segmentation networks, with CNN- and vision transformer (ViT)-based backbones, are evaluated on the degraded data to analyse their robustness. The variations in segmentation performance are then correlated with 8 selected image quality metrics. This research reveals that: 1) two noise factors, droplets on the lens and Gaussian noise, have the highest impact on panoptic segmentation; 2) the ViT-based panoptic segmentation backbones show better robustness to the considered noise factors; 3) some image quality metrics (i.e. LPIPS and CW-SSIM) correlate strongly with panoptic segmentation performance and can therefore be used as predictive metrics for network performance.
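To make the pipeline steps above concrete, the following is a minimal sketch rather than the authors' implementation. It applies one noise factor (Gaussian noise) at three severity levels, scores each degraded frame with an image quality metric, and computes panoptic quality from already-matched segments. The sigma values, the function names, and the uniform grey placeholder frame are all assumptions made for illustration; PSNR is used only as a simple stand-in and is not claimed to be one of the 8 metrics selected in this work.

```python
import numpy as np

# Hypothetical severity levels: noise standard deviation on a 0-255 scale.
# These values are illustrative and are not taken from the paper.
SEVERITY_SIGMA = {1: 5.0, 2: 15.0, 3: 30.0}


def add_gaussian_noise(image, severity, rng):
    """Return a copy of a uint8 image with zero-mean Gaussian noise added."""
    sigma = SEVERITY_SIGMA[severity]
    noisy = image.astype(np.float64) + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)


def psnr(reference, degraded):
    """Peak signal-to-noise ratio in dB for 8-bit images (stand-in metric)."""
    diff = reference.astype(np.float64) - degraded.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(255.0 ** 2 / mse)


def panoptic_quality(matched_ious, num_fp, num_fn):
    """PQ = (sum of IoUs over true positives) / (|TP| + 0.5|FP| + 0.5|FN|).

    matched_ious holds the IoU of each predicted/ground-truth segment pair
    already matched as a true positive (IoU > 0.5 in the standard definition).
    """
    tp = len(matched_ious)
    denom = tp + 0.5 * num_fp + 0.5 * num_fn
    return float(sum(matched_ious)) / denom if denom > 0 else 0.0


rng = np.random.default_rng(0)
clean = np.full((64, 64, 3), 128, dtype=np.uint8)  # placeholder for a camera frame

# Image quality at each severity level for this single noise factor.
psnr_by_level = {
    level: psnr(clean, add_gaussian_noise(clean, level, rng))
    for level in SEVERITY_SIGMA
}
```

In a full benchmark, each degraded frame would also be passed through a segmentation network to obtain a per-level PQ, and the paired (image quality, PQ) values across all noise factors and severity levels would then be fed to a correlation analysis.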
