SS-SFR: Synthetic Scenes Spatial Frequency Response on Virtual KITTI and Degraded Automotive Simulations for Object Detection

Daniel Jakab, Alexander Braun, Cathaoir Agnew, Reenu Mohandas, Brian Michael Deegan, Dara Molloy, Enda Ward, Tony Scanlan, Ciarán Eising

TL;DR

The paper addresses the lack of image-quality evaluation in automotive simulation and the impact of optical degradations on perception for autonomous driving. It introduces Synthetic Scenes Spatial Frequency Response (SS-SFR) by applying Gaussian blur to Virtual KITTI and measuring $MTF50$ via the Slanted Edge Method (ISO 12233), with NS-SFR used to isolate edge regions. Three detectors (Faster RCNN, YOLOF, DETR) are trained and evaluated on four dataset variations. Sharpness degrades from an $MTF50$ of approximately $0.245$ cy/px to approximately $0.119$ cy/px, while detection performance remains largely robust, staying within 0.58% (Faster RCNN), 1.45% (YOLOF), and 1.93% (DETR). This suggests that synthetic data with optical degradations can still support reliable object detection, and it points to future work incorporating more realistic degradations and other perception tasks to close the sim-to-real gap.

Abstract

Automotive simulation can potentially compensate for a lack of training data in computer vision applications. However, there has been little to no image quality evaluation of automotive simulation, and the impact of optical degradations on simulation is little explored. In this work, we investigate Virtual KITTI and the impact of applying variations of Gaussian blur on image sharpness. Furthermore, we consider object detection, a common computer vision application, on three different state-of-the-art models, thus allowing us to characterize the relationship between object detection and sharpness. It was found that while image sharpness (MTF50) degrades from an average of 0.245 cy/px to approximately 0.119 cy/px, object detection performance stays largely robust, within 0.58% (Faster RCNN), 1.45% (YOLOF) and 1.93% (DETR) across all respective held-out test sets.
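To give a feel for how Gaussian blur relates to MTF50, the following minimal sketch evaluates the textbook MTF of an ideal, continuous Gaussian point spread function, $MTF(f)=\exp(-2\pi^2\sigma^2 f^2)$, and solves for the frequency at which it falls to 0.5. This is not the paper's measurement pipeline (which uses the Slanted Edge Method with NS-SFR on rendered scenes), and the function names `gaussian_mtf` and `gaussian_mtf50` are illustrative assumptions. Because it models the blur kernel alone, it is not expected to reproduce the reported measurements.

```python
import math


def gaussian_mtf(f, sigma):
    """MTF of an ideal continuous Gaussian PSF.

    f     : spatial frequency in cycles/pixel
    sigma : standard deviation of the Gaussian in pixels
    """
    return math.exp(-2.0 * (math.pi * sigma * f) ** 2)


def gaussian_mtf50(sigma):
    """Frequency (cycles/pixel) where the Gaussian MTF equals 0.5.

    Solving exp(-2 * pi^2 * sigma^2 * f^2) = 0.5 for f gives
    f = sqrt(ln(2) / 2) / (pi * sigma).
    """
    return math.sqrt(math.log(2.0) / 2.0) / (math.pi * sigma)


# Blur levels assumed from the figure captions (sigma = 1, 2, 3).
for sigma in (1, 2, 3):
    f50 = gaussian_mtf50(sigma)
    print(f"sigma={sigma}: kernel-only MTF50 = {f50:.4f} cy/px")
```

Under this idealization the kernel-only MTF50 scales as roughly $0.187/\sigma$ cy/px, so doubling $\sigma$ halves it. Measured scene values depend on the rendered content and the edge-selection procedure as well as the blur kernel.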

Paper Structure (11 sections, 2 figures, 2 tables)

Figures (2)

  • Figure 1: Virtual KITTI image degradation under Gaussian blur with $\sigma \in \{1,2,3\}$.
  • Figure 2: Mean MTF50 versus mean Average Precision over IoU thresholds from 0.5 to 0.95 (mAP0.5:0.95) for the Virtual KITTI baseline and the respective degradations with $\sigma \in \{1,2,3\}$. Mean MTF50 was calculated by averaging HMTF50 and VMTF50 from the results table (tab:results), giving $MTF50\ (\text{cy/px}) = [0.119, 0.120, 0.160, 0.245]$ (see the results table for MTF50 (mean)).
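
As a small worked example, the mean MTF50 values listed in the Figure 2 caption can be turned into relative sharpness losses against the baseline. The mapping of values to blur levels below is an assumption: the caption lists the numbers in ascending order, and the sketch takes the largest as the unblurred baseline and assigns the rest to $\sigma=1,2,3$ in order of decreasing sharpness. The helper name `relative_drop` is illustrative.

```python
# Mean MTF50 values (cy/px) from the Figure 2 caption.
# ASSUMPTION: assignment of values to blur levels is inferred, not stated.
reported = {
    "baseline": 0.245,
    "sigma=1": 0.160,
    "sigma=2": 0.120,
    "sigma=3": 0.119,
}


def relative_drop(baseline, degraded):
    """Percentage reduction of `degraded` relative to `baseline`."""
    return 100.0 * (baseline - degraded) / baseline


drops = {
    name: relative_drop(reported["baseline"], value)
    for name, value in reported.items()
    if name != "baseline"
}

for name, drop in drops.items():
    print(f"{name}: mean MTF50 is {drop:.1f}% below baseline")
```

Under that assumed mapping, most of the sharpness loss has already occurred by $\sigma=2$, with only a marginal further change at $\sigma=3$.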