Examining the Impact of Optical Aberrations to Image Classification and Object Detection Models

Patrick Müller, Alexander Braun, Margret Keuper

TL;DR

This work investigates how realistic optical aberrations challenge image classification and object detection, introducing OpticsBench and LensCorruptions as physically grounded benchmarks. By deriving blur kernels from Zernike polynomials and real lens prescriptions, the authors quantify performance degradation across diverse models on ImageNet and MSCOCO, revealing that single-kernel baselines inadequately proxy optical blur. They propose OpticsAugment, a GPU-accelerated data-augmentation technique that uses optical kernels during training, achieving substantial robustness gains (e.g., up to +29.6 percentage points on ImageNet-100 OpticsBench) and transferring improvements to 2D common corruptions. The LensCorruptions framework further demonstrates that robustness scales with lens quality and that real-world lens variability can be a stronger stress test than synthetic baselines, underscoring the need to account for blur type in robustness evaluations and deployment-ready training pipelines.

Abstract

Deep neural networks (DNNs) have proven to be successful in various computer vision applications such that models even infer in safety-critical situations. Therefore, vision models have to behave in a robust way to disturbances such as noise or blur. While seminal benchmarks exist to evaluate model robustness to diverse corruptions, blur is often approximated in an overly simplistic way to model defocus, while ignoring the different blur kernel shapes that result from optical systems. To study model robustness against realistic optical blur effects, this paper proposes two datasets of blur corruptions, which we denote OpticsBench and LensCorruptions. OpticsBench examines primary aberrations such as coma, defocus, and astigmatism, i.e. aberrations that can be represented by varying a single parameter of Zernike polynomials. To go beyond the principled but synthetic setting of primary aberrations, LensCorruptions samples linear combinations in the vector space spanned by Zernike polynomials, corresponding to 100 real lenses. Evaluations for image classification and object detection on ImageNet and MSCOCO show that for a variety of different pre-trained models, the performance on OpticsBench and LensCorruptions varies significantly, indicating the need to consider realistic image corruptions to evaluate a model's robustness against blur.
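
The distinction the abstract draws between a single Zernike parameter (OpticsBench) and a linear combination of Zernike terms (LensCorruptions) can be sketched in a few lines. This is only an illustration, not the authors' implementation: the names `ZERNIKE`, `wavefront`, and `rms_over_pupil` are hypothetical, the Noll normalization is an assumed convention, and the coefficient values are arbitrary rather than taken from any real lens.

```python
import math

# Noll-normalized Zernike terms for three primary aberrations on the unit disk.
# (Assumed convention; the paper's exact normalization is not stated here.)
ZERNIKE = {
    "defocus": lambda rho, theta: math.sqrt(3) * (2 * rho**2 - 1),
    "astigmatism": lambda rho, theta: math.sqrt(6) * rho**2 * math.cos(2 * theta),
    "coma": lambda rho, theta: math.sqrt(8) * (3 * rho**3 - 2 * rho) * math.cos(theta),
}


def wavefront(coeffs, rho, theta):
    """Wavefront aberration W(rho, theta) as a linear combination of Zernike terms."""
    return sum(c * ZERNIKE[name](rho, theta) for name, c in coeffs.items())


def rms_over_pupil(coeffs, n=64):
    """Approximate RMS of W over the unit disk using a polar midpoint grid."""
    total = 0.0
    weight = 0.0
    for i in range(n):
        rho = (i + 0.5) / n
        for j in range(n):
            theta = 2 * math.pi * (j + 0.5) / n
            w = wavefront(coeffs, rho, theta)
            total += w * w * rho  # rho is the polar area weight
            weight += rho
    return math.sqrt(total / weight)


# OpticsBench-style: vary one coefficient. Values are illustrative only.
single = {"defocus": 0.5}
# LensCorruptions-style: a point in the space spanned by several terms.
mixed = {"defocus": 0.3, "astigmatism": -0.2, "coma": 0.1}

print(rms_over_pupil(single))
print(rms_over_pupil(mixed))
```

Under the assumed normalization the terms are orthonormal over the pupil, so the RMS wavefront error is approximately the Euclidean norm of the coefficient vector. That is one reason a coefficient space is a convenient place to compare lenses.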

Paper Structure

This paper contains 60 sections, 5 equations, 34 figures, 28 tables.

Figures (34)

  • Figure 1: Blur image corruptions applied to an ImageNet image. The effects of baseline blur [hendrycks_benchmarking_2019] and OpticsBench blur kernels [mueller_opticsbench2023] are visualized for severity 4. Although the image looks similarly blurred for the different kernels, the details are different: for example, the petals of the flower remain white for the baseline blur, while they have reddish and bluish color fringes for the multi-channel (R, G, B) primary aberrations considered in OpticsBench. The baseline is uniformly blurred, while astigmatism and coma introduce directional blur. In this article, we investigate how realistic blur kernel properties impact current image classification and object detection models.
  • Figure 2: The Modulation Transfer Function (MTF) and its derived metrics, the MTF50 and MTF20 frequencies (markers), objectively measure lens quality in terms of sharpness. Higher contrast is better, so a higher MTF50 or MTF20 frequency indicates a sharper lens. The effect of each MTF is illustrated by convolving the bar targets at the top right with the corresponding PSFs. The six spatial frequencies (cyc/px) double from left to right, and the rightmost one is the maximum frequency, the Nyquist frequency (0.5 cyc/px). The black MTF represents a high-quality lens with high contrast transfer and an MTF50 frequency around 0.25 cyc/px (half Nyquist), visible in the bar target's penultimate frequency bin. The orange MTF drops off quickly, with an MTF50 frequency around 0.0625 cyc/px, visible in the bar target's third frequency bin. Below MTF20 or even MTF10, the bar patterns are hardly discernible and turn uniformly gray.
  • Figure 3: (Simplified view) Lens elements are lumped into a black box (left) and its output is represented by the exit pupil [goodman_introduction_2017]. The PSF (enlarged for display purposes) is obtained by applying a nonlinear operation to the wavefront aberration $W_{\lambda}$ at the exit pupil. We simulate $W_{\lambda}$ using either ray-tracing or Zernike polynomials for different lenses and field positions. The PSFs can be used to generate our image blur corruptions.
  • Figure 4: Lens selection: (a) 3D Zernike coefficient space for the medium field position, illustrating the diversity of the lens aberrations. The orange dots represent the 100 selected lenses, the blue dots the excluded ones. The selected lenses are spread across the whole coefficient space. The dot size is controlled by the logarithmic vector norm. Details are given in the supplementary material (\ref{app:dataset_curation_details}). (b) Sampling of a subset of 100 lenses (orange dots) from a wide range of qualities. The lens quality (y-axis, higher is better) is the normalized MTF50, obtained from the downsampled MTFs averaged over field positions. The x-axis represents the different lenses sorted by lens quality.
  • Figure 5: Ranking of the best 50 models on ImageNet-1k with respect to the baseline defocus blur corruption from [hendrycks_benchmarking_2019]. The different peaks are mainly caused by the robust ResNet50 versions. Kendall's $\tau$ [kendall_treatment_1945] is computed between the rankings of the baseline and a specific corruption. All $\tau$ values indicate only a weak correlation with the baseline ranking.
  • ...and 29 more figures
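
The MTF50 and MTF20 readings described in the Figure 2 caption can be sketched as a threshold crossing on a sampled MTF curve. This is a minimal sketch under stated assumptions, not the paper's measurement procedure: `bar_target_frequencies`, `mtf_crossing`, and `toy_mtf` are hypothetical names, and the Gaussian-shaped `toy_mtf` is a stand-in chosen only so that its MTF50 frequency is known in advance.

```python
def bar_target_frequencies(n_bins=6, nyquist=0.5):
    """Spatial frequencies (cyc/px) that double left to right and end at Nyquist."""
    return [nyquist / 2 ** (n_bins - 1 - k) for k in range(n_bins)]


def mtf_crossing(freqs, mtf, level):
    """First frequency at which the MTF falls to `level` (linear interpolation)."""
    for i in range(len(freqs) - 1):
        m0, m1 = mtf[i], mtf[i + 1]
        if m0 >= level > m1:
            return freqs[i] + (m0 - level) * (freqs[i + 1] - freqs[i]) / (m0 - m1)
    return None  # the curve never drops below `level` in the sampled range


def toy_mtf(f, f50):
    # Hypothetical Gaussian-shaped MTF, constructed so that toy_mtf(f50) == 0.5.
    return 0.5 ** ((f / f50) ** 2)


freqs = [k / 200 for k in range(101)]        # 0 to 0.5 cyc/px
sharp = [toy_mtf(f, 0.25) for f in freqs]    # stands in for the black curve
soft = [toy_mtf(f, 0.0625) for f in freqs]   # stands in for the orange curve

print(bar_target_frequencies())
print(mtf_crossing(freqs, sharp, 0.5), mtf_crossing(freqs, soft, 0.5))
print(mtf_crossing(freqs, sharp, 0.2))
```

With six bins ending at 0.5 cyc/px, the penultimate bin sits at 0.25 cyc/px and the third at 0.0625 cyc/px, which matches the bins the caption points to for the two curves.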