On the Detection of Anomalous or Out-Of-Distribution Data in Vision Models Using Statistical Techniques

Laura O'Mahony, David JP O'Sullivan, Nikola S. Nikolov

TL;DR

We assess Benford's law as a tool for quantifying the difference between real and corrupted inputs. We believe that, in many settings, it could serve as a filter for anomalous data points and as a signal of out-of-distribution data.

Abstract

Out-of-distribution data and anomalous inputs are vulnerabilities of machine learning systems today, often causing systems to make incorrect predictions. The diverse range of data on which these models are used makes detecting atypical inputs a difficult and important task. We assess Benford's law as a method for quantifying the difference between real and corrupted inputs. We believe that, in many settings, it could function as a filter for anomalous data points and as a signal of out-of-distribution data. We hope to open a discussion on these applications and on further areas where this technique is underexplored.

Paper Structure (8 sections, 1 equation, 5 figures)


Figures (5)

  • Figure 1: The theoretical distribution of first digits in data that obeys Benford's law. The height of the bar is the percentage of numbers that start with the digit.
  • Figure 2: Quantifying the difference between the distribution of an image's discrete cosine transform (DCT) coefficients and the theoretical distribution given by Benford's law involves three steps: (1) applying a DCT to the image $I$, (2) calculating the leading digit (LD) distribution of the transform coefficients, and (3) statistically comparing the theoretical distribution $p$ with the observed distribution $\hat{p}$ of the transform coefficients. The raw image $I$ is taken from the ImageNet-C dataset [hendrycks2018benchmarking].
  • Figure 3: Example images taken from ImageNet-C [hendrycks2018benchmarking], representing different corruption types.
  • Figure 4: The Jensen-Shannon divergence of the statistics of the discrete cosine transform coefficients for clean ImageNet images (blue), images of corruption severity 1 (orange), corruption severity 3 (green), and corruption severity 5 (red).
  • Figure 5: The top-1 predictive accuracy of AlexNet [krizhevsky2017imagenet] on clean ImageNet images and on images of corruption severity 1, 3, and 5.
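
The three steps described in the Figure 2 caption can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a single-channel (grayscale) image, a DCT taken over the whole image rather than over blocks, and the Jensen-Shannon divergence (the measure named in Figure 4) as the statistical comparison in step (3). The function names `leading_digits` and `benford_divergence` are hypothetical and chosen only for this example.

```python
import numpy as np
from scipy.fft import dctn
from scipy.spatial.distance import jensenshannon

DIGITS = np.arange(1, 10)
# Benford's law: P(d) = log10(1 + 1/d) for leading digits d = 1..9 (Figure 1).
BENFORD = np.log10(1.0 + 1.0 / DIGITS)


def leading_digits(values):
    """Return the first significant digit (1-9) of every nonzero value."""
    magnitudes = np.abs(np.asarray(values, dtype=np.float64)).ravel()
    magnitudes = magnitudes[magnitudes > 0]  # zero has no leading digit
    exponents = np.floor(np.log10(magnitudes))
    digits = np.floor(magnitudes / 10.0 ** exponents).astype(int)
    # Guard against floating-point rounding at exact powers of ten.
    return np.clip(digits, 1, 9)


def benford_divergence(image):
    """Jensen-Shannon divergence between the leading-digit distribution of
    an image's DCT coefficients and Benford's law.

    Assumption: `image` is a 2-D grayscale array.
    """
    # Step 1: discrete cosine transform of the image.
    coeffs = dctn(np.asarray(image, dtype=np.float64), norm="ortho")
    # Step 2: observed leading-digit distribution p_hat.
    digits = leading_digits(coeffs)
    if digits.size == 0:
        raise ValueError("image has no nonzero DCT coefficients")
    counts = np.bincount(digits, minlength=10)[1:10]
    observed = counts / counts.sum()
    # Step 3: compare p_hat with the theoretical distribution p.
    # SciPy returns the Jensen-Shannon *distance*; squaring gives the divergence.
    return jensenshannon(observed, BENFORD, base=2) ** 2
```

With base-2 logarithms the returned divergence lies between 0 and 1. Using such a score as a filter would additionally require a threshold, which this sketch leaves unspecified.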