Dimensionality-induced information loss of outliers in deep neural networks

Kazuki Uematsu, Kosuke Haruki, Taiji Suzuki, Mitsuhiro Kimura, Takahiro Takimoto, Hideyuki Nakagawa

TL;DR

Intrinsic low dimensionalization of DNNs is essential for understanding how OOD samples become more distinct from ID samples as features propagate to deeper layers. The paper also demonstrates the utility of dimensionality for OOD detection.

Abstract

Out-of-distribution (OOD) detection is a critical issue for the stable and reliable operation of systems using a deep neural network (DNN). Although many OOD detection methods have been proposed, it remains unclear how the differences between in-distribution (ID) and OOD samples are generated by each processing step inside DNNs. We experimentally clarify this issue by investigating the layer dependence of feature representations from multiple perspectives. We find that intrinsic low dimensionalization of DNNs is essential for understanding how OOD samples become more distinct from ID samples as features propagate to deeper layers. Based on these observations, we provide a simple picture that consistently explains various properties of OOD samples. Specifically, low-dimensional weights eliminate most information from OOD samples, resulting in misclassifications due to excessive attention to dataset bias. In addition, we demonstrate the utility of dimensionality by proposing a dimensionality-aware OOD detection method based on alignment of features and weights, which consistently achieves high performance for various datasets with lower computational cost.

Paper Structure

This paper contains 29 sections, 5 equations, 30 figures, 3 tables.

Figures (30)

  • Figure 1: Qualitative picture showing how OOD samples deviate from ID samples, and how OOD samples are classified. Low dimensionalization of weights arises from the significant difference in feature propagations between ID and OOD samples due to their alignment. The resulting features of OOD samples are dominated by dataset bias, i.e., the characteristics common to the dataset, leading to biased predictions.
  • Figure 2: Layer dependence of the stable rank of the covariance matrix $\overline{\Sigma}$ and the weight matrix $W$ for the VGG-13 model. The dashed line indicates the transition layer. Low dimensionalization of features and weights occurs at almost the same layer. See Appendices \ref{sec:app_full-model} and \ref{sec:app_svhn-mnist} for further verification.
  • Figure 3: Layer dependence of the AUROC obtained through (a) Mahalanobis distance $M$ and (b) projected norm $\|x_p\|$ for the VGG-13 model. Different line colors represent the different OOD datasets evaluated. The dashed line indicates the transition layer. For feature-based detection, AUROCs stabilize after the transition regardless of the dataset, while the projection-based discrimination between ID and OOD samples becomes clear right at the transition layer. See Appendices \ref{sec:app_full-model} and \ref{sec:app_svhn-mnist} for further verification.
  • Figure 4: CKA of features in various layers for the VGG-13 model. (a) CKA of ID (CIFAR-10) samples. (b) CKA of OOD (CIFAR-100) samples. In both panels, the horizontal and vertical axes represent layers, and the color bar represents CKA. Block-like saturations appear for both ID and OOD samples around the transition layer. See Appendices \ref{sec:app_full-model} and \ref{sec:app_svhn-mnist} for further verification.
  • Figure 5: Noise sensitivity for the VGG-13 model. The left (a) and right (b) panels show the noise sensitivities of ID (CIFAR-10) samples and OOD (CIFAR-100) samples, respectively. In each panel, the horizontal axis represents the layer and the vertical axis represents the corresponding noise sensitivity. Different colors indicate the layers where noise is injected. The dashed vertical line indicates the transition layer. The dashed horizontal line is plotted to clarify the difference between ID and OOD samples. OOD samples are more sensitive to noise injection than ID samples. See Appendices \ref{sec:app_full-model} and \ref{sec:app_svhn-mnist} for further verification.
  • ...and 25 more figures
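
The stable rank tracked in Figure 2 can be illustrated with a minimal sketch. This excerpt does not give the paper's exact definition, so the code below assumes the commonly used one, $\|A\|_F^2 / \|A\|_2^2$ (squared Frobenius norm over squared spectral norm). The function name `stable_rank` and the random test matrices are hypothetical and are not taken from the paper.

```python
import numpy as np


def stable_rank(matrix: np.ndarray) -> float:
    """Stable rank under the ASSUMED definition ||A||_F^2 / ||A||_2^2.

    The value lies between 1 and rank(A) for any nonzero matrix.
    """
    # Singular values come back sorted in descending order.
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    return float(np.sum(singular_values**2) / singular_values[0] ** 2)


# Hypothetical stand-ins for layer weights (not the paper's VGG-13 weights).
rng = np.random.default_rng(0)
full = rng.standard_normal((256, 256))                               # generic dense matrix
low = rng.standard_normal((256, 4)) @ rng.standard_normal((4, 256))  # rank at most 4

print("stable rank (dense):   ", stable_rank(full))
print("stable rank (low-rank):", stable_rank(low))
```

Under this definition the low-rank matrix cannot score above 4, while the dense Gaussian matrix scores far higher. Comparing such values layer by layer is one way to read off a drop in dimensionality like the one Figure 2 reports.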