Exploring specialization and sensitivity of convolutional neural networks in the context of simultaneous image augmentations

Pavel Kharyuk, Sergey Matveev, Ivan Oseledets

TL;DR

This work introduces a framework to analyze how CNN internal activations respond when multiple input augmentations are applied simultaneously, leveraging Sobol indices and Shapley values to decompose activation variance by augmentation factors and their interactions. It combines full-scale sensitivity analysis with guided masking and single-channel segment studies across AlexNet, VGG11, and ResNet18 on ILSVRC and Places365, revealing depth-dependent specialization and robust augmentation sensitivity, including consistent large effects from grayscale, erasing, and hue distortions. The approach yields activation maps, correlation patterns, and linear discriminant analyses that validate the sensitivity findings and enable targeted masking to probe prediction biases, with potential extensions to biological neural network studies. Overall, the framework contributes a principled, quantitative method to understand and potentially enhance robustness of deep CNNs to complex data distortions, and it could inform strategies for fault-tolerant architectures and interpretable AI.

Abstract

Drawing parallels with the way biological networks are studied, we adapt the treatment-control paradigm to explainable artificial intelligence research and enrich it through multi-parametric input alterations. In this study, we propose a framework for investigating how input data augmentations affect internal inference. The internal changes in network operation are reflected in activation changes measured by variance, which can be decomposed into components related to each augmentation by means of Sobol indices and Shapley values. These quantities make it possible to visualize sensitivity to different variables and to use them for guided masking of activations. In addition, we introduce a form of single-class sensitivity analysis in which candidates are filtered according to how well they match the prediction bias produced by targeted damage to the activations. Relying on the observed parallels, we assume that the developed framework can potentially be transferred to studying biological neural networks in complex environments.
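The variance decomposition mentioned in the abstract can be illustrated on a toy case. The sketch below is not the authors' estimator: it assumes each augmentation is an independent on/off factor applied with probability 0.5, that the activation has been evaluated for every on/off combination (a full factorial design), and that Shapley values are taken with the value function Var(E[f | x_u]), one common formulation often called Shapley effects. The names `closed_variance` and `sensitivity` and the toy response are hypothetical.

```python
import itertools
import math

import numpy as np


def closed_variance(values, subset):
    """Var(E[f | x_subset]) on a full factorial grid of binary factors.

    `values` has shape (2,) * d; axis k holds the off/on states of factor k.
    """
    others = tuple(ax for ax in range(values.ndim) if ax not in subset)
    # Averaging over the remaining axes gives the conditional mean.
    cond_mean = values.mean(axis=others) if others else values
    return float(np.var(cond_mean))


def sensitivity(values):
    """Return first-order Sobol, total Sobol and normalized Shapley values."""
    d = values.ndim
    factors = range(d)
    total_var = float(values.var())

    first = np.array([closed_variance(values, (i,)) for i in factors])
    total = np.array([
        total_var - closed_variance(values, tuple(j for j in factors if j != i))
        for i in factors
    ])

    shapley = np.zeros(d)
    for i in factors:
        rest = [j for j in factors if j != i]
        for k in range(d):
            # Standard Shapley weight for coalitions of size k.
            weight = (math.factorial(k) * math.factorial(d - k - 1)
                      / math.factorial(d))
            for u in itertools.combinations(rest, k):
                gain = (closed_variance(values, u + (i,))
                        - closed_variance(values, u))
                shapley[i] += weight * gain

    return first / total_var, total / total_var, shapley / total_var


# Hypothetical activation: factor 0 acts additively, factors 1 and 2 interact.
grid = np.indices((2, 2, 2))
values = grid[0] + 2.0 * grid[1] * grid[2]
first, total, shapley = sensitivity(values)
```

In this toy response the interaction between factors 1 and 2 is invisible to the first-order indices, counted in full by both total indices, and split evenly between the two Shapley values, which therefore sum to one.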

Paper Structure

This paper contains 30 sections, 12 equations, 5 figures, 9 tables.

Figures (5)

  • Figure 1: Summarized pipeline of the proposed framework. A block is a segment of a neural network that participates in the analysis as a single mapping from its input to the corresponding activations. A checkpoint is a point of the network at which activations are gathered; since each checkpoint is tied to the preceding block, we use the two terms interchangeably.
  • Figure 2: Sample image from the ILSVRC dataset, its corresponding HSV decomposition and examples of single and simultaneous augmentations considered in the study.
  • Figure 3: Examples of sensitivity to the initial augmentations of the ILSVRC input as it manifests in deep layers (a-c), with the corresponding log-transformed coefficients of variation, CoV (d-f). (a), (d): AlexNet, checkpoint at the output of the features.7 module, 301st output channel (neuron), grayscale and saturation. Higher sensitivity to grayscaling coincides with lower CoV values, and vice versa for saturation. (b), (e): VGG11, avgpool, channel 462, erasing. The prominent sensitivity to erasing is not reflected in higher CoV values. (c), (f): ResNet18, layer3, channel 193, rolling and elliptic local blur. For the latter transform, the sensitivity exhibits a clear spatial structure that is only faintly visible in the CoV map.
  • Figure 4: Correlation patterns extracted by guided damaging of activations during inference. Darker colours correspond to lower correlation values. Each item within a single heatmap is a Spearman correlation between augmented and non-augmented masked inputs. Rows correspond to the augmentations used for transforming the input, and columns give the sensitivity analysis (SA) variables used as sources for building masks. Sensitivity values are designated as follows: shpv are Shapley values, and si and siT are the first-order and total Sobol indices, respectively.
  • Figure 5: Higher-order correlation patterns extracted for segments separated from their networks. First, Spearman correlations between paired SA variables were computed at the checkpoint level for both the isolated fixed-channel segments (H/S/V) and the corresponding original parts. Then, cross-correlations between the triangular parts of these matrices were computed, matching different checkpoint levels. (a) Results for paired checkpoints within a fixed input channel (isolated segments only). Lighter colours encode correlations closer to zero in absolute value. (b) Matching of the isolated and original segments. Darker colours correspond to lower values. Single-heatmap legend: rows correspond to single-channel checkpoints, columns to those of the original network. In both (a) and (b), hatched cells indicate constant input that could not be correlated.
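The two-stage correlation described in the Figure 5 caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: each checkpoint is assumed to supply a matrix of sensitivity values with one row per unit and one column per SA variable, ranks are computed without tie handling, and the second-stage cross-correlation is taken as Pearson, which the caption does not specify. The names `spearman_matrix`, `upper_triangle` and `pattern_similarity` are hypothetical, and the inputs are random placeholders.

```python
import numpy as np


def spearman_matrix(samples):
    """Spearman correlation between the columns of `samples`.

    `samples` has shape (n_units, n_variables). Ranks come from a double
    argsort, which assumes there are no tied values within a column.
    """
    ranks = samples.argsort(axis=0).argsort(axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)


def upper_triangle(matrix):
    """Flatten the strictly upper-triangular part of a square matrix."""
    rows, cols = np.triu_indices(matrix.shape[0], k=1)
    return matrix[rows, cols]


def pattern_similarity(sens_a, sens_b):
    """Correlate the pairwise-correlation patterns of two checkpoints.

    The checkpoints may hold different numbers of units, but must share the
    same SA variables (columns). Pearson correlation is an assumption here.
    """
    tri_a = upper_triangle(spearman_matrix(sens_a))
    tri_b = upper_triangle(spearman_matrix(sens_b))
    return float(np.corrcoef(tri_a, tri_b)[0, 1])


# Placeholder sensitivity values for two checkpoints with 5 SA variables.
rng = np.random.default_rng(0)
sens_a = rng.random((200, 5))
sens_b = rng.random((300, 5))
similarity = pattern_similarity(sens_a, sens_b)
```

If one of the triangular vectors is constant, the final correlation is undefined and NumPy returns NaN, which is comparable to the hatched cells the caption describes for constant input.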