Feature Accentuation: Revealing 'What' Features Respond to in Natural Images

Chris Hamblin; Thomas Fel; Srijani Saha; Talia Konkle; George Alvarez

TL;DR

This work tackles the challenge of explaining vision-model feature responses by combining 'where' and 'what' explanations in a single method. Feature accentuation starts from a natural seed image and optimizes a perturbation that maximizes a target feature's activation while regularizing toward the seed in the model's latent space, yielding naturalistic, local accentuations. The authors show that these accentuations are processed along circuits similar to those used for natural images, produce interpretable explanations of misclassifications, and reveal the diverse ways a latent feature can manifest, supported by a human extrapolation study. The method is implemented in the open-source Faccent library and extends the XAI toolbox with local, seed-aware visual explanations that can aid debugging and model understanding.

Abstract

Efforts to decode neural network vision models necessitate a comprehensive grasp of both the spatial and semantic facets governing feature responses within images. Most research has centered on attribution methods, which provide explanations in the form of heatmaps showing where the model directs its attention for a given feature. However, grasping 'where' alone falls short, as numerous studies have highlighted the limitations of those methods and the necessity of understanding 'what' the model has recognized at the focal point of its attention. In parallel, 'feature visualization' offers another avenue for interpreting neural network features. This approach synthesizes an optimal image through gradient ascent, providing clearer insights into 'what' features respond to. However, feature visualizations only provide one global explanation per feature; they do not explain why features activate for particular images. In this work, we introduce a new method to the interpretability toolkit, 'feature accentuation', which conveys both where and what in an arbitrary input image induces a feature's response. At its core, feature accentuation is image-seeded (rather than noise-seeded) feature visualization. We find that a particular combination of parameterization, augmentation, and regularization yields naturalistic visualizations that resemble the seed image and the target feature simultaneously. Furthermore, we validate that these accentuations are processed along a natural circuit by the model. We make our precise implementation of feature accentuation available to the community as the Faccent library, an extension of Lucent.
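
The image-seeded optimization described in the abstract can be sketched as a regularized objective. The form below is one plausible way to write it, not necessarily the paper's exact Definition 2.1. The symbols $\lambda$ (regularization strength) and $\bm{f}_\ell$ (activations at the regularization layer) follow the caption of Figure 3; the seed image $\bm{x}_0$, the target feature $f_t$, and the squared L2 distance are assumptions introduced here for illustration.

```latex
% Hypothetical form of the accentuation objective (illustrative assumptions:
% x_0 = seed image, f_t = target feature, squared L2 distance at layer \ell).
\[
\bm{x}^{*} \;=\; \arg\max_{\bm{x}} \;\; f_t(\bm{x}) \;-\; \lambda \,\bigl\lVert \bm{f}_\ell(\bm{x}) - \bm{f}_\ell(\bm{x}_0) \bigr\rVert_2^{2},
\qquad \text{with the optimization initialized at } \bm{x} = \bm{x}_0 .
\]
```

Under this reading, the first term pulls the image toward the target feature and the second keeps its latent representation close to that of the seed, so $\lambda$ and the choice of layer $\ell$ control how far the result may drift from the original image.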

Paper Structure (34 sections, 3 equations, 24 figures, 1 table)

This paper contains 34 sections, 3 equations, 24 figures, 1 table.

Introduction
Methods
Hardware & Software.
Notation.
Feature Visualization.
Feature accentuation
The appropriate augmentation
Adding spatial attribution
Experiments
Circuit Coherence Assessment
Effect of $\lambda$.
Misclassifications
Explaining Latent Features
Human Experiment
Limitations
...and 19 more sections

Figures (24)

Figure 1: We introduce feature accentuation, an image-seeded variant of feature visualization that enables local explanations. (Top) Our method can generate perturbations that accentuate a specific class or other logits. (Bottom) More generally, we propose accentuating neurons or directions to understand model representations (e.g., here, a knot neuron).
Figure 2: An iguana excites several class logits in InceptionV1, but what about the image excites each logit? Attribution maps highlight important regions of the image, but not what each logit sees in the region. Feature visualizations yield an exemplar for each logit, but these are hard to relate to the iguana image. Feature accentuation (ours) constitutes a powerful intermediary, transforming the iguana into a local exemplar for each class.
Figure 3: (Top) Influence of Regularization Strength ($\lambda$). With no regularization, accentuations deviate significantly from the original image, while excessive regularization prevents meaningful alterations to the image. (Bottom) Influence of Regularization Layer ($\bm{f}_\ell$), from pixel space ($\mathcal{X}$) to a deep layer of InceptionV1 (mixed5). Regularization in pixel space does not enable meaningful image modifications, whereas regularization in excessively deep layers produces hallucinations. Regularization at an intermediate layer accentuates existing image details that drive the logit without injecting new features across the entire image.
Figure 4: The effect of random crop augmentations. Square brackets indicate the [minimum, maximum] permissible dimensions (in %) for the bounding box crop. Smaller crops add definition to the image, but when only small crops are applied, tiny hallucinations appear. We find that applying the full range of crops in each batch produces balanced results.
Figure 5: Accentuating unrelated features in images can lead to significant changes. Accentuations of the most inhibited logits for the iguana image. While the model doesn't associate any 'schoolbus-like' features with the iguana, an observer might mistakenly think otherwise due to the result of feature accentuation. To address this, we suggest incorporating spatial attribution in Section \ref{sec:masking} as the final ingredient for Feature accentuation.
...and 19 more figures
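
The crop ranges in the Figure 4 caption can be illustrated with a small sampler. This is a minimal sketch under stated assumptions, not the Faccent implementation: the function names `sample_crop_box` and `sample_crop_batch` are hypothetical, and the sketch assumes a single scale is drawn uniformly from [minimum, maximum] percent and applied to both sides of the box.

```python
import random


def sample_crop_box(height, width, min_pct, max_pct, rng=random):
    """Sample one crop box (top, left, crop_h, crop_w) inside a height x width image.

    Assumption for illustration: one scale is drawn uniformly from
    [min_pct, max_pct] percent and applied to both sides of the box.
    """
    if not 0 < min_pct <= max_pct <= 100:
        raise ValueError("expected 0 < min_pct <= max_pct <= 100")
    scale = rng.uniform(min_pct, max_pct) / 100.0
    crop_h = max(1, round(height * scale))
    crop_w = max(1, round(width * scale))
    # Place the box uniformly at random so that it stays inside the image.
    top = rng.randint(0, height - crop_h)
    left = rng.randint(0, width - crop_w)
    return top, left, crop_h, crop_w


def sample_crop_batch(height, width, min_pct, max_pct, batch_size, seed=None):
    """Sample a batch of crop boxes covering the whole [min_pct, max_pct] range."""
    rng = random.Random(seed)
    return [
        sample_crop_box(height, width, min_pct, max_pct, rng)
        for _ in range(batch_size)
    ]


if __name__ == "__main__":
    # A wide range such as [5, 100] mixes small and near-full crops in one batch.
    for box in sample_crop_batch(224, 224, 5, 100, batch_size=4, seed=0):
        print(box)
```

Drawing every batch from a wide range mixes small crops with near-full crops, which is the setting the caption reports as giving balanced results; a narrow range such as only small crops corresponds to the setting where the caption reports tiny hallucinations.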

Theorems & Definitions (1)

Definition 2.1: Feature accentuation
