Linking in Style: Understanding learned features in deep learning models

Maren H. Wehrheim, Pamela Osuna-Vargas, Matthias Kaschube

TL;DR

An automatic method to visualize and systematically analyze learned features in CNNs by introducing a linking network that maps the penultimate layer of a pre-trained classifier to the latent space of a generative model (StyleGAN-XL), thereby enabling an interpretable, human-friendly visualization of the classifier's representations.

Abstract

Convolutional neural networks (CNNs) learn abstract features to perform object classification, but understanding these features remains challenging because existing approaches yield difficult-to-interpret results or incur high computational costs. We propose an automatic method to visualize and systematically analyze learned features in CNNs. Specifically, we introduce a linking network that maps the penultimate layer of a pre-trained classifier to the latent space of a generative model (StyleGAN-XL), thereby enabling an interpretable, human-friendly visualization of the classifier's representations. Our findings indicate a congruent semantic order in both spaces, enabling a direct linear mapping between them. Training the linking network is computationally inexpensive and decoupled from training both the GAN and the classifier. We introduce an automatic pipeline that uses such GAN-based visualizations to quantify learned representations by analyzing activation changes in the classifier in the image domain. This quantification allows us to systematically study the learned representations in several thousand units simultaneously and to extract and visualize units selective for specific semantic concepts. Further, we illustrate how our method can be used to quantify and interpret the classifier's decision boundary using counterfactual examples. Overall, our method offers systematic and objective perspectives on learned abstract representations in CNNs. Code is available at https://github.com/kaschube-lab/LinkingInStyle.git
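The abstract's claim that a direct linear mapping can link the two spaces can be illustrated with a small sketch. The code below is not the authors' implementation: it fits an affine map from activation vectors $r$ to latent codes $w$ by ordinary least squares on synthetic data, whereas the paper's linking network may be trained differently. The function names (`fit_linear_link`, `apply_link`), the toy dimensions, and the synthetic ($w$, $r$) pairs are all assumptions made for illustration; in the real pipeline the pairs would come from StyleGAN-XL latents and the classifier's penultimate-layer activations.

```python
import numpy as np


def fit_linear_link(r, w):
    """Fit w ≈ r @ A + b by ordinary least squares (illustrative stand-in)."""
    # Append a column of ones so the bias b is estimated jointly with A.
    r_aug = np.hstack([r, np.ones((r.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(r_aug, w, rcond=None)
    return coef[:-1], coef[-1]


def apply_link(r, A, b):
    """Map activation vectors r (rows, or a single vector) to latent codes."""
    return r @ A + b


rng = np.random.default_rng(0)

# Toy sizes chosen for speed; they are NOT the real layer or latent sizes.
dim_r, dim_w = 64, 16
n_train, n_test = 1000, 200

# Synthetic (w, r) pairs generated from a hidden affine map plus small noise.
A_true = rng.normal(size=(dim_r, dim_w))
b_true = rng.normal(size=dim_w)
r_all = rng.normal(size=(n_train + n_test, dim_r))
noise = 0.01 * rng.normal(size=(n_train + n_test, dim_w))
w_all = r_all @ A_true + b_true + noise

# Fit on the training pairs, evaluate the latent-space MSE on held-out pairs.
A, b = fit_linear_link(r_all[:n_train], w_all[:n_train])
w_pred = apply_link(r_all[n_train:], A, b)
test_mse = float(np.mean((w_pred - w_all[n_train:]) ** 2))

# Perturb a single unit of one activation vector and map both versions.
unit, delta = 3, 2.0
r_o = r_all[n_train].copy()
r_p = r_o.copy()
r_p[unit] += delta
latent_shift = apply_link(r_p, A, b) - apply_link(r_o, A, b)

print(f"held-out latent MSE: {test_mse:.2e}")
print(f"norm of latent shift for unit {unit}: {np.linalg.norm(latent_shift):.3f}")
```

Because the map in this sketch is affine, perturbing one unit by `delta` shifts the predicted latent code by exactly `delta * A[unit]`. In the setting the abstract describes, the original and perturbed latent codes would each be passed through the generator to produce the two images that are then compared.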

Paper Structure (16 sections, 4 equations, 7 figures)

Figures (7)

  • Figure 1: Visualization and systematic quantification of a classifier's learned representations. Left: We introduce a linking network (red arrow) that links an activation pattern $r \in R$ in the penultimate layer of a classifier to the latent space of StyleGAN-XL (Sauer et al., 2022), thereby visualizing the representations learned by the classifier. Building on these visualizations, we propose a pipeline to automatically and objectively analyze a large number of learned representations in $R$ by evaluating the changes between images caused by perturbations in $R$. We show two applications of how our method can be used to understand learned features in deep learning models. Middle: We systematically 'tune' the activations of all units in $R$ separately to obtain a comprehensive overview of sparsely encoded representations across thousands of units. Right: The linking network can visualize counterfactual examples, and our quantification pipeline reveals trajectories that provide insights into learned concepts relevant for the classifier's decision.
  • Figure 2: Visualizing and quantifying learned features in CNNs. A) The generator $G_s$ generates an image $I$ from a given $w \in W$. $I$ is input to the classifier from which the corresponding activation vector $r \in R$ is extracted. Using a set of ($w$, $r$)-pairs, we train a linking network (red arrow) to create a link between the classifier and the GAN. We then perturb the activation pattern $r$ to visualize learned representations in $R$ using the GAN. B) Automatic quantification of semantic concepts. Left (unsupervised): We introduce an unsupervised method to find matching points between images $I_o$ and $I_p$. First, we use PUMP (Revaud et al., 2022) to compute an affine transformation and align the two images to remove global changes such as translation or zoom (center, top: non-aligned images, bottom: aligned images). We then compute PUMP again to find local changes not accounted for by the affine transform and compute the vector field to visualize the changes. Right (supervised): We compute the segmentation mask for each image separately following Tritrong et al. (2021). Then, we quantify each semantic label in an image according to different evaluation metrics: area (shown here), luminance, entropy, eccentricity, and angle. For each metric, we compute the change induced by a perturbation in $R$.
  • Figure 3: Feasibility and performance of linking network. A) High similarity between StyleGAN-XL's $W$-space and representation space $R$ in ResNet-50. Across 100 repetitions, 100 examples for five different ImageNet classes are sampled. Left: We fit a k-Means clustering ($k=5$, 20 initializations) on the selected examples and compute the Adjusted Rand Index (ARI) between the predicted clusters and the real class labels. Right: Learned representations in $W$ and $R$ are highly similar, shown here by high average correlations between flattened similarity (correlation) and dissimilarity (Euclidean distance) matrices of the selected examples computed for the two spaces. B) The trained linking network achieves high similarities between generated images ($I$) and images cycled through the linking network and the GAN ($\Tilde{I}$). C) We quantify the performance of the linking network by the MSE between $w$ and $\Tilde{w}$ and by the perceptual image distance MoCo v2 (Chen et al., 2020) between $I$ and $\Tilde{I}$.
  • Figure 4: Automatically revealed abstract concepts encoded in individual units. We tune the activation of individual units in $R$ and visualize the results. We observe abstract concepts to be encoded in single units, such as gender or color (A), and to be stable across different classes, shown here for the cap sizes of different fungi (B). C) Different units encode different concepts that can be visualized with our unsupervised tracking method (vector fields).
  • Figure 5: Overview of features represented by individual units for the 2,048 units in the hidden layer of the ResNet-50 classifier. A) Left: We compute the label sparsity of each quantification metric (here area, see Supplement for other metrics) across all units for 100 test seeds. Different classes exhibit different levels of sparsity. Right: Highly sparse units reveal disentangled representations of concepts such as long legs, larger eyes, or longer ears. B) $R$ is semantically ordered. We encode all changes in area induced by single-unit perturbations into a low-dimensional space using tSNE and color units by the label with the strongest change. Regional overlap between labels indicates interdependent representations of these concepts, as observed for snout, ear, and head, but not for legs and body. C) Hierarchical clustering of the label vectors reveals clusters representing disentangled concepts (cluster 9) as well as combinations of previously observed overlapping concepts (clusters 18 and 65).
  • ...and 2 more figures
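
The comparison of the two spaces described in the Figure 3A caption (correlating flattened similarity and dissimilarity matrices computed in $W$ and in $R$) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names are hypothetical, and the data are synthetic stand-ins in which both "spaces" are random linear views of one shared class-structured latent. A row-shuffled control is included to show what the score looks like when the correspondence between examples is broken.

```python
import numpy as np


def upper_triangle(m):
    """Flatten the strictly upper-triangular part of a square matrix."""
    i, j = np.triu_indices(m.shape[0], k=1)
    return m[i, j]


def correlation_matrix(x):
    """Pairwise Pearson correlation between examples (rows of x)."""
    return np.corrcoef(x)


def euclidean_matrix(x):
    """Pairwise Euclidean distance between examples (rows of x)."""
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    return np.sqrt(np.clip(d2, 0.0, None))  # clip guards tiny negative values


def space_similarity(a, b, matrix_fn):
    """Correlate the flattened pairwise matrices of two spaces."""
    va = upper_triangle(matrix_fn(a))
    vb = upper_triangle(matrix_fn(b))
    return float(np.corrcoef(va, vb)[0, 1])


rng = np.random.default_rng(0)

# Synthetic stand-in data: 5 toy classes with 20 examples each.
n_classes, per_class, dim_z = 5, 20, 8
labels = np.repeat(np.arange(n_classes), per_class)
centers = 2.0 * rng.normal(size=(n_classes, dim_z))
z = centers[labels] + 0.3 * rng.normal(size=(labels.size, dim_z))

# Two "spaces" that are different random linear views of the same latent z.
w_space = z @ rng.normal(size=(dim_z, 32))
r_space = z @ rng.normal(size=(dim_z, 64)) + 0.1 * rng.normal(size=(labels.size, 64))

euclid_sim = space_similarity(w_space, r_space, euclidean_matrix)
corr_sim = space_similarity(w_space, r_space, correlation_matrix)

# Control: shuffle the rows of one space to break the pairing of examples.
perm = rng.permutation(labels.size)
euclid_shuffled = space_similarity(w_space, r_space[perm], euclidean_matrix)
corr_shuffled = space_similarity(w_space, r_space[perm], correlation_matrix)

print(f"Euclidean-matrix similarity: {euclid_sim:.3f} (shuffled {euclid_shuffled:.3f})")
print(f"correlation-matrix similarity: {corr_sim:.3f} (shuffled {corr_shuffled:.3f})")
```

In this sketch a high score for the paired data together with a near-zero score for the shuffled control indicates that the two spaces arrange the same examples in a similar way, which is the property the caption reports for $W$ and $R$.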