Uncovering Branch specialization in InceptionV1 using k sparse autoencoders

Matthew Bozoukov

TL;DR

The paper studies branch specialization and polysemantic neurons in the later layers of InceptionV1. It uses $k$-sparse autoencoders with tied weights and a TopK activation ($k=32$) to reduce dead features. The autoencoders use a latent expansion factor of $16$, are trained with a reconstruction loss, and are applied to the 5x5 branches of mixed4a-4e and to selected 1x1 branches. Circuit analysis and UMAP visualizations indicate that branch specialization emerges consistently across layers: the 5x5 branches encode animal-related features, and similar features in different layers tend to be localized to branches with the same convolution size. These findings advance mechanistic interpretability of CNNs and provide a foundation for cross-layer feature localization visualizations and potential automated interpretability tools.

Abstract

Sparse Autoencoders (SAEs) have been shown to recover interpretable features from polysemantic neurons, which arise in neural networks through superposition. Previous work has shown that SAEs are an effective tool for extracting interpretable features from the early layers of InceptionV1. SAEs have since improved considerably, but branch specialization in the later layers of InceptionV1 remains poorly understood. We show examples of branch specialization occurring in each of the layers mixed4a-4e, in the 5x5 branch and in one 1x1 branch. We also provide evidence that branch specialization is consistent across layers: similar features across the model tend to be localized to branches with the same convolution size in their respective layers.
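
As an illustration of the autoencoder described in the TL;DR (tied weights, TopK activation with $k=32$, expansion factor $16$, reconstruction loss), the forward pass could be sketched as follows. This is a minimal NumPy sketch, not the paper's implementation: the function name `topk_sae_forward`, the pre-encoder bias `b_pre`, the input width of 64 channels, and the choice to keep the raw top-$k$ pre-activations without an additional ReLU are all assumptions made for the example.

```python
import numpy as np

def topk_sae_forward(x, W, b_pre, k=32):
    """Forward pass of a tied-weight TopK sparse autoencoder (illustrative).

    x:     (batch, d_in) activations, e.g. one branch's channels per position
    W:     (d_latent, d_in) weight matrix shared by encoder and decoder
    b_pre: (d_in,) bias subtracted before encoding and added back after decoding
    """
    # Encoder: project centered inputs onto the latent directions.
    pre = (x - b_pre) @ W.T                          # (batch, d_latent)

    # TopK activation: keep the k largest pre-activations per row, zero the rest.
    idx = np.argpartition(pre, -k, axis=1)[:, -k:]   # indices of the top k
    rows = np.arange(pre.shape[0])[:, None]
    z = np.zeros_like(pre)
    z[rows, idx] = pre[rows, idx]

    # Decoder: tied weights, so the same W maps latents back to input space.
    x_hat = z @ W + b_pre

    # Reconstruction loss: mean squared error summed over channels.
    loss = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    return z, x_hat, loss

# Hypothetical sizes: 64 input channels, expansion factor 16 -> 1024 latents.
rng = np.random.default_rng(0)
d_in, expansion = 64, 16
x = rng.normal(size=(8, d_in))
W = rng.normal(size=(d_in * expansion, d_in)) / np.sqrt(d_in)
z, x_hat, loss = topk_sae_forward(x, W, np.zeros(d_in), k=32)
```

Because TopK fixes the number of active latents per input directly, no separate sparsity penalty is needed in this sketch; the reconstruction error is the only loss term.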

Paper Structure

This paper contains 11 sections, 2 equations, 10 figures.

Figures (10)

  • Figure 1: Branch specializations across the layers mixed4a-mixed4e. In every layer except mixed4a, the 5x5 branch encodes animal-related features. Mixed4b focuses primarily on specific orientations and poses of animals. Mixed4c is a mix: some orientation/pose features and some general animal features. Mixed4d and mixed4e (not depicted) appear to hold species-specific features.
  • Figure 2: An example circuit showing how the 5x5 branches of successive layers appear to connect to one another. The mixed4b features are localized primarily to the 5x5 branch and respond to animals oriented facing left and facing right. Both feed a mixed4c feature that detects animal faces; again, its largest neuron contribution comes from neurons in the 5x5 branch. In the last stage of the circuit, that feature fans out into one feature that detects dogs with fluffy white fur and two features that seem to detect dog legs. These features receive their largest neuron contribution from neurons in the mixed4d 5x5 branch.
  • Figure 3: A UMAP projection of the decoder vectors of mixed4b 5x5. Within the full set of features, the animal-specific features seem to form a manifold.
  • Figure 4: A UMAP projection of the decoder vectors of mixed4c 5x5. Within the full set of features, the animal-specific features seem to form a manifold.
  • Figure 5: How much of each feature learned by the sparse autoencoder trained on all of mixed4d is represented by the neurons in the 1x1 branch. Color features are generally represented in large part by neurons in the 1x1 branch.
  • ...and 5 more figures
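
The quantity in Figure 5 (how much of a feature is carried by one branch's neurons) can be illustrated with a small sketch. The exact metric is not given in this excerpt, so the code below assumes one plausible definition: the share of a feature's squared decoder weight that falls on each branch's channels. The function name `branch_shares` and the channel ranges in `BRANCH_SLICES` are assumptions; the ranges follow the commonly cited mixed4d layout (112 + 288 + 64 + 64 = 528 channels, concatenated as 1x1, 3x3, 5x5, pool) and should be checked against the model actually used.

```python
import numpy as np

# Assumed channel layout of mixed4d after branch concatenation (hypothetical;
# verify against the model being analyzed).
BRANCH_SLICES = {
    "1x1": slice(0, 112),
    "3x3": slice(112, 400),
    "5x5": slice(400, 464),
    "pool": slice(464, 528),
}

def branch_shares(decoder_vec, slices=BRANCH_SLICES):
    """Fraction of a feature's squared decoder weight lying in each branch."""
    energy = np.asarray(decoder_vec, dtype=float) ** 2
    total = energy.sum()
    if total == 0:
        raise ValueError("decoder vector is all zeros")
    return {name: float(energy[s].sum() / total) for name, s in slices.items()}

# Toy feature whose decoder weight sits entirely on the first 112 channels.
toy = np.zeros(528)
toy[:112] = 1.0
shares = branch_shares(toy)
```

Under this definition a feature with a 1x1 share near 1 is written almost entirely by the 1x1 branch, which is the pattern the caption reports for color features.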