
Naturally Computed Scale Invariance in the Residual Stream of ResNet18

André Longon

TL;DR

This paper investigates how scale invariance emerges in visual object recognition by examining the residual stream of ResNet18. Using center-neuron feature visualizations and a defined scale transform $S$, the author identifies scale-invariant channels that arise when a block input's smaller-scale copy of a feature is summed with the block pre-sum's larger-scale copy to form the post-sum output. Ablation experiments show that removing these channels disproportionately harms scale-robust recognition, providing tentative causal evidence that the residual stream contributes to scale invariance and suggesting bypass connections as a potential mechanism. The work extends mechanistic interpretability beyond InceptionV1 to a different architecture and offers a bridge to neuroscience by hypothesizing how bypass-like pathways may compute invariance.
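As a rough illustration of what a scale transform like $S$ can look like, the following is a minimal Python sketch, not the paper's definition. It assumes $S$ rescales an image about its center by a factor (values above 1 enlarge the content, values below 1 shrink it) with nearest-neighbour sampling and a fixed output size; the function name `scale_about_center` and the `fill` padding value are hypothetical.

```python
import math


def scale_about_center(img, factor, fill=0.0):
    """Hypothetical stand-in for a scale transform S (not the paper's definition).

    Rescales a 2D image (a list of equal-length rows) about its center by
    `factor` using nearest-neighbour sampling. The output has the same size
    as the input: factor > 1 enlarges the content (the borders are cropped
    away), factor < 1 shrinks it and pads the uncovered border with `fill`.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        # Map the center of output pixel y back into source coordinates.
        sy = math.floor((y + 0.5 - h / 2) / factor + h / 2)
        row = []
        for x in range(w):
            sx = math.floor((x + 0.5 - w / 2) / factor + w / 2)
            if 0 <= sy < h and 0 <= sx < w:
                row.append(img[sy][sx])
            else:
                row.append(fill)  # Source location falls outside the image.
        out.append(row)
    return out


# Usage: a 4x4 test image whose pixel value encodes its (row, column).
image = [[10 * y + x for x in range(4)] for y in range(4)]
zoomed_in = scale_about_center(image, 2.0)   # central 2x2 block fills the frame
zoomed_out = scale_about_center(image, 0.5)  # content shrinks, border padded
```

Nearest-neighbour sampling keeps the sketch dependency-free; an actual image pipeline would more likely use a library resize with bilinear interpolation, and the paper's transform may differ in interpolation or padding.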

Abstract

An important capacity in visual object recognition is invariance to image-altering variables which leave the identity of objects unchanged, such as lighting, rotation, and scale. How do neural networks achieve this? Prior mechanistic interpretability research has illuminated some invariance-building circuitry in InceptionV1, but the results are limited and networks with different architectures have remained largely unexplored. This work investigates ResNet18 with a particular focus on its residual stream, an architectural component which InceptionV1 lacks. We observe that many convolutional channels in intermediate blocks exhibit scale-invariant properties, computed by the element-wise residual summation of scale-equivariant representations: the block input's smaller-scale copy with the block pre-sum output's larger-scale copy. Through subsequent ablation experiments, we attempt to causally link these neural properties with scale-robust object recognition behavior. Our tentative findings suggest how the residual stream computes scale invariance and its possible role in behavior. Code is available at: https://github.com/cest-andre/residual-stream-interp
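The summation mechanism described above can be illustrated with a toy model. This sketch is not the paper's analysis: it assumes Gaussian tuning curves over log-scale for two single neurons of one channel, with hypothetical preferred scales and widths, and uses a min/max response ratio as a simple, assumed invariance score. The post-sum response is the element-wise sum followed by a ReLU, as in a standard ResNet basic block.

```python
import math


def tuning(scale, preferred, width=0.5):
    """Gaussian tuning over log-scale; a hypothetical model of one neuron."""
    d = math.log(scale) - math.log(preferred)
    return math.exp(-d * d / (2 * width * width))


def relu(v):
    return max(0.0, v)


def invariance(responses):
    """Assumed invariance score: min/max response ratio (1.0 = fully invariant)."""
    return min(responses) / max(responses)


scales = [0.5, 0.75, 1.0, 1.5, 2.0]
# Block input neuron: prefers a smaller-scale copy of the feature (assumed values).
block_input = [tuning(s, preferred=0.6) for s in scales]
# Pre-sum neuron (same channel): prefers a larger-scale copy of the feature.
pre_sum = [tuning(s, preferred=1.6) for s in scales]
# Post-sum: element-wise residual addition, then ReLU as in a ResNet basic block.
post_sum = [relu(a + b) for a, b in zip(block_input, pre_sum)]

for name, resp in [("input", block_input), ("pre-sum", pre_sum), ("post-sum", post_sum)]:
    print(f"{name:>8}: invariance = {invariance(resp):.2f}")
```

With these made-up parameters the script prints invariance scores of about 0.06, 0.07, and 0.78: each summand responds strongly at only one end of the scale range, while their sum stays comparatively flat across all five scales.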


Paper Structure

This paper contains 15 sections, 2 equations, and 4 figures.

Figures (4)

  • Figure 1: Grids of maximally exciting images for exemplary scale-invariant channels in block 2.1 (top row) and block 3.1 (bottom row). All channels are zero-indexed. Each 3x3 grid shows a single channel across the block's target layers, organized as follows. Columns, left to right: the channel at the block's input, pre-sum, and post-sum layers. Rows, top to bottom: feature visualizations (FZs) optimized for the channel's center neuron, FZs optimized for the entire channel, and the top 9 center-neuron-activating natural images from the ImageNet validation set [5206848].
  • Figure 2: Degradation of scale-robust object recognition from ablating scale-invariant channels versus random non-invariant channels in block 2.1 (left) and block 3.1 (right). The y-axis shows the mean ratio of top-1 ImageNet validation accuracy [5206848] between the two ablation conditions at a given scale transformation percentage (applied to all images), with black error bars showing the standard error across the random trials. The blue line is the ratio when no scale transform is applied. The scale-transformed ratios fall below the blue line, meaning that ablating the scale-invariant channels did disproportionately more damage to accuracy on scale-transformed images.
  • Figure 3: Grids of maximally exciting images for all remaining channels passing the scale invariance criteria in blocks 1.1, 2.0, and 2.1. All channels are zero-indexed.
  • Figure 4: Grids of maximally exciting images for all remaining channels passing the scale invariance criteria in block 3.1. All channels are zero-indexed.
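Two computations described in the figure captions can be sketched in plain Python under stated assumptions: reading out a channel's center neuron and ranking natural images by it (Figure 1), and summarizing the ablation comparison as a mean accuracy ratio with standard error across random trials (Figure 2). All function names are hypothetical, activation maps are assumed to be nested lists shaped [C][H][W], and both zero-ablation and the (H // 2, W // 2) choice of center are assumptions the captions do not confirm.

```python
import math
import statistics


def center_activation(feature_map, channel):
    """Activation of a channel's center neuron in a [C][H][W] map.

    The center is taken as (H // 2, W // 2); this indexing is an assumption.
    """
    chan = feature_map[channel]
    return chan[len(chan) // 2][len(chan[0]) // 2]


def top_k_center_images(feature_maps, channel, k=9):
    """Indices of the k images whose center-neuron activation is highest."""
    ranked = sorted(
        range(len(feature_maps)),
        key=lambda i: center_activation(feature_maps[i], channel),
        reverse=True,
    )
    return ranked[:k]


def ablate_channels(feature_map, channels):
    """Return a copy of a [C][H][W] map with the given channels set to zero.

    Zero-ablation is an assumed choice of ablation value.
    """
    drop = set(channels)
    return [
        [[0.0] * len(row) for row in chan] if c in drop else [row[:] for row in chan]
        for c, chan in enumerate(feature_map)
    ]


def ablation_ratio(acc_invariant, acc_random_trials):
    """Mean and standard error of acc_invariant / acc_random across random trials.

    acc_invariant: top-1 accuracy with the scale-invariant channels ablated.
    acc_random_trials: top-1 accuracy for each trial that ablates random
    non-invariant channels instead.
    """
    ratios = [acc_invariant / acc for acc in acc_random_trials]
    mean = statistics.fmean(ratios)
    # Standard error of the mean; undefined for a single trial, reported as 0.0 here.
    sem = statistics.stdev(ratios) / math.sqrt(len(ratios)) if len(ratios) > 1 else 0.0
    return mean, sem
```

Comparing the mean returned at each scale transformation percentage against the same ratio computed without any scale transform mirrors the comparison in Figure 2: values below that baseline indicate that ablating the invariant channels hurt accuracy on scale-transformed images more than ablating random channels did.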