Convergent Learning: Do different neural networks learn the same representations?

Yixuan Li, Jason Yosinski, Jeff Clune, Hod Lipson, John Hopcroft

TL;DR

This paper investigates whether independently trained deep neural networks converge on similar internal representations, a phenomenon termed convergent learning. It introduces three alignment methods to compare feature representations across networks trained on the same task: one-to-one neuron matching via correlation or mutual information, sparse few-to-one mappings using LASSO, and many-to-many mappings via spectral clustering. The study finds that some features are consistently learned across networks, while others are network-specific; early layers show more convergence than intermediate ones, and subspaces are shared even when basis vectors differ. The findings suggest that neural representations are partly local and partly distributed, and that the shared subspaces could inform improved ensemble methods. These insights lay groundwork for targeted model compression, diverse ensemble formation, and cross-architecture analyses.
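
As a rough illustration of the one-to-one alignment summarized above, the sketch below builds a between-net correlation matrix for two toy sets of unit activations and pairs units in two ways: semi-matching (each Net1 unit takes its most correlated Net2 unit, with reuse allowed) and bipartite matching (each Net2 unit is used once). The activation values, function names, and the brute-force search over permutations are assumptions made for illustration, not the authors' implementation; for realistic layer sizes an assignment solver such as SciPy's `scipy.optimize.linear_sum_assignment` would be the more usual choice.

```python
from itertools import permutations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length activation sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def between_net_correlation(units1, units2):
    """corr[i][j] = correlation of Net1 unit i with Net2 unit j."""
    return [[pearson(u, v) for v in units2] for u in units1]


def semi_matching(corr):
    """Pair each Net1 unit with its most correlated Net2 unit (reuse allowed)."""
    return [max(range(len(row)), key=lambda j: row[j]) for row in corr]


def bipartite_matching(corr):
    """One-to-one pairing maximizing total correlation.

    Brute force over permutations: only sensible for toy sizes and assumes
    both networks have the same number of units.
    """
    n = len(corr)
    best = max(permutations(range(n)),
               key=lambda perm: sum(corr[i][perm[i]] for i in range(n)))
    return list(best)


# Hypothetical activations: one list per unit, one entry per input image.
net1_units = [[0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 6], [5, 0, 4, 1, 3, 2]]
net2_units = [[5, 0, 4, 1, 3, 2], [0, 2, 4, 6, 8, 10], [0, 0, 0, 0, 0, 1]]

corr = between_net_correlation(net1_units, net2_units)
semi = semi_matching(corr)        # index of the Net2 partner for each Net1 unit
match = bipartite_matching(corr)  # same, but every Net2 unit used exactly once
```

In this toy data the first two Net1 units both prefer the same Net2 unit under semi-matching, while matching forces the second onto a weaker partner. A matched correlation can therefore never exceed the corresponding semi-matched one, which is the kind of gap Figure 3 plots.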

Abstract

Recent successes in training deep neural networks have prompted active investigation into the features learned on their intermediate layers. Such research is difficult because it requires making sense of non-linear computations performed by millions of parameters, but valuable because it increases our ability to understand current models and create improved versions of them. In this paper we investigate the extent to which neural networks exhibit what we call convergent learning, which is when the representations learned by multiple nets converge to a set of features which are either individually similar between networks or where subsets of features span similar low-dimensional spaces. We propose a specific method of probing representations: training multiple networks and then comparing and contrasting their individual, learned representations at the level of neurons or groups of neurons. We begin research into this question using three techniques to approximately align different neural networks on a feature level: a bipartite matching approach that makes one-to-one assignments between neurons, a sparse prediction approach that finds one-to-many mappings, and a spectral clustering approach that finds many-to-many mappings. This initial investigation reveals a few previously unknown properties of neural networks, and we argue that future research into the question of convergent learning will yield many more. The insights described here include (1) that some features are learned reliably in multiple networks, yet other features are not consistently learned; (2) that units learn to span low-dimensional subspaces and, while these subspaces are common to multiple networks, the specific basis vectors learned are not; (3) that the representation codes show evidence of being a mix between a local code and slightly, but not fully, distributed codes across multiple units.
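
Insight (2) can be made concrete with a small worked example. This is an illustrative construction assumed here, not one taken from the paper. Suppose two Net1 units $a_1$ and $a_2$ are uncorrelated, with zero mean and unit variance, and two Net2 units $b_1$ and $b_2$ are a rotation of them:

```latex
\begin{align}
b_1 &= \tfrac{1}{\sqrt{2}}\,(a_1 + a_2), &
b_2 &= \tfrac{1}{\sqrt{2}}\,(a_1 - a_2), \\
\operatorname{Var}(b_1) &= \tfrac{1}{2}\bigl(\operatorname{Var}(a_1) + \operatorname{Var}(a_2)\bigr) = 1, &
\operatorname{corr}(b_1, a_1) &= \frac{\operatorname{Cov}(b_1, a_1)}{\sqrt{\operatorname{Var}(b_1)\,\operatorname{Var}(a_1)}}
  = \tfrac{1}{\sqrt{2}} \approx 0.71 .
\end{align}
```

By symmetry, every pairwise correlation between a Net1 unit and a Net2 unit has magnitude $1/\sqrt{2}$, so one-to-one matching looks only moderately successful. Yet $b_1$ and $b_2$ are exact linear combinations of $a_1$ and $a_2$: the two pairs span the same two-dimensional subspace while using different basis vectors.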

Paper Structure

This paper contains 17 sections, 4 equations, 19 figures, 2 tables.

Figures (19)

  • Figure 1: Correlation matrices for the $\mathsf{conv1}$ layer, displayed as images with minimum value at black and maximum at white. (a,b) Within-net correlation matrices for $\mathsf{Net1}$ and $\mathsf{Net2}$, respectively. (c) Between-net correlation for $\mathsf{Net1}$ vs. $\mathsf{Net2}$. (d) Between-net correlation for $\mathsf{Net1}$ vs. a version of $\mathsf{Net2}$ that has been permuted to approximate $\mathsf{Net1}$'s feature order. The partially white diagonal of this final matrix shows the extent to which the alignment is successful; see Figure 3 for a plot of the values along this diagonal and further discussion.
  • Figure 2: With assignments chosen by semi-matching, the eight best (highest correlation, left) and eight worst (lowest correlation, right) matched features between $\mathsf{Net1}$ and $\mathsf{Net2}$ for the $\mathsf{conv1}$ through $\mathsf{conv3}$ layers. For all layers visualized, (1) the most correlated filters are near perfect matches, showing that many similar features are learned by independently trained neural networks, and (2) the least correlated features show that many features are learned by one network and are not learned by the other network, at least not by a single neuron in the other network. The results for the $\mathsf{conv4}$ and $\mathsf{conv5}$ layers can be found in the Supplementary Material (see Figure \ref{fig:match_ims_top_bot_conv4_conv5}).
  • Figure 3: Correlations between paired $\mathsf{conv1}$ units in $\mathsf{Net1}$ and $\mathsf{Net2}$. Pairings are made via semi-matching (light green), which allows the same unit in $\mathsf{Net2}$ to be matched with multiple units in $\mathsf{Net1}$, or matching (dark green), which forces a unique $\mathsf{Net2}$ neuron to be paired with each $\mathsf{Net1}$ neuron. Units are sorted by their semi-matching values. See text for discussion.
  • Figure 4: Average correlations between paired units in $\mathsf{Net1}$ and $\mathsf{Net2}$, by layer. Both semi-matching (light green) and matching (dark green) methods suggest that features learned in different networks are most convergent on $\mathsf{conv1}$ and least convergent on $\mathsf{conv4}$.
  • Figure 5: A visualization of the network-to-network sparse "mapping layers" (green squares). The layers are trained independently of each other and with an L1 weight penalty to encourage sparse weights.
  • ...and 14 more figures
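
The sparse mapping idea in Figure 5 can be sketched as L1-penalized linear regression: the activation of one unit in one network is predicted from all units in the other, and the penalty pushes most weights to exactly zero. The code below is a minimal coordinate-descent solver written for illustration. The function names, the toy data, and the penalty value `lam=0.1` are assumptions, and the solver is not the training procedure used for the paper's mapping layers. In practice a library routine such as scikit-learn's `sklearn.linear_model.Lasso` would normally be used.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iters=100):
    """Minimize (1/2n) * ||y - Xw||^2 + lam * ||w||_1 by cyclic coordinate descent.

    X: list of n rows (one per input image), each with d predictor activations.
    y: list of n target activations (the unit being predicted).
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(d)]
    resid = list(y)  # residual y - Xw, starting from w = 0
    for _ in range(n_iters):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue  # a constant-zero predictor carries no information
            # Correlation of predictor j with the residual that excludes j.
            rho = sum(X[i][j] * (resid[i] + w[j] * X[i][j]) for i in range(n)) / n
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                w[j] = new_w
    return w


# Hypothetical data: three predictor units observed on four inputs.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
# Target unit depends strongly on predictor 0 and only weakly on predictor 1.
y = [2.05, 1.95, -1.95, -2.05]

weights = lasso_coordinate_descent(X, y, lam=0.1)
```

With these toy values the weak dependence on the second predictor falls below the penalty and its weight is set to exactly zero. The target is then explained by a single predictor, which is the sparse few-to-one behavior the L1 penalty is meant to encourage.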