Mechanistic Mode Connectivity

Ekdeep Singh Lubana, Eric J. Bigelow, Robert P. Dick, David Krueger, Hidenori Tanaka

TL;DR

Mechanistic Mode Connectivity investigates how minimizers that rely on different predictive mechanisms relate in loss landscapes. The authors define mechanistic similarity via invariances to input transformations and show that lack of linear connectivity signals mechanistic dissimilarity, with naive fine-tuning potentially failing to change a model's reliance on spurious attributes. They introduce Connectivity-Based Fine-Tuning (CBFT), a sample-efficient method that uses a minimal clean dataset and barrier/invariance losses to alter a model's mechanisms, and validate it on synthetic datasets with spurious cues. The work provides both theoretical results and empirical evidence that mechanistic distinctions modulate connectivity, with practical implications for robust fine-tuning and model editing.

Abstract

We study neural network loss landscapes through the lens of mode connectivity, the observation that minimizers of neural networks retrieved via training on a dataset are connected via simple paths of low loss. Specifically, we ask the following question: are minimizers that rely on different mechanisms for making their predictions connected via simple paths of low loss? We provide a definition of mechanistic similarity as shared invariances to input transformations and demonstrate that lack of linear connectivity between two models implies they use dissimilar mechanisms for making their predictions. Relevant to practice, this result helps us demonstrate that naive fine-tuning on a downstream dataset can fail to alter a model's mechanisms, e.g., fine-tuning can fail to eliminate a model's reliance on spurious attributes. Our analysis also motivates a method for targeted alteration of a model's mechanisms, named connectivity-based fine-tuning (CBFT), which we analyze using several synthetic datasets for the task of reducing a model's reliance on spurious attributes.
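The notion of linear connectivity used in the abstract can be made concrete with a small numerical sketch. The code below is not the authors' implementation: it assumes a commonly used definition of the loss barrier (the largest excess of the loss along the straight line between two parameter vectors over the linear interpolation of the endpoint losses), and the helper names `linear_path_barrier`, `double_well`, and `bowl` are hypothetical toy stand-ins for a real network loss.

```python
import numpy as np

def linear_path_barrier(loss_fn, theta_a, theta_b, num_points=11):
    """Largest excess loss along the straight line from theta_a to theta_b.

    Assumed definition: max over t of
        L((1 - t) * theta_a + t * theta_b) - [(1 - t) * L(theta_a) + t * L(theta_b)].
    A value near zero suggests the two points are linearly mode connected.
    """
    theta_a = np.asarray(theta_a, dtype=float)
    theta_b = np.asarray(theta_b, dtype=float)
    ts = np.linspace(0.0, 1.0, num_points)
    # Loss evaluated at evenly spaced points on the linear path.
    path_losses = np.array([loss_fn((1 - t) * theta_a + t * theta_b) for t in ts])
    # Linear interpolation of the two endpoint losses.
    baseline = (1 - ts) * path_losses[0] + ts * path_losses[-1]
    return float(np.max(path_losses - baseline))

def double_well(theta):
    # Toy loss with two separate minima at theta = -1 and theta = +1.
    return float(np.sum((theta ** 2 - 1.0) ** 2))

def bowl(theta):
    # Toy convex loss: no barrier can appear on any straight line.
    return float(np.sum(theta ** 2))

barrier_double_well = linear_path_barrier(double_well, [-1.0], [1.0])
barrier_bowl = linear_path_barrier(bowl, [-1.0], [1.0])
```

In this toy setting the two minima of `double_well` are separated by a barrier on the linear path, while any two points of `bowl` are not. For real networks, `loss_fn` would evaluate the model on a dataset, and the paper additionally considers permutations of units before interpolating.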

Paper Structure (31 sections, 8 theorems, 14 equations, 31 figures, 5 tables)

This paper contains 31 sections, 8 theorems, 14 equations, 31 figures, 5 tables.

Key Result

Proposition 1

(Exhaustiveness of Unit Interventions.) If $f(\cdot; \theta)$ is invariant to unit interventions $\mathcal{A}_{i}$ and $\mathcal{A}_{j}$, it must be invariant to their composition. Further, lack of invariance to $\mathcal{A}_{i}$ or $\mathcal{A}_{j}$ precludes invariance to their composition.
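As a rough illustration of Proposition 1, the sketch below checks invariance on a toy latent space. Everything here is an assumption made for the example rather than the paper's setup: latents are 3-dimensional Gaussian vectors, a unit intervention simply negates one latent dimension, and the hypothetical `model` predicts from dimension 0 only. It illustrates the statement on one case and proves nothing in general.

```python
import numpy as np

def make_unit_intervention(dim):
    """Hypothetical unit intervention: negate a single latent dimension."""
    def intervene(z):
        z_new = z.copy()
        z_new[:, dim] = -z_new[:, dim]
        return z_new
    return intervene

def compose(first, second):
    """Apply `first`, then `second`."""
    return lambda z: second(first(z))

def model(z):
    # Toy predictor that relies only on latent dimension 0.
    return (z[:, 0] > 0).astype(int)

def is_invariant(f, intervention, z):
    """True if predictions on all samples are unchanged by the intervention."""
    return bool(np.array_equal(f(z), f(intervention(z))))

rng = np.random.default_rng(0)
latents = rng.normal(size=(100, 3))
A0, A1, A2 = (make_unit_intervention(d) for d in range(3))

# Invariant to A1 and A2 individually, hence to their composition.
invariant_to_unused = [is_invariant(model, a, latents) for a in (A1, A2, compose(A1, A2))]
# Not invariant to A0, hence not to a composition that includes A0.
invariant_to_used = [is_invariant(model, a, latents) for a in (A0, compose(A0, A1))]
```

Here `invariant_to_unused` collects the checks for interventions on dimensions the model ignores, and `invariant_to_used` collects the checks for interventions that touch the dimension it relies on.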

Figures (31)

  • Figure 1: Mechanistic Lens on Mode Connectivity. Consider two sets of parameters, $\theta_\mathrm{Background}$ and $\theta_\mathrm{Shape}$, that minimize the loss using background and object shape, respectively, as the input attributes for prediction. Are such mechanistically dissimilar minimizers connected via paths of low loss in the landscape? Does the dissimilarity of these mechanisms affect the simplicity of their connectivity paths? Can we exploit this connectivity to switch between minimizers that use our desired mechanisms?
  • Figure 2: Mechanistic Similarity: We define mechanistic similarity of two models based on how they respond to unit interventions on the data-generating process, i.e., interventions on specific dimensions of the latent vector $z$; e.g., $\mathcal{A}_1$ (shape) and $\mathcal{A}_2$ (background) in the figure. Here, yellow circles represent the prediction of a given model (column) on a counterfactual image (row). Models whose predictions are invariant to the same set of interventions (denoted $\theta_1 \sim \theta_2$) are termed mechanistically similar.
  • Figure 3: Data-Generating Process (left). We augment the natural latents $\{z_{n}\}$ of a data-generating process with a set of synthetic latents $\{z_{s}\}$. The attributes induced in the input by these synthetic latents are called cues. By conditioning (grey, dotted line) the value of a synthetic latent on the target label ($y$), we can induce correlation between its corresponding cue and the desired model output. If the cue is made easily separable, a DNN will preferentially learn mechanisms to use the cue for making its predictions \cite{shah2020pitfalls} (see also training curves in App. \ref{app:setup}). Synthetic Datasets (right). Following the protocol above, we embed synthetic cues in three existing datasets: (1) CIFAR-10 with $3 \times 3$ box cues whose locations depend on the target label; (2) CIFAR-100 with $3 \times 3$ box cues colored according to the first digit of the object label, and located according to the second digit; and (3) Dominoes \cite{shah2020pitfalls}, where CIFAR-10 images are concatenated with Fashion-MNIST images of the same class. We analyze counterfactual datasets that involve removing the cue (w/o Cue), keeping it (w/ Cue), randomizing it (Rand. Cue), or randomizing the natural image (Rand. Image). These counterfactuals help us ascertain the extent to which a model's prediction relies on natural vs. spurious attributes.
  • Figure 4: Non-Linear Mode Connectivity of Mechanistically Dissimilar Models. We train ResNet-18 models on our synthetic CIFAR-10 datasets with and without box cues (denoted $\theta_{\text{C}}$ and $\theta_{\text{NC}}$, respectively). We evaluate quadratic and linear connectivity paths; quadratic paths identified using data with cues and data without cues are both analyzed. Line colors denote the proportion of the training data with synthetic cues. Plot titles denote evaluation data (see Fig. \ref{fig:dgp}), including data where the cue is present (w/ Cue), absent (w/o Cue), or randomized (Rand. Cue), or where the underlying image is randomized (Rand. Image). As shown, $\theta_{\text{NC}}$ yields the same performance upon randomization of the cue, while the performance of $\theta_{\text{C}}$ decreases substantially; i.e., the two minimizers induce mechanistically dissimilar models. We see: (i) quadratic paths that mode connect mechanistically dissimilar models can be easily identified; (ii) linear paths cannot be identified, even after permutations; and (iii) mechanistic connectivity does not hold. See App. \ref{app:smc_results} for similar results on other settings and loss curves.
  • Figure 5: Analyzing Pre-trained vs. Fine-Tuned Models: Lack of Linear Connectivity Implies Mechanistic Dissimilarity. We train VGG-13 and ResNet-18 models on our synthetic CIFAR-10 dataset with box cues and perform naïve fine-tuning on data without cues for 100 epochs using different initial learning rates (LR) and a step-decay schedule. Corresponding models are denoted $\theta_{\text{C}}$ and $\theta_{\text{FT}}$; line colors denote the proportion of the dataset with synthetic cues; titles denote evaluation datasets, as in Fig. \ref{fig:smc}. We plot test accuracy as a function of location on the linear paths (after permutation). Using a large learning rate or enforcing perfect correlation between the cue and label induces loss barriers along the linear path, i.e., linear mode connectivity does not hold. Simultaneously, the models respond differently to counterfactuals, i.e., they are mechanistically dissimilar and not connected. For a small or medium learning rate, $\theta_{\text{FT}}$ remains linearly mode connected to $\theta_{\text{C}}$ on data with cues. Simultaneously, the corresponding models respond similarly to counterfactuals, i.e., they are mechanistically similar. See App. \ref{app:lmc_results} for similar results on other datasets, models, and loss curves.
  • ...and 26 more figures
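
The cue-embedding protocol described in the Figure 3 caption (a $3 \times 3$ box whose location depends on the label) could be sketched roughly as follows. The specific placement rule (boxes laid side by side along the top edge), the white cue color, and the function names `add_box_cue` and `make_counterfactuals` are assumptions made for illustration; the source does not specify them. The Rand. Image counterfactual is omitted because it needs a second natural image.

```python
import numpy as np

def add_box_cue(image, cue_label, box_size=3, value=255):
    """Return a copy of `image` (H x W x C, uint8) with a box cue drawn in.

    Assumed placement rule: the box sits on the top edge, and its horizontal
    offset is cue_label * box_size, so each class gets its own location.
    """
    cued = image.copy()
    x0 = cue_label * box_size
    if x0 + box_size > cued.shape[1]:
        raise ValueError("cue box does not fit inside the image")
    cued[0:box_size, x0:x0 + box_size, :] = value
    return cued

def make_counterfactuals(image, label, num_classes=10, rng=None):
    """Build three of the evaluation views named in the caption."""
    rng = np.random.default_rng() if rng is None else rng
    random_label = int(rng.integers(num_classes))
    return {
        "w/o Cue": image.copy(),                        # cue removed
        "w/ Cue": add_box_cue(image, label),            # cue agrees with the label
        "Rand. Cue": add_box_cue(image, random_label),  # cue drawn for a random class
    }

# Usage on a blank CIFAR-sized image with label 4.
blank = np.zeros((32, 32, 3), dtype=np.uint8)
views = make_counterfactuals(blank, label=4, rng=np.random.default_rng(0))
```

Comparing a model's accuracy across such views indicates how much its predictions depend on the cue rather than on the natural image content.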

Theorems & Definitions (22)

  • Definition 1
  • Definition 2
  • Definition 3
  • Proposition 1
  • Definition 4
  • Definition 5
  • Proposition 2
  • Conjecture 1
  • proof
  • Lemma 1
  • ...and 12 more