Instilling Inductive Biases with Subnetworks

Enyan Zhang, Michael A. Lepori, Ellie Pavlick

TL;DR

The paper addresses how to deliberately steer neural networks toward preferred solutions by instilling inductive biases through a mechanistic method called Subtask Induction. This approach localizes a functional subnetwork that implements a subtask within a trained model, and transfers those weights to a randomly initialized network while freezing the subnetwork, thereby biasing learning toward solutions that reuse that subtask. Across arithmetic grokking tasks and vision benchmarks, Subtask Induction achieves data-efficient generalization and induces a human-like shape bias, even with limited downstream data; it also yields robustness improvements on cue-conflict tests. The work demonstrates a flexible, cheaper alternative to architectural design or heavy meta-learning for bias control and highlights a path toward more mechanistic interpretability in neural networks, with release of data variants and code to support reproducibility.

Abstract

Despite the recent success of artificial neural networks on a variety of tasks, we have little knowledge or control over the exact solutions these models implement. Instilling inductive biases -- preferences for some solutions over others -- into these models is one promising path toward understanding and controlling their behavior. Much work has been done to study the inherent inductive biases of models and instill different inductive biases through hand-designed architectures or carefully curated training regimens. In this work, we explore a more mechanistic approach: Subtask Induction. Our method discovers a functional subnetwork that implements a particular subtask within a trained model and uses it to instill inductive biases towards solutions utilizing that subtask. Subtask Induction is flexible and efficient, and we demonstrate its effectiveness with two experiments. First, we show that Subtask Induction significantly reduces the amount of training data required for a model to adopt a specific, generalizable solution to a modular arithmetic task. Second, we demonstrate that Subtask Induction successfully induces a human-like shape bias while increasing data efficiency for convolutional and transformer-based image classification models.

Paper Structure (35 sections, 3 equations, 12 figures, 3 tables, 1 algorithm)

Figures (12)

  • Figure 1: Subtask Induction localizes a subnetwork that implements a certain subtask in a trained neural network and transfers it to a randomly initialized model, thereby instilling an inductive bias towards solutions utilizing the specific subtask. The figure illustrates the three stages of Subtask Induction in our experiments: through subnetwork discovery, we first train a binary weight-level mask representing the subnetwork for a specific subtask; we then perform subnetwork transfer by copying the subnetwork weights into a newly initialized model; finally, we keep the subnetwork frozen while optimizing the re-initialized weights. We demonstrate through two experiments that transferring subnetworks effectively and reliably instills desired inductive biases.
  • Figure 2: Graphical illustration of our experimental setup. We set up two tasks such that $T_1 := S_1 \otimes S_2$ and $T_2 := S_1 \otimes S_3$, where "$\otimes$" stands for some combination of subtasks $S_n$. Note that the subtask $S_1$ is shared between $T_1$ and $T_2$. We train a model on $T_1$, then perform Subtask Induction by localizing the shared subtask $S_1$ and transferring it to a new model trained on $T_2$. We find that transferring the subnetwork significantly improves the model's ability to learn $T_2$ compared to various controls.
  • Figure 3: Test accuracy vs. number of disambiguation training samples. Left: average over all model configurations (GPT-2, 2 to 12 layers). Right: one configuration (GPT-2, 12 layers), with standard deviation across 5 runs. The horizontal axis is in log scale. The figure compares Subtask Induction against 3 controls: a randomly initialized model, transferring randomly sampled subnetworks, and transferring the entire model trained on $T_1$. Despite transferring less than 10% of all parameters, Subtask Induction yields comparable and often higher accuracy than transferring the entire model, and it boosts data efficiency significantly compared to random controls.
  • Figure 4: Qualitative evaluation of Mean-pooled ImageNet. Semantic segmentation followed by mean pooling retains most shape information in a naturalistic way while erasing local texture. Correct labels: elephant, knife, bottle, airplane, bird, dog.
  • Figure 5: Comparison of training dynamics between Subtask Induction and training from scratch, for ResNet18 and ViT. Upper: evaluation accuracy on original ImageNet images. Lower: evaluation accuracy on Mean-pooled ImageNet. Models initialized with Subtask Induction reach higher accuracies in fewer optimization steps and retain a much higher accuracy on Mean-pooled ImageNet.
  • ...and 7 more figures
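
The transfer and frozen-training stages described in the Figure 1 caption can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: weights are flattened into plain Python lists, the binary mask is assumed to have already been produced by subnetwork discovery, and the function names (`transfer_subnetwork`, `masked_sgd_step`, `random_mask_like`), the example values, and the plain SGD update are all hypothetical. The last helper mirrors the "randomly sampled subnetworks" control mentioned in the Figure 3 caption.

```python
import random


def transfer_subnetwork(trained, mask, fresh):
    """Copy trained weights where mask is 1; keep the fresh initialization elsewhere."""
    return [t if m else f for t, m, f in zip(trained, mask, fresh)]


def masked_sgd_step(weights, grads, mask, lr=0.1):
    """Update only weights outside the subnetwork (mask == 0); masked weights stay frozen."""
    return [w if m else w - lr * g for w, g, m in zip(weights, grads, mask)]


def random_mask_like(mask, rng):
    """Control: a random mask selecting the same number of weights as `mask`."""
    chosen = set(rng.sample(range(len(mask)), sum(mask)))
    return [1 if i in chosen else 0 for i in range(len(mask))]


rng = random.Random(0)

# Hypothetical values for illustration only.
trained = [0.5, -1.2, 0.8, 2.0, -0.3]   # weights of a model trained on T_1
mask = [1, 0, 0, 1, 0]                  # assumed output of subnetwork discovery
fresh = [rng.gauss(0.0, 0.02) for _ in trained]  # new random initialization

# Stage 2: copy the subnetwork into the freshly initialized model.
weights = transfer_subnetwork(trained, mask, fresh)

# Stage 3: one training step with placeholder gradients; the subnetwork is frozen.
grads = [1.0] * len(weights)
updated = masked_sgd_step(weights, grads, mask)

# Control mask with the same number of selected weights.
control_mask = random_mask_like(mask, rng)
```

After the step, the entries selected by `mask` still hold the trained values, while every other entry has moved away from its random initialization. In a real framework the same effect would typically be obtained by zeroing gradients on the masked entries rather than by rebuilding lists.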