Explicit Inductive Bias for Transfer Learning with Convolutional Networks

Xuhong Li, Yves Grandvalet, Franck Davoine

TL;DR

The paper addresses how to preserve useful knowledge from a pre-trained CNN during transfer learning by adding an explicit inductive bias that anchors fine-tuning to the initial weights. It studies a family of regularizers, notably $L^2$-SP, that penalize deviations from the pre-trained parameters, and systematically evaluates them against standard $L^2$, $L^1$, and Group-Lasso variants across diverse source/target pairs. Empirically, $L^2$-SP consistently improves target-task accuracy over conventional fine-tuning, with larger gains in low-data regimes and minimal computational overhead; Fisher-based variants bring limited additional benefit in this setting. The work proposes $L^2$-SP as a simple, robust baseline for transfer learning and offers theoretical and empirical insight into why staying close to the pre-trained solution helps retain useful source-task representations, with evidence extending to semantic segmentation on Cityscapes.

Abstract

In inductive transfer learning, fine-tuning pre-trained convolutional networks substantially outperforms training from scratch. When using fine-tuning, the underlying assumption is that the pre-trained model extracts generic features, which are at least partially relevant for solving the target task, but would be difficult to extract from the limited amount of data available on the target task. However, besides the initialization with the pre-trained model and the early stopping, there is no mechanism in fine-tuning for retaining the features learned on the source task. In this paper, we investigate several regularization schemes that explicitly promote the similarity of the final solution with the initial model. We show the benefit of having an explicit inductive bias towards the initial model, and we eventually recommend a simple $L^2$ penalty with the pre-trained model being a reference as the baseline of penalty for transfer learning tasks.
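As a rough illustration of the penalty the abstract recommends, the sketch below contrasts ordinary $L^2$ weight decay, which pulls weights toward zero, with an $L^2$ penalty that uses the pre-trained weights as the reference (starting point). It is a minimal sketch, not the authors' implementation: the function names, the flat-list representation of weights, and the $1/2$ scaling are assumptions made for illustration. The split into `alpha` (layers inherited from the source task) and `beta` (the new classification layer) follows the description in the Figure 1 caption.

```python
# Minimal sketch of an L2-SP-style penalty on flat lists of weights.
# Names and the 1/2 scaling are illustrative assumptions.

def l2_penalty(shared, new_layer, alpha):
    """Standard weight decay: pulls every weight toward zero."""
    return 0.5 * alpha * (sum(w ** 2 for w in shared)
                          + sum(w ** 2 for w in new_layer))


def l2_sp_penalty(shared, shared_init, new_layer, alpha, beta):
    """Penalty with the pre-trained weights as reference.

    shared      -- current weights of the layers inherited from the source task
    shared_init -- the pre-trained values of those same weights
    new_layer   -- weights of the freshly initialized classification layer
    """
    drift = sum((w - w0) ** 2 for w, w0 in zip(shared, shared_init))
    decay = sum(w ** 2 for w in new_layer)  # no reference exists for new weights
    return 0.5 * alpha * drift + 0.5 * beta * decay


def l2_sp_grad_shared(shared, shared_init, alpha):
    """Gradient of the penalty w.r.t. the inherited weights: alpha * (w - w0)."""
    return [alpha * (w - w0) for w, w0 in zip(shared, shared_init)]


if __name__ == "__main__":
    w0 = [1.0, 1.0]   # hypothetical pre-trained weights
    w = [1.0, 2.0]    # hypothetical weights after some fine-tuning steps
    head = [3.0]      # hypothetical new classification-layer weights
    print(l2_sp_penalty(w, w0, head, alpha=0.1, beta=0.01))
    print(l2_sp_grad_shared(w, w0, alpha=0.1))
```

At the start of fine-tuning the inherited weights equal their pre-trained values, so the reference term is zero and its gradient vanishes; the penalty only begins to act once the weights drift away from the pre-trained solution.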

Paper Structure

This paper contains 30 sections, 10 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Classification accuracy (in %) on Stanford Dogs 120 for $L^2$-SP, according to the two regularization hyperparameters $\alpha$ and $\beta$ respectively applied to the layers inherited from the source task and the last classification layer (see Equation \ref{eq:L2-SP-full}).
  • Figure 2: Classification accuracies (in %) of the tested fine-tuning approaches on the four target databases, using ImageNet (dark blue dots) or Places 365 (light red dots) as source databases. MIT Indoor 67 is more similar to Places 365 than to ImageNet; Stanford Dogs 120 and Caltech 256 are more similar to ImageNet than to Places 365.
  • Figure 3: Classification accuracies (in %) of fine-tuning with $L^2$ and $L^2$-SP on Stanford Dogs 120 (top) and Caltech 256-30 (bottom) when freezing the first layers of ResNet-101. The dashed lines represent the accuracies reported in Table \ref{table:results}, where no layers are frozen. ResNet-101 begins with one convolutional layer, then stacks 3-layer blocks. The three layers in one block are either frozen or trained altogether.
  • Figure 4: $R^2$ coefficients of determination with $L^2$ and $L^2$-SP regularizations for Stanford Dogs 120. Each boxplot summarizes the distribution of the $R^2$ coefficients of the activations after fine-tuning with respect to the activations of the pre-trained network, for all the units in one layer. ResNet-101 begins with one convolutional layer, then stacks 3-layer blocks. For legibility, we only display here the $R^2$ at the first layer and at the outputs of some 3-layer blocks.
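The per-unit $R^2$ statistic summarized in Figure 4 can be sketched as follows. This is only an illustration under stated assumptions: it uses the common definition $R^2 = 1 - SS_{res}/SS_{tot}$ with the pre-trained activations as the reference signal, which may differ from the exact computation used in the paper, and the function names and nested-list data layout are hypothetical.

```python
# Minimal sketch: per-unit R^2 between pre-trained and fine-tuned activations.
# Assumes R^2 = 1 - SS_res / SS_tot with the pre-trained activations as reference.

def r2_score(reference, predicted):
    """Coefficient of determination of `predicted` with respect to `reference`."""
    mean_ref = sum(reference) / len(reference)
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    if ss_tot == 0.0:
        # A constant reference unit has no variance to explain.
        return float("nan")
    return 1.0 - ss_res / ss_tot


def layer_r2(pretrained_units, finetuned_units):
    """One R^2 value per unit of a layer.

    Each argument is a list of units; each unit is a list holding that unit's
    activation on every input example, in the same example order.
    """
    return [r2_score(pre, fine)
            for pre, fine in zip(pretrained_units, finetuned_units)]


if __name__ == "__main__":
    # Hypothetical activations of a 2-unit layer on 3 examples.
    pre = [[1.0, 2.0, 3.0], [0.0, 1.0, 2.0]]
    fine = [[1.0, 2.0, 3.0], [1.0, 1.0, 1.0]]
    print(layer_r2(pre, fine))
```

Under this definition, a value of 1 means the fine-tuned unit reproduces the pre-trained activations exactly, while values near or below 0 indicate that the unit's response has moved far from its pre-trained behavior. The list returned by `layer_r2` is the kind of per-layer distribution that a boxplot would summarize.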