Understanding the Role of Invariance in Transfer Learning

Till Speicher; Vedant Nanda; Krishna P. Gummadi

TL;DR

This paper explores the role of representational invariance in transfer learning and introduces Transforms-2D, a family of synthetic datasets that gives control over input transformations. It shows that matching the target task's invariances is often as important as, or more important than, factors such as training data size, architecture, or pretraining class identity, and that inappropriate invariances can harm transfer. It also demonstrates that invariances learned during pretraining can transfer across distribution shifts, and that a mismatch between training and target invariances degrades performance unless the pretraining transformations are a superset of those in the target task. The findings guide pretraining data curation and augmentation strategies and highlight potential security implications where adversaries could manipulate invariances to degrade transfer.

Abstract

Transfer learning is a powerful technique for knowledge-sharing between different tasks. Recent work has found that the representations of models with certain invariances, such as to adversarial input perturbations, achieve higher performance on downstream tasks. These findings suggest that invariance may be an important property in the context of transfer learning. However, the relationship of invariance with transfer performance is not fully understood yet and a number of questions remain. For instance, how important is invariance compared to other factors of the pretraining task? How transferable is learned invariance? In this work, we systematically investigate the importance of representational invariance for transfer learning, as well as how it interacts with other parameters during pretraining. To do so, we introduce a family of synthetic datasets that allow us to precisely control factors of variation both in training and test data. Using these datasets, we a) show that for learning representations with high transfer performance, invariance to the right transformations is as, or often more, important than most other factors such as the number of training samples, the model architecture and the identity of the pretraining classes, b) show conditions under which invariance can harm the ability to transfer representations and c) explore how transferable invariance is between tasks. The code is available at https://github.com/tillspeicher/representation-invariance-transfer.
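To make the notion of representational invariance concrete, the following is a minimal sketch of one way to quantify it. It is not the measure used in the paper: the function `invariance_ratio`, the toy representations `g_raw` and `g_sym`, and the random 8x8 inputs are all hypothetical and chosen only for illustration. The idea is that a representation g is invariant to a transformation t when g(t(x)) stays close to g(x), relative to how far apart the representations of different inputs are.

```python
import numpy as np

def invariance_ratio(g, t, xs):
    """Mean distance between g(x) and g(t(x)), divided by the mean distance
    between representations of different inputs. 0 means fully invariant.
    This is an illustrative measure, not the paper's metric."""
    reps = np.stack([g(x) for x in xs])
    reps_t = np.stack([g(t(x)) for x in xs])
    within = np.linalg.norm(reps - reps_t, axis=1).mean()
    # Pair each input with its cyclic neighbour as a "different input" baseline.
    between = np.linalg.norm(reps - np.roll(reps, 1, axis=0), axis=1).mean()
    return within / between

rng = np.random.default_rng(0)
xs = rng.random((32, 8, 8))  # 32 hypothetical 8x8 "images"

def flip(x):
    # Horizontal flip, standing in for an input transformation.
    return x[:, ::-1]

def g_raw(x):
    # Raw pixels: not invariant to flips.
    return x.ravel()

def g_sym(x):
    # Sum of the image and its flip: invariant to flips by construction.
    return (x + x[:, ::-1]).ravel()

ratio_raw = invariance_ratio(g_raw, flip, xs)
ratio_sym = invariance_ratio(g_sym, flip, xs)
print(f"raw pixels: {ratio_raw:.3f}, symmetrised: {ratio_sym:.3f}")
```

Under these assumptions, the symmetrised representation yields a ratio of zero, while the raw-pixel representation changes about as much under a flip as it does between unrelated inputs.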

Paper Structure (33 sections, 1 equation, 18 figures, 5 tables)

Introduction
Related Work
Controlling and Evaluating Invariance in Representations
Terminology
Constructing Invariant Representations
Controlling Data Transformations via Synthetic Data
Measuring Invariance in Representations
How Important is Representational Invariance for Transfer Learning?
How Important is Invariance Compared to Other Factors?
Can Invariance be Exploited to Harm Transfer Performance?
How Transferable is Invariance?
Invariance Transfer Under Distribution Shift
Invariance Mismatch between Training and Target Tasks
Conclusion
Dataset Details
...and 18 more sections

Figures (18)

Figure 1: [Transforms-2D examples.] Example images sampled from the Transforms-2D dataset. Each row shows a different transformation being applied to one of the object prototypes.
Figure 2: [Impact of invariance vs other factors on transfer performance in Transforms-2D.] Training (dotted lines) and transfer performance (solid lines) for models trained with different factors of variation and different invariances on the Transforms-2D dataset. Models trained to be invariant to the same transformations as the target tasks (blue) transfer significantly better than models trained to be invariant to different transformations (orange). This effect is very strong compared to the effect of other factors, such as the number of training samples, the model architecture or the relationship between the training and target classes. The reported numbers are aggregated over 10 runs.
Figure 3: [Difference in transfer performance due to invariance vs other factors.] We compare the differences in transfer performance caused by representational invariance with the differences caused by changes to other factors on the Transforms-2D and the CIFAR-10 and CIFAR-100 datasets (with data augmentations). Orange bars show the span of transfer performance for different-transformation models $g_d$, and blue bars show the span for same-transformation models $g_s$, for each comparison factor (class relationship, architecture, number of samples). Black bars show the difference between transfer performance means across factor values for $g_d$ and $g_s$, i.e. the difference in performance caused by having the same vs different invariances as the target task. Across all datasets, the difference in transfer performance due to representational invariance is comparable to, and often larger than, the difference due to varying the other factors.
Figure 4: [ResNet-18 models trained on nested sets of transformations and evaluated on datasets with super- and subsets of those transformations.] Models trained on data with the same set or a superset of transformations as the target dataset consistently achieve almost $100\%$ accuracy. However, models trained with only a subset of the transformations show considerably lower performance that decreases the smaller the subset of training transformations is compared to the target task. The results show that learning a superset of required invariances does not harm transfer performance but that missing required invariances degrades transfer performance.
Figure 5: [Transforms-2D transformations.] Categories, transformation types, transformation parameters and samples generated using the transformations for the Transforms-2D dataset.
...and 13 more figures
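The quantities compared in Figure 3 can be sketched as follows. This is an illustration only: the accuracy values are made-up placeholders, not results from the paper, and "span" is assumed here to mean the maximum minus the minimum transfer accuracy across the values of one comparison factor.

```python
import numpy as np

# Placeholder transfer accuracies across the values of one comparison factor
# (e.g. three architectures). Illustrative numbers, NOT results from the paper.
acc_same = np.array([0.95, 0.97, 0.99])  # same-transformation models g_s
acc_diff = np.array([0.40, 0.45, 0.55])  # different-transformation models g_d

# Span of transfer performance when only the comparison factor varies
# (assumed to be max - min; the coloured bars in the caption).
span_same = acc_same.max() - acc_same.min()
span_diff = acc_diff.max() - acc_diff.min()

# Difference between the means of g_s and g_d, i.e. the effect attributed to
# having the same vs different invariances (the black bars in the caption).
invariance_gap = acc_same.mean() - acc_diff.mean()

print(f"span g_s: {span_same:.3f}, span g_d: {span_diff:.3f}, "
      f"invariance gap: {invariance_gap:.3f}")
```

With these placeholder values the gap between the two means is larger than either span, which is the pattern the caption describes.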
