On Transfer in Classification: How Well do Subsets of Classes Generalize?

Raphael Baena, Lucas Drumetz, Vincent Gripon

TL;DR

A partial order relation is introduced on sets of classes. It characterizes which class sets can generalize to others and contributes to a better understanding of transfer mechanics and model generalization.

Abstract

In classification, it is common to observe that models trained on a given set of classes can generalize to previously unseen ones, suggesting an ability to learn beyond the initial task. This ability is often leveraged in transfer learning, where a pretrained model is used to process new classes, with or without fine-tuning. Surprisingly, few papers look at the theoretical roots behind this phenomenon. In this work, we aim to lay the foundations of such a theoretical framework for transferability between sets of classes. Namely, we establish a partially ordered set of subsets of classes. This tool allows us to represent which subsets of classes can generalize to others. In a more practical setting, we explore the ability of our framework to predict which subset of classes leads to the best performance when testing on all of them. We also explore few-shot learning, where transfer is the gold standard. Our work contributes to a better understanding of transfer mechanics and model generalization.

Paper Structure (28 sections, 2 theorems, 7 equations, 16 figures, 5 tables, 1 algorithm)

Key Result

Theorem 1

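Consider the set of all fundamental pairs, and remove pairs from it until no two of the remaining pairs have equivalent associated models, i.e. until exactly one pair is kept from each group of pairs whose models are equivalent. The cardinality of the resulting set is denoted $\mathcal{F}(C)$; since it equals the number of equivalence classes among fundamental models, it does not depend on which representatives are kept.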
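To make this construction concrete, here is a minimal Python sketch. It assumes, following the caption of Figure 1, that each pair model is summarized by the set of class pairs it separates (how a pair counts as "separated", e.g. an accuracy threshold, is left open), that a model is fundamental when no other model separates a strict superset of its pairs, and that two models are equivalent when these sets coincide. The function `fundamental_number` and the toy table `SEPARABLE` over classes a, b, c, d are hypothetical illustrations, not taken from the paper.

```python
# Hypothetical sketch: fundamental pairs and the fundamental number F(C),
# assuming each pair model is summarized by the set of class pairs it separates.

def fundamental_number(separable: dict[str, frozenset[str]]) -> tuple[list[str], int]:
    """Return the fundamental pairs and the number of non-equivalent ones.

    `separable` maps each training pair (e.g. "a/c") to the set of pairs
    that the model trained on it separates.
    """
    models = list(separable)
    # A model is fundamental if no other model separates a strict superset
    # of its pairs, i.e. it is a maximal element of the ordering.
    fundamental = [
        m for m in models
        if not any(separable[other] > separable[m] for other in models)
    ]
    # Equivalent models separate exactly the same pairs; keeping one
    # representative per distinct set gives F(C).
    distinct = {separable[m] for m in fundamental}
    return fundamental, len(distinct)


# Toy separability table over four classes a, b, c, d (invented for illustration).
SEPARABLE = {
    "a/b": frozenset({"a/b", "a/d", "b/c"}),
    "a/c": frozenset({"a/b", "a/c", "a/d", "b/c", "c/d"}),
    "a/d": frozenset({"a/b", "a/c", "a/d", "b/c", "c/d"}),
    "b/c": frozenset({"a/b", "b/c"}),
    "b/d": frozenset({"a/b", "a/d", "b/d", "c/d"}),
    "c/d": frozenset({"a/c", "c/d"}),
}

fundamental, f_c = fundamental_number(SEPARABLE)
print(fundamental, f_c)  # ['a/c', 'a/d', 'b/d'] 2
```

On this made-up table, the models trained on a/c and a/d separate the same pairs and dominate a/b, b/c and c/d, while b/d is incomparable with them; three pairs are therefore fundamental, but only two are non-equivalent, so $\mathcal{F}(C) = 2$.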

Figures (16)

  • Figure 1: Hasse diagrams illustrating the order relation between models learned on pairs of classes. Each model is characterized by the pairs of classes it can successfully separate. An arrow from a model $A$ to a model $B$ indicates that $A$ is more expressive: it can separate all the pairs that $B$ can separate. The colored models are the most expressive and are called fundamental. Models sharing the same color are considered equivalent, as they separate the same pairs. Uncolored models can be disregarded, since the corresponding pairs can be separated by another model. On the left, with 4 classes from CIFAR-10, the diagram shows that 'cat/truck' and 'auto/cat' are equivalent and more expressive than 'auto/dog' and 'dog/truck'. In other words, models 'auto/dog' and 'dog/truck' can be disregarded, as they offer no additional separability compared to 'cat/truck' and 'auto/cat'. On the right, with 4 classes from Fashion-MNIST, the diagram shows that 'coat/bag' and 'shirt/bag' are equivalent and more expressive than 'coat/trouser' and 'shirt/trouser'.
  • Figure 2: Encoding of 16 classes with a minimal number of bi-partitions (models): $\log_2(16) = 4$. Pairs/edges that share the same color correspond to equivalent models.
  • Figure 3: Comparison of the number of pairs separated by each class. The x-axis shows the results for the pretrained VIT-8, while the y-axis corresponds to the ResNet-18 trained from scratch on 6 classes of CIFAR-10. The analysis highlights the most promising classes, such as deer, frog, horse, and cat. Interestingly, some of these promising classes also belong to the best subset when training from scratch (automobile, bird, cat, deer, horse, truck).
  • Figure 4: Separability given by a ResNet-50 before and after fine-tuning on subsets of CIFAR-10.
  • Figure 5: Separability given by a VIT-8 before and after fine-tuning on subsets of CIFAR-10.
  • ...and 11 more figures
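The count in the caption of Figure 2 can be checked with a binary code. Give each of the 16 classes a 4-bit index and let bi-partition $k$ split the classes according to bit $k$: any two distinct classes differ in at least one bit, so they are separated by at least one of the $\log_2(16) = 4$ bi-partitions. Conversely, $k$ bi-partitions assign each class one of only $2^k$ side patterns, so $2^3 = 8$ patterns cannot keep 16 classes apart. The Python sketch below verifies this; the bit-based construction and the function names are illustrative assumptions, not necessarily the exact encoding drawn in the figure.

```python
from itertools import combinations
from math import ceil, log2


def binary_bipartitions(num_classes: int) -> list[frozenset[int]]:
    """Return ceil(log2(n)) bi-partitions; partition k is the set of classes whose bit k is 1.

    The other side of each bi-partition is its complement in range(num_classes).
    """
    num_bits = max(1, ceil(log2(num_classes)))
    return [
        frozenset(c for c in range(num_classes) if (c >> k) & 1)
        for k in range(num_bits)
    ]


def separates_all_pairs(partitions: list[frozenset[int]], num_classes: int) -> bool:
    """True if every pair of distinct classes falls on opposite sides of some bi-partition."""
    return all(
        any((i in side) != (j in side) for side in partitions)
        for i, j in combinations(range(num_classes), 2)
    )


parts = binary_bipartitions(16)
print(len(parts))                          # 4
print(separates_all_pairs(parts, 16))      # True
print(separates_all_pairs(parts[:3], 16))  # False: e.g. classes 0 and 8 share bits 0-2
```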

Theorems & Definitions (17)

  • Definition 1: Model
  • Definition 2: Model associated with a pair of classes
  • Definition 3: Set of separable pairs
  • Definition 4: Ordering models
  • Definition 5: Equivalent models
  • Definition 6: Fundamental pair
  • Definition 7: Fundamental number
  • Theorem 1
  • Theorem 2
  • Proof
  • ...and 7 more
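The Hasse diagrams of Figure 1 can be derived from the ordering of models once equivalent models are merged: draw an arrow from $A$ to $B$ when $A$ separates a strict superset of the pairs $B$ separates and no third model lies strictly in between. Below is a minimal Python sketch, assuming each pair model is summarized by the set of class pairs it separates; the function `hasse_edges` and the toy table `SEPARABLE` are hypothetical and invented for illustration.

```python
# Hypothetical sketch: Hasse diagram of pair models ordered by the pairs they separate.

def hasse_edges(
    separable: dict[str, frozenset[str]],
) -> list[tuple[tuple[str, ...], tuple[str, ...]]]:
    """Return arrows (upper, lower) between groups of equivalent models.

    An arrow means `upper` separates a strict superset of the pairs `lower`
    separates, with no other group strictly in between (a cover relation).
    """
    # Merge equivalent models: identical separable sets form one node.
    groups: dict[frozenset[str], list[str]] = {}
    for model, pairs in separable.items():
        groups.setdefault(pairs, []).append(model)
    nodes = list(groups)

    edges = []
    for upper in nodes:
        for lower in nodes:
            if not upper > lower:
                continue
            # Skip arrows implied by transitivity through an intermediate node.
            if any(upper > mid > lower for mid in nodes):
                continue
            edges.append((tuple(groups[upper]), tuple(groups[lower])))
    return edges


# Toy separability table over four classes a, b, c, d (invented for illustration).
SEPARABLE = {
    "a/b": frozenset({"a/b", "a/d", "b/c"}),
    "a/c": frozenset({"a/b", "a/c", "a/d", "b/c", "c/d"}),
    "a/d": frozenset({"a/b", "a/c", "a/d", "b/c", "c/d"}),
    "b/c": frozenset({"a/b", "b/c"}),
    "b/d": frozenset({"a/b", "a/d", "b/d", "c/d"}),
    "c/d": frozenset({"a/c", "c/d"}),
}

for upper, lower in hasse_edges(SEPARABLE):
    print(" = ".join(upper), "->", " = ".join(lower))
# a/b -> b/c
# a/c = a/d -> a/b
# a/c = a/d -> c/d
```

The arrow from a/c = a/d to b/c is omitted because it passes through a/b, and b/d, being incomparable with every other model in this toy table, appears as an isolated node.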