
Geometric Prior Guided Feature Representation Learning for Long-Tailed Classification

Yanbiao Ma, Licheng Jiao, Fang Liu, Shuyuan Yang, Xu Liu, Puhua Chen

TL;DR

This work leverages the geometry of the feature distributions of well-represented head classes to guide the model toward the underlying distributions of tail classes. It also proposes feature uncertainty modeling, which perturbs tail features using the geometry of head-class feature distributions.

Abstract

Real-world data are long-tailed, and the scarcity of tail samples severely limits the model's generalization ability. Numerous class re-balancing approaches perform well on moderately imbalanced problems. However, when the observed distribution of the few tail samples fails to represent their true distribution, additional knowledge must be introduced. This knowledge helps the tail classes recover their underlying true distribution and allows the model to learn valuable information beyond the observed domain. In this work, we propose to leverage the geometric information of the feature distributions of well-represented head classes to guide the model in learning the underlying distributions of tail classes. Specifically, we first systematically define the geometry of a feature distribution and similarity measures between geometries. We then discover four phenomena concerning the relationships between the geometries of different feature distributions. Based on these four phenomena, we propose feature uncertainty representation, which perturbs tail features using the geometry of head-class feature distributions. It aims to make the perturbed features cover as much of the tail classes' underlying distribution as possible, thereby improving the model's generalization in the test domain. Finally, we design a three-stage training scheme that enables feature uncertainty modeling to be applied successfully. Experiments on CIFAR-10/100-LT, ImageNet-LT, and iNaturalist2018 show that our approach outperforms other similar methods on most metrics. In addition, the experimental phenomena we discovered can provide new perspectives and theoretical foundations for subsequent studies.
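As a rough illustration of the perturbation idea in the abstract, the sketch below (Python with NumPy) takes the eigendecomposition of a head class's feature covariance as that class's "geometry". It then adds Gaussian noise to tail features along the top-k head directions, scaled by the square root of the corresponding eigenvalues. This is an assumption made for illustration, not the paper's exact formulation. The functions `class_geometry` and `perturb_tail_features` and the parameters `k` and `scale` are hypothetical. The paper's actual perturbation rule and its three-stage training scheme are not reproduced here.

```python
import numpy as np


def class_geometry(features: np.ndarray):
    """Eigenvalues (descending) and eigenvectors (columns) of one class's feature covariance.

    features: (n_samples, dim) array of embeddings for a single class.
    """
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / max(len(features) - 1, 1)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]             # reorder largest first
    eigvals = np.clip(eigvals[order], 0.0, None)  # drop tiny negative round-off
    return eigvals, eigvecs[:, order]


def perturb_tail_features(tail_feats, head_eigvals, head_eigvecs, k=5, scale=1.0, rng=None):
    """Hypothetical perturbation rule (an assumption, not the paper's formula):
    z' = z + scale * sum_i eps_i * sqrt(lambda_i) * u_i with eps_i ~ N(0, 1),
    summed over the top-k head-class eigenpairs (lambda_i, u_i)."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal((tail_feats.shape[0], k))  # one coefficient per direction
    coeffs = scale * eps * np.sqrt(head_eigvals[:k])     # (n_tail, k)
    return tail_feats + coeffs @ head_eigvecs[:, :k].T   # (n_tail, dim)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    head = rng.normal(size=(500, 16)) * np.linspace(3.0, 0.1, 16)  # anisotropic head class
    tail = rng.normal(size=(10, 16)) + 5.0                           # a few tail samples
    vals, vecs = class_geometry(head)
    augmented = perturb_tail_features(tail, vals, vecs, k=5, rng=rng)
    print(augmented.shape)  # (10, 16)
```

With this choice, every perturbation lies in the span of the head class's leading principal directions. The tail samples are therefore spread along the directions in which the head class varies most, which is one plausible reading of "borrowing" head geometry.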

Paper Structure (28 sections, 19 equations, 13 figures, 4 tables)

Figures (13)

  • Figure 1: (a) When the samples uniformly cover the true data distribution, the model can learn the correct decision boundary and correctly classify unseen test samples. (b) When the samples cover only part of the true distribution, unseen test samples are highly likely to be misclassified because of errors in the decision boundary. (c) The arrow points in the best direction in which to expand the samples.
  • Figure 2: The ratio of the sum of the top five eigenvalues to the sum of all eigenvalues, obtained by eigendecomposition of the feature embeddings of each class in the three datasets. The horizontal axis shows the class indices; the class names are listed in Appendix D.
  • Figure 3: (a) The horizontal axis indexes the classes, where 1 to 9 denote, respectively, the classes from most to least similar to the class on the vertical axis. Each element is the geometric similarity between two classes. See Appendix D for the class names. (b) Same as (a). (c) The inner products between all eigenvectors of dog and all eigenvectors of cat in CIFAR-10. The sum of the first five diagonal elements of $M_1$ equals the element in the first row and first column of (b). (d) The inner products between all eigenvectors of dog and all eigenvectors of automobile in CIFAR-10. The third row shows the results of experiments using ResNet-50 and VGG-16 as backbone networks. The axes and the meanings of the values are the same as in (a) and (b).
  • Figure 4: (a) The function curve of Equation (1). As the dimensionality increases, any two random vectors tend to become orthogonal. (b) When two different models are used separately to extract features of dog, the geometries of the two feature distributions are dissimilar. (c) Cosine similarity between class feature centers on CIFAR-10. (d) Cosine similarity between class feature centers on CIFAR-10-LT.
  • Figure 5: The left figure shows the number of training and testing samples for each class in the OIA-ODIR dataset. The right figure compares various methods.
  • ...and 8 more figures
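The quantities behind Figures 2 and 4(a) can be approximated with a short NumPy sketch. `top_k_eigen_ratio` computes the share of the eigenvalue sum carried by the five largest eigenvalues of a class's feature covariance, which is the statistic plotted in Figure 2. `mean_abs_cosine` checks the trend in Figure 4(a) empirically by sampling random Gaussian vectors. It is a Monte Carlo stand-in and does not reproduce the paper's plotted equation. Both function names, the Gaussian sampling, and the default `n_pairs` are assumptions for illustration.

```python
import numpy as np


def top_k_eigen_ratio(features: np.ndarray, k: int = 5) -> float:
    """Fraction of the eigenvalue sum carried by the k largest covariance eigenvalues."""
    centered = features - features.mean(axis=0, keepdims=True)
    eigvals = np.linalg.eigvalsh(centered.T @ centered)  # normalization cancels in the ratio
    eigvals = np.clip(eigvals, 0.0, None)                # drop tiny negative round-off
    return float(np.sort(eigvals)[::-1][:k].sum() / eigvals.sum())


def mean_abs_cosine(dim: int, n_pairs: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of E|cos(a, b)| for independent Gaussian vectors a, b in R^dim."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n_pairs, dim))
    b = rng.standard_normal((n_pairs, dim))
    cos = np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    return float(np.abs(cos).mean())


if __name__ == "__main__":
    for d in (2, 64, 2048):
        # The estimate shrinks toward 0 as d grows: random vectors become nearly orthogonal.
        print(d, round(mean_abs_cosine(d), 3))
```

A ratio close to 1 in Figure 2 would indicate that a class's features concentrate in a low-dimensional subspace. The shrinking average cosine helps explain why similarity between eigenvectors of different classes is informative: by chance alone, two high-dimensional directions would be nearly orthogonal.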

Theorems & Definitions (3)

  • Definition: The geometry of data distribution
  • Definition: Similarity metric between geometry
  • Definition
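The two named definitions are not reproduced in this summary. The minimal sketch below assumes that the geometry of a class's feature distribution is the set of its top-k covariance eigenvectors. It also assumes that the similarity between two geometries is the sum of the first k diagonal elements of the eigenvector inner-product matrix, as the Figure 3(c) caption describes for the dog/cat matrix. Absolute values are taken because eigenvector signs are arbitrary. The absolute value, the default `k = 5`, and the names `geometry` and `geometry_similarity` are assumptions, not the paper's stated definitions.

```python
import numpy as np


def geometry(features: np.ndarray, k: int = 5) -> np.ndarray:
    """Assumed geometry: top-k covariance eigenvectors (columns, largest eigenvalue first)."""
    centered = features - features.mean(axis=0, keepdims=True)
    eigvals, eigvecs = np.linalg.eigh(centered.T @ centered)  # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, order]


def geometry_similarity(feats_a: np.ndarray, feats_b: np.ndarray, k: int = 5) -> float:
    """Assumed similarity: sum of |<u_i, v_i>| over the first k matched eigenvector pairs.

    This is the sum of absolute diagonal entries of the k x k inner-product matrix,
    so the value lies in [0, k].
    """
    ua, ub = geometry(feats_a, k), geometry(feats_b, k)
    m = ua.T @ ub  # inner products between the two sets of eigenvectors
    return float(np.abs(np.diag(m)).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scales = 10.0 * 0.5 ** np.arange(32)                       # well-separated variances
    a = rng.normal(size=(2000, 32)) * scales                   # class A
    b = rng.normal(size=(2000, 32)) * scales + 3.0             # same shape, shifted mean
    q, _ = np.linalg.qr(rng.normal(size=(32, 32)))             # random rotation
    print(geometry_similarity(a, b), geometry_similarity(a, a @ q))
```

Under these assumptions, two classes whose features spread along the same directions score close to `k` even when their means differ. Rotating one class's features destroys the match and drives the score toward 0.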