Invariant Shape Representation Learning For Image Classification

Tonmoy Hossain, Jing Ma, Jundong Li, Miaomiao Zhang

TL;DR

A novel framework that, for the first time, develops invariant shape representation learning (ISRL) to further strengthen the robustness of image classifiers. It is built on a new learning paradigm based on invariant risk minimization (IRM) that learns invariant representations of image and shape features across multiple training distributions/environments.

Abstract

Geometric shape features have been widely used as strong predictors for image classification. However, most existing classifiers, such as deep neural networks (DNNs), directly leverage the statistical correlations between these shape features and target variables. These correlations can often be spurious and unstable across different environments (e.g., in different age groups, certain types of brain changes have unstable relations with neurodegenerative disease), leading to biased or inaccurate predictions. In this paper, we introduce a novel framework that, for the first time, develops invariant shape representation learning (ISRL) to further strengthen the robustness of image classifiers. In contrast to existing approaches that mainly derive features in the image space, ISRL is designed to jointly capture invariant features in latent shape spaces parameterized by deformable transformations. To achieve this goal, we develop a new learning paradigm based on invariant risk minimization (IRM) to learn invariant representations of image and shape features across multiple training distributions/environments. By embedding features whose relationship with the target variables is invariant across environments, our model consistently offers more accurate predictions. We validate our method on classification tasks using simulated 2D images as well as real 3D brain and cine cardiovascular magnetic resonance images (MRIs). Our code is publicly available at https://github.com/tonmoy-hossain/ISRL.
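For reference, here is the original IRM objective of Arjovsky et al. (2019), which the abstract says ISRL builds on, together with its practical IRMv1 relaxation. The notation follows that paper, adapted to the environment set used in Figure 1. $\Phi$ is the feature extractor and $w$ the classifier. $R^e$ is the risk in training environment $e$ (the environment-wise risk of Figure 1), and $\lambda$ is a penalty weight. This summary does not state exactly how ISRL adds its geometric shape learning loss to this objective, so treat this as background rather than the paper's own equation.

```latex
% Original IRM (bi-level) objective over the training environments
\min_{\Phi,\, w} \sum_{e \in \{\mathcal{E}_1, \cdots, \mathcal{E}_{tr}\}} R^e(w \circ \Phi)
\quad \text{s.t.} \quad
w \in \arg\min_{\bar{w}} R^e(\bar{w} \circ \Phi) \quad \forall e \in \{\mathcal{E}_1, \cdots, \mathcal{E}_{tr}\}

% Practical IRMv1 relaxation: gradient penalty on a fixed scalar classifier w = 1.0
\min_{\Phi} \sum_{e \in \{\mathcal{E}_1, \cdots, \mathcal{E}_{tr}\}}
\Big[ R^e(\Phi) + \lambda \, \big\| \nabla_{w \mid w = 1.0} R^e(w \cdot \Phi) \big\|_2^2 \Big]
```

The penalty is zero only when $w = 1.0$ is already a stationary point of every environment's risk. This is how IRMv1 approximates the requirement that a single classifier be simultaneously optimal in all environments.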

Paper Structure

This paper contains 10 sections, 6 equations, 5 figures, 5 tables, and 1 algorithm.

Figures (5)

  • Figure 1: An overview of the proposed ISRL network architecture. The geometric shape learning network ($\mathcal{G}_E, \mathcal{G}_D$) and the image network ($\mathcal{I}_E$) take images from different environments $\{\mathcal{E}_1, \cdots, \mathcal{E}_{tr}\}$. ISRL combines the features from both latent spaces and passes them to the classifier $w$. To learn invariant features, we combine the geometric shape learning loss with the environment-wise risk ($R_{tr}$) and the squared norm of its gradient ($\|\nabla_w R_{tr}\|_2^2$).
  • Figure 2: Top to bottom: examples of 2D simulated data, 3D brain MRIs, and 3D video sequences of cardiac MRIs. The training environments of the three datasets are defined by color, age, and patient history of congestive heart failure, respectively.
  • Figure 3: A comparison of the baselines and ISRL on four different backbones as the probability of label flipping increases. ISRL exhibits superior generalization across backbones under increasing label noise, suggesting enhanced invariance in its learned representations.
  • Figure 4: Comparison of disjoint (two-step) learning vs. our joint approach on both 2D simulated data and 3D brain MRIs on different network backbones. Joint optimization of shape learning and invariant classification yields superior performance.
  • Figure 5: A visual comparison of activation maps generated by ERM models and invariant models (IRM-based, including ISRL). G: ground truth; P: prediction.
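The Figure 1 caption describes the training loss as the geometric shape learning loss combined with the environment-wise risk $R_{tr}$ and its squared gradient norm $\|\nabla_w R_{tr}\|_2^2$. The sketch below shows one way such an objective could be computed. It rests on several assumptions that are not from the paper:

  • the common IRMv1 form, where the gradient is taken with respect to a fixed scalar classifier w = 1.0;
  • a binary logistic loss with labels in {-1, +1};
  • a precomputed scalar standing in for the shape loss.

The function names, `penalty_weight`, and `shape_weight` are hypothetical. This is not the authors' implementation, which is at the repository linked in the abstract.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical type: one environment = (model logits f(x), labels in {-1, +1}).
Environment = Tuple[Sequence[float], Sequence[int]]


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def env_risk_and_grad(logits: Sequence[float], labels: Sequence[int],
                      w: float = 1.0) -> Tuple[float, float]:
    """Logistic risk R^e(w) = mean(log(1 + exp(-y * w * f))) and dR^e/dw."""
    n = len(logits)
    risk, grad = 0.0, 0.0
    for f, y in zip(logits, labels):
        margin = y * w * f
        risk += softplus(-margin)
        # d/dw log(1 + exp(-y*w*f)) = -y * f * sigmoid(-y*w*f)
        grad += -y * f * sigmoid(-margin)
    return risk / n, grad / n


def isrl_style_objective(envs: List[Environment], penalty_weight: float,
                         shape_loss: float = 0.0, shape_weight: float = 1.0) -> float:
    """Sum of per-environment risks + IRMv1 gradient penalties + weighted shape loss.

    shape_loss is a hypothetical placeholder for the geometric shape learning
    loss of G_E / G_D; here it is just a precomputed scalar.
    """
    total_risk, total_penalty = 0.0, 0.0
    for logits, labels in envs:
        # Evaluate risk and gradient at the fixed scalar "dummy" classifier w = 1.0.
        risk, grad = env_risk_and_grad(logits, labels, w=1.0)
        total_risk += risk
        total_penalty += grad ** 2  # squared gradient norm for a scalar w
    return total_risk + penalty_weight * total_penalty + shape_weight * shape_loss


if __name__ == "__main__":
    # Two toy environments with made-up logits and labels.
    envs = [([2.0, -1.5, 0.3], [1, -1, 1]), ([0.5, 0.5], [1, -1])]
    print(isrl_style_objective(envs, penalty_weight=10.0, shape_loss=0.2))
```

Setting `penalty_weight` to 0 reduces the objective to pooled empirical risk minimization (ERM) plus the shape term. This is the ERM baseline that Figure 5 contrasts with invariant models. Larger values push the learned features toward ones for which the same classifier is optimal in every environment. In a real model, `logits` would come from a differentiable network, and an autodiff framework would compute the gradient instead of the closed form used here.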

Theorems & Definitions (1)

  • Definition 3.1