Feature Identification for Hierarchical Contrastive Learning

Julius Ott, Nastassia Vysotskaya, Huawei Sun, Lorenzo Servadei, Robert Wille

TL;DR

Hierarchical multi-label classification often underutilizes inter-level relationships. The paper introduces two feature-based hierarchical contrastive learning methods, G-HMLC and A-HMLC, which identify level-specific features via a Gaussian Mixture Model that yields hard masks $M_h$ and per-level attention heads that produce soft masks, respectively. Empirical results on CIFAR100 and ModelNet40 show state-of-the-art linear evaluation, with about a 2 percentage point improvement over prior hierarchical methods, and qualitative MNIST analysis confirms better multi-level separation and embedding variance. The work enhances fine-grained, hierarchy-respecting clustering and robustness to hierarchy ordering, and suggests future directions using Dirichlet-process GMMs to automatically infer the number of hierarchy levels.
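
The difference between the hard masks of G-HMLC and the soft masks of A-HMLC can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes a GMM has already assigned each feature dimension to one hierarchy level (the `assignments` array is made up) and that soft masks come from a softmax over per-level attention scores, which is one plausible choice the summary does not confirm. The function names `hard_masks` and `soft_masks` are hypothetical.

```python
import numpy as np

def hard_masks(assignments, num_levels):
    """Binary masks: M[h, d] = 1 if feature d is assigned to level h, else 0."""
    levels = np.arange(num_levels)[:, None]
    return (assignments[None, :] == levels).astype(float)

def soft_masks(scores):
    """Soft masks: softmax over the feature dimension of each level's scores.
    (Assumption: the paper may normalize attention scores differently.)"""
    shifted = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(shifted)
    return weights / weights.sum(axis=1, keepdims=True)

# Toy embedding with 4 feature dimensions and 2 hierarchy levels.
z = np.array([0.5, -1.0, 2.0, 0.1])

# Hypothetical GMM component (= hierarchy level) for each feature dimension.
assignments = np.array([0, 1, 0, 1])
M_hard = hard_masks(assignments, num_levels=2)
z_hard = M_hard * z  # shape (2, 4): one masked feature vector per level

# Hypothetical attention scores, one row per hierarchy level.
scores = np.array([[2.0, 0.0, 2.0, 0.0],
                   [0.0, 1.0, 0.0, 1.0]])
M_soft = soft_masks(scores)
z_soft = M_soft * z  # every feature contributes, weighted per level
```

With hard masks each feature dimension belongs to exactly one level, whereas soft masks let a feature contribute to several levels with different weights.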

Abstract

Hierarchical classification is a crucial task in many applications, where objects are organized into multiple levels of categories. However, conventional classification approaches often neglect inherent inter-class relationships at different hierarchy levels, thus missing important supervisory signals. We therefore propose two novel hierarchical contrastive learning (HMLC) methods. The first leverages a Gaussian Mixture Model (G-HMLC), and the second uses an attention mechanism to capture hierarchy-specific features (A-HMLC), imitating human processing. Our approach explicitly models inter-class relationships and imbalanced class distribution at higher hierarchy levels, enabling fine-grained clustering across all hierarchy levels. On the competitive CIFAR100 and ModelNet40 datasets, our method achieves state-of-the-art performance in linear evaluation, outperforming existing hierarchical contrastive learning methods by 2 percentage points in terms of accuracy. The effectiveness of our approach is backed by both quantitative and qualitative results, highlighting its potential for applications in computer vision and beyond.

Paper Structure

This paper contains 10 sections, 4 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Hierarchical multi-label contrastive learning (HMLC) setup. The dogs are positive pairs in the first level but negative pairs in the second level, whereas the cat and elephant are negatives on both levels.
  • Figure 2: Illustration of the G-HMLC and A-HMLC architectures. The projection head $E_{\theta}$ maps the images to embedding vectors. In G-HMLC, a GMM is fitted on this embedding vector to generate a mask for each hierarchy level. This hard masking is suitable for unrelated lower hierarchy classes (furniture-couch $\neq$ furniture-table). For shared features along the hierarchy tree (e.g. dogs), A-HMLC computes soft attention scores. The hierarchical binary or soft masks, both denoted as $M_{i}$ for readability, are then multiplied by the feature vector.
  • Figure 3: Examples of the hierarchical MNIST dataset. The central digit refers to the class, and the image size is scaled from $32\times32$ to $192\times192$ pixels. The subsidiary digit around it denotes the category and is placed at a random position with a size of $32\times32$ pixels.
  • Figure 4: First and second t-SNE components of the hierarchical MNIST test embeddings. The HMCE loss (left column) separates the first level (top row) but misses a clear separation for the second level (bottom row). The proposed G-HMLC loss (right column) separates both levels.
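
The pairing rule in Figure 1 can be sketched as a per-level positive-pair matrix. This is an illustration under assumptions, not code from the paper: the integer label encoding, the sample order, and the function name `positive_pairs` are made up, and two samples count as positives at a level only if they share every label down to that level.

```python
import numpy as np

# Rows are samples, columns are hierarchy levels (column 0 = first level,
# column 1 = second level). The encoding is hypothetical.
labels = np.array([
    [0, 0],  # dog, variety A
    [0, 1],  # dog, variety B
    [1, 2],  # cat
    [2, 3],  # elephant
])

def positive_pairs(labels, level):
    """Boolean matrix: True where two distinct samples share all labels
    from the first level down to `level` (0-indexed)."""
    prefix = labels[:, : level + 1]
    same = (prefix[:, None, :] == prefix[None, :, :]).all(axis=2)
    np.fill_diagonal(same, False)  # a sample is not its own positive
    return same

pos_level1 = positive_pairs(labels, level=0)
pos_level2 = positive_pairs(labels, level=1)
```

In this toy batch the two dogs are positives in `pos_level1` but not in `pos_level2`, and the cat and the elephant have no positives at either level, matching the caption.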