Feature Identification for Hierarchical Contrastive Learning
Julius Ott, Nastassia Vysotskaya, Huawei Sun, Lorenzo Servadei, Robert Wille
TL;DR
Conventional approaches to hierarchical multi-label classification often underuse the relationships between hierarchy levels. The paper introduces two feature-based hierarchical contrastive learning methods that identify level-specific features: G-HMLC, which uses a Gaussian Mixture Model to produce hard masks $M_h$, and A-HMLC, which uses per-level attention heads to produce soft masks. On CIFAR100 and ModelNet40, the methods reach state-of-the-art linear evaluation accuracy, about 2 percentage points above prior hierarchical methods, and a qualitative analysis on MNIST indicates improved multi-level separation and embedding variance. The work improves fine-grained, hierarchy-respecting clustering and robustness to hierarchy ordering, and it suggests Dirichlet-process GMMs as a future direction for inferring the number of hierarchy levels automatically.
Abstract
Hierarchical classification is a crucial task in many applications in which objects are organized into multiple levels of categories. However, conventional classification approaches often neglect the inherent inter-class relationships at different hierarchy levels and thereby miss important supervisory signals. We therefore propose two novel hierarchical contrastive learning (HMLC) methods. The first leverages a Gaussian Mixture Model (G-HMLC), and the second uses an attention mechanism to capture hierarchy-specific features (A-HMLC), imitating human processing. Our approach explicitly models inter-class relationships and the imbalanced class distribution at higher hierarchy levels, enabling fine-grained clustering across all hierarchy levels. On the competitive CIFAR100 and ModelNet40 datasets, our approach achieves state-of-the-art performance in linear evaluation, outperforming existing hierarchical contrastive learning methods by 2 percentage points in accuracy. Both quantitative and qualitative results support the effectiveness of the approach and highlight its potential for applications in computer vision and beyond.
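To make the distinction between the hard masks $M_h$ and the soft masks from the TL;DR concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each embedding dimension has already been assigned to a hierarchy level (for example by the component index of a fitted mixture model) and that each per-level head emits one raw score per dimension that is squashed with a sigmoid. The function names `hard_masks`, `soft_masks`, and `apply_mask`, the sigmoid choice, and the toy values are all hypothetical.

```python
import math

def hard_masks(level_of_feature, num_levels):
    """Build one binary mask per hierarchy level.

    level_of_feature[d] is the level assigned to embedding dimension d
    (assumption: this assignment comes from some clustering step).
    """
    return [[1.0 if lvl == h else 0.0 for lvl in level_of_feature]
            for h in range(num_levels)]

def soft_masks(scores):
    """Map raw per-level scores to weights in (0, 1) with a sigmoid.

    scores[h][d] is a hypothetical attention score of level h for dimension d.
    """
    return [[1.0 / (1.0 + math.exp(-s)) for s in row] for row in scores]

def apply_mask(embedding, mask):
    """Element-wise product of an embedding with a level mask."""
    return [e * m for e, m in zip(embedding, mask)]

# Toy example: a 4-dimensional embedding and two hierarchy levels.
embedding = [0.5, 1.0, 2.0, 0.1]
level_of_feature = [0, 0, 1, 1]          # hypothetical assignment
M = hard_masks(level_of_feature, num_levels=2)
coarse_view = apply_mask(embedding, M[0])  # keeps only level-0 dimensions
fine_view = apply_mask(embedding, M[1])    # keeps only level-1 dimensions

S = soft_masks([[2.0, 2.0, -2.0, -2.0],
                [-2.0, -2.0, 2.0, 2.0]])
soft_coarse_view = apply_mask(embedding, S[0])  # down-weights, never zeroes
```

In this sketch a hard mask removes a dimension from a level's view entirely, whereas a soft mask only rescales it, so every dimension can still contribute to every level. The masked views would then feed a level-specific contrastive loss, which is outside the scope of this example.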
