Exploiting Data Hierarchy as a New Modality for Contrastive Learning
Arjun Bhalla, Daniel Levenson, Jan Bernhard, Anton Abilov
TL;DR
This paper investigates whether hierarchical structure can serve as a modality for weakly-supervised representation learning, using the WikiScenes cathedral dataset to test hierarchical contrastive pre-training. It introduces a hierarchical triplet loss with level-dependent margins and replay to exploit data organization, reporting both quantitative improvements in downstream classification and qualitative evidence from latent-space visualizations. The proposed method achieves competitive downstream performance and reveals clearer semantic separation in the latent space compared to baselines, demonstrating the value of structure-aware learning for structured datasets. The work highlights a new modality for self-supervised learning and suggests pathways to extend hierarchical training to other structured domains and semantic hierarchies.
Abstract
This work investigates how hierarchically structured data can help neural networks learn conceptual representations of cathedrals. The underlying WikiScenes dataset provides a spatially organized hierarchical structure of cathedral components. We propose a novel hierarchical contrastive training approach that uses a triplet margin loss to represent the data's spatial hierarchy in the encoder's latent space, and thereby test whether the dataset structure provides valuable information for self-supervised learning. We visualize the resulting latent space with t-SNE and evaluate the approach against other dataset-specific contrastive learning methods on a common downstream classification task. The proposed method outperforms the comparable weakly-supervised and baseline methods. Our findings suggest that dataset structure is a valuable modality for weakly-supervised learning.
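To make the idea of a triplet margin loss with level-dependent margins concrete, the following is a minimal sketch, not the authors' implementation. It assumes each image carries a hierarchy path (a tuple of category names from root to leaf) and that the margin grows linearly with the number of levels separating the anchor from the negative. The linear schedule, the `base_margin` value, the function names, and the example paths are all hypothetical choices made for illustration; the replay component mentioned in the TL;DR is not shown.

```python
import math


def shared_depth(path_a, path_b):
    """Count the leading hierarchy levels that two paths have in common."""
    depth = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        depth += 1
    return depth


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def level_margin(anchor_path, negative_path, base_margin=0.2):
    """Hypothetical margin schedule (an assumption, not taken from the paper):
    the higher in the hierarchy the negative diverges from the anchor,
    the larger the margin it must be pushed away by."""
    levels_apart = len(anchor_path) - shared_depth(anchor_path, negative_path)
    return base_margin * levels_apart


def hierarchical_triplet_loss(anchor, positive, negative,
                              anchor_path, negative_path, base_margin=0.2):
    """Standard triplet margin loss, max(0, d(a, p) - d(a, n) + m),
    where the margin m depends on the hierarchy paths."""
    margin = level_margin(anchor_path, negative_path, base_margin)
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Illustrative paths only; these names are placeholders, not WikiScenes categories.
anchor_path = ("cathedral_a", "exterior", "facade")
near_negative_path = ("cathedral_a", "interior", "nave")    # diverges at level 2
far_negative_path = ("cathedral_b", "exterior", "facade")   # diverges at the root

near_margin = level_margin(anchor_path, near_negative_path)
far_margin = level_margin(anchor_path, far_negative_path)

# A negative that sits as close to the anchor as the positive incurs a loss
# equal to the margin; a distant negative incurs no loss.
loss_violating = hierarchical_triplet_loss(
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), anchor_path, near_negative_path)
loss_satisfied = hierarchical_triplet_loss(
    (0.0, 0.0), (0.0, 1.0), (3.0, 4.0), anchor_path, near_negative_path)
```

Under this assumed schedule, a negative drawn from a different cathedral receives a larger margin than one drawn from a different part of the same cathedral, so distances in the latent space are encouraged to reflect distances in the hierarchy.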
