Exploiting Data Hierarchy as a New Modality for Contrastive Learning

Arjun Bhalla; Daniel Levenson; Jan Bernhard; Anton Abilov

TL;DR

This paper investigates whether hierarchical structure can serve as a modality for weakly-supervised representation learning, using the WikiScenes cathedral dataset to test hierarchical contrastive pre-training. It introduces a hierarchical triplet loss with level-dependent margins and replay to exploit data organization, reporting both quantitative improvements in downstream classification and qualitative evidence from latent-space visualizations. The proposed method achieves competitive downstream performance and reveals clearer semantic separation in the latent space compared to baselines, demonstrating the value of structure-aware learning for structured datasets. The work highlights a new modality for self-supervised learning and suggests pathways to extend hierarchical training to other structured domains and semantic hierarchies.

Abstract

This work investigates how hierarchically structured data can help neural networks learn conceptual representations of cathedrals. The underlying WikiScenes dataset provides a spatially organized hierarchical structure of cathedral components. We propose a novel hierarchical contrastive training approach that uses a triplet margin loss to represent the data's spatial hierarchy in the encoder's latent space. The approach thereby tests whether the dataset structure provides valuable information for self-supervised learning. We visualize the resulting latent space with t-SNE and evaluate the approach against other dataset-specific contrastive learning methods on a common downstream classification task. The proposed method outperforms the comparable weakly supervised and baseline methods. Our findings suggest that dataset structure is a valuable modality for weakly supervised learning.
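As a rough illustration of a triplet margin loss whose margin depends on the hierarchy level, the following minimal Python sketch may help. It is not the authors' implementation: the function names, the Euclidean distance, and the margin values in `LEVEL_MARGINS` are assumptions chosen for illustration only, and the direction in which the margin changes with depth is likewise assumed.

```python
import math

# Hypothetical margins per hierarchy level (illustrative values only;
# the paper's actual margins and their dependence on depth are not given here).
LEVEL_MARGINS = {0: 1.0, 1: 0.5, 2: 0.25}


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor, positive, negative, margin):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def hierarchical_triplet_loss(anchor, positive, negative, level):
    """Triplet loss whose margin is looked up from the (assumed) level table."""
    return triplet_margin_loss(anchor, positive, negative, LEVEL_MARGINS[level])


if __name__ == "__main__":
    anchor, positive, negative = (0.0, 0.0), (0.0, 1.0), (0.0, 1.5)
    for level in sorted(LEVEL_MARGINS):
        print(level, hierarchical_triplet_loss(anchor, positive, negative, level))
```

With these assumed margins, the same triplet is penalized at the level with the largest margin and incurs no loss at levels with smaller margins, which is the general mechanism by which a level-specific margin can encode hierarchy depth in the latent space.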

Paper Structure (21 sections, 2 equations, 5 figures, 1 table)

Introduction
Related Work
Dataset
WikiScenes Dataset
Contrastive Training Dataset Construction
Method
Hierarchical Training Algorithm
Training Data Sampling
Contrastive Loss
Level-Specific Triplet Margin
Model Selection
Evaluation
Quantitative Evaluation
Qualitative Evaluation
Results and Discussion
...and 6 more sections

Figures (5)

Figure 1: Images of the Berliner Dom used to demonstrate the hierarchical structure of the WikiScenes dataset.
Figure 2: Visualizations of node structure underlying the hierarchical learning procedure.
Figure 3: t-SNE plots of model feature maps for an unseen cathedral's first level.
Figure 4: t-SNE plots of the latent space with respect to the downstream classification labels for an unseen cathedral.
Figure 5: Each star marker represents a trained encoder model evaluated on the classification task. Relative $mAP*$ expresses each model's classification performance relative to the best-performing model in the plot. For example, a Relative $mAP*$ of 0.10 means the model performs 90% worse than the best-performing model.
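The arithmetic behind the Relative $mAP*$ example in the Figure 5 caption can be sketched as follows. This assumes the relative score is each model's score divided by the best score in the plot, which is inferred from the caption's example rather than stated as a formula; the function name `relative_map` and the scores used are made up for illustration and are not results from the paper.

```python
def relative_map(scores):
    """Divide each model's score by the best score (assumed definition)."""
    best = max(scores.values())
    if best <= 0:
        raise ValueError("best score must be positive")
    return {name: score / best for name, score in scores.items()}


if __name__ == "__main__":
    # Hypothetical scores, not taken from the paper.
    made_up_scores = {"model_a": 0.8, "model_b": 0.08}
    for name, rel in relative_map(made_up_scores).items():
        # A relative score of r corresponds to being (1 - r) * 100 percent worse.
        print(name, round(rel, 2), f"{(1 - rel) * 100:.0f}% worse than best")
```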
