Hypernym Bias: Unraveling Deep Classifier Training Dynamics through the Lens of Class Hierarchy
Roman Malashin, Valeria Yachnaya, Alexander Mullin
TL;DR
This work studies the training dynamics of deep classifiers through the lens of class hierarchies. It introduces the concept of hypernym bias and a framework for tracking how the feature manifold evolves during training. The approach combines a greedy hypernym classification setup, WordNet-based metrics of manifold evolution, and an extended neural collapse analysis that compares the hypernym and hyponym label spaces. Experiments on ImageNet show that hypernym distinctions are learned earlier than hyponym distinctions, and that neural-collapse-like patterns emerge in hypernym spaces before they appear in the hyponym space; similar dynamics are observed on CIFAR-100 and DBPedia. The study offers mechanistic insight into hierarchical learning in deep networks and motivates data-driven hierarchies and hierarchical training strategies aimed at better understanding and efficiency.
Abstract
We investigate the training dynamics of deep classifiers by examining how hierarchical relationships between classes evolve during training. Through extensive experiments, we argue that the learning process in classification problems can be understood through the lens of label clustering. Specifically, we observe that networks tend to distinguish higher-level (hypernym) categories in the early stages of training and learn more specific (hyponym) categories later. We introduce a novel framework to track the evolution of the feature manifold during training, revealing how the hierarchy of class relations emerges and is refined across the network layers. Our analysis demonstrates that the learned representations closely align with the semantic structure of the dataset, providing a quantitative description of the clustering process. Notably, we show that certain properties of neural collapse appear earlier in the hypernym label space than in the hyponym label space, helping to bridge the gap between the initial and terminal phases of learning. We believe these findings offer new insight into the mechanisms that drive hierarchical learning in deep networks and into deep learning dynamics more broadly.
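To make the hypernym/hyponym distinction concrete, the following is a minimal sketch of how one might score the same set of predictions at both levels of a class hierarchy. It is not the paper's framework or metric: the function name `hierarchy_accuracies`, the toy class names, and the flat child-to-parent mapping are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def hierarchy_accuracies(
    preds: List[str],
    targets: List[str],
    parent: Dict[str, str],
) -> Tuple[float, float]:
    """Return (hyponym accuracy, hypernym accuracy) for one set of predictions.

    `parent` maps each fine-grained (hyponym) class to its coarse (hypernym)
    class. This one-level mapping is a hypothetical stand-in for a real
    hierarchy such as WordNet.
    """
    if not preds or len(preds) != len(targets):
        raise ValueError("preds and targets must be non-empty and equally long")

    # Hyponym level: the predicted class must match the target exactly.
    hypo_hits = sum(p == t for p, t in zip(preds, targets))
    # Hypernym level: the prediction only needs to share the target's parent.
    hyper_hits = sum(parent[p] == parent[t] for p, t in zip(preds, targets))

    n = len(targets)
    return hypo_hits / n, hyper_hits / n


# Toy hierarchy and predictions (illustrative values, not data from the paper).
parent = {"beagle": "dog", "poodle": "dog", "tabby": "cat", "siamese": "cat"}
preds = ["poodle", "beagle", "tabby", "beagle"]
targets = ["beagle", "beagle", "siamese", "tabby"]

hypo_acc, hyper_acc = hierarchy_accuracies(preds, targets, parent)
print(hypo_acc, hyper_acc)  # 0.25 0.75
```

Because every exact match is also a match at the parent level, hypernym accuracy can never fall below hyponym accuracy under this scoring. What would be informative is how the gap between the two changes across training checkpoints; a large early gap would be consistent with coarse categories being separated first.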
