Bridging Critical Gaps in Convergent Learning: How Representational Alignment Evolves Across Layers, Training, and Distribution Shifts
Chaitanya Kapoor, Sudhanshu Srivastava, Meenakshi Khosla
TL;DR
This work presents a large-scale audit of representational convergence across dozens of vision models, including self-supervised networks and vision transformers, as well as language models. It compares three alignment families (linear regression, orthogonal Procrustes, and permutation/soft-matching) to quantify how similarly independently trained networks encode information, tracking alignment across layers, over training time, and under distribution shifts. Key findings: early layers converge strongly across architectures and metrics; mappings more flexible than rotation/reflection yield only modest gains; and deeper-layer representations diverge under out-of-distribution inputs. Parallel patterns are observed in CNNs, ViTs, MoCo, and Pythia-based language models. Together, the results paint a robust, depth-dependent picture of convergent learning that is driven largely by input statistics, with implications for brain modeling in neuroscience, model evaluation under distribution shift, and architectural design choices that foster or limit representational alignment.
Abstract
Understanding convergent learning, the degree to which independently trained neural systems (whether multiple artificial networks, or brains and models) arrive at similar internal representations, is crucial for both neuroscience and AI. Yet the literature remains narrow in scope: studies typically examine just a handful of models on one dataset, rely on one alignment metric, and evaluate networks at a single post-training checkpoint. We present a large-scale audit of convergent learning, spanning dozens of vision models and thousands of layer-pair comparisons, to close these long-standing gaps. First, we pit three alignment families against one another: linear regression (affine-invariant), orthogonal Procrustes (rotation-/reflection-invariant), and permutation/soft-matching (unit-order-invariant). We find that orthogonal transformations align representations nearly as effectively as more flexible linear ones, and although permutation scores are lower, they significantly exceed chance, indicating a privileged representational basis. Tracking convergence throughout training further shows that nearly all eventual alignment crystallizes within the first epoch, well before accuracy plateaus, indicating that it is largely driven by shared input statistics and architectural biases rather than by the final task solution. Finally, when models are challenged with a battery of out-of-distribution images, early layers remain tightly aligned, whereas deeper layers diverge in proportion to the distribution shift. These findings fill critical gaps in our understanding of representational convergence, with implications for neuroscience and AI.
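To make the three alignment families concrete, the following is a minimal sketch and not the authors' implementation. It assumes two activation matrices `X` and `Y` of shape (stimuli, units) with the same number of units, scores each mapping in-sample by the mean Pearson correlation between mapped and target units, and uses a hard one-to-one permutation in place of soft-matching. The function names, the scoring rule, and the absence of cross-validation are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def _zscore(A):
    # Center each unit (column) and scale to unit variance.
    A = A - A.mean(axis=0)
    return A / (A.std(axis=0) + 1e-12)


def _mean_unit_corr(pred, target):
    # Mean over units of the Pearson correlation between
    # a mapped unit and its target unit.
    return float((_zscore(pred) * _zscore(target)).mean(axis=0).mean())


def linear_score(X, Y):
    # Most flexible family: unconstrained least-squares map W.
    # Centering both sides absorbs the intercept of an affine fit.
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    W, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    return _mean_unit_corr(Xc @ W, Yc)


def procrustes_score(X, Y):
    # Orthogonal Procrustes: the rotation/reflection Q minimizing
    # ||Xc Q - Yc||_F is U @ Vt, where Xc.T @ Yc = U S Vt.
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    U, _, Vt = np.linalg.svd(Xc.T @ Yc)
    return _mean_unit_corr(Xc @ (U @ Vt), Yc)


def permutation_score(X, Y):
    # Strictest family: one-to-one unit matching that maximizes
    # the total correlation (Hungarian algorithm). This is a hard
    # permutation, a simplification of soft-matching.
    C = _zscore(X).T @ _zscore(Y) / X.shape[0]  # unit-by-unit correlations
    rows, cols = linear_sum_assignment(C, maximize=True)
    return float(C[rows, cols].mean())
```

Under these assumptions, if `Y` is an exact rotation of `X`, the linear and Procrustes scores are both close to 1 while the permutation score is lower; if `Y` is merely a reordering of the units of `X`, all three scores are close to 1. A real comparison would fit each mapping on training stimuli and score it on held-out stimuli.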
