Diagnosing Generalization Failures from Representational Geometry Markers

Chi-Ning Chou, Artem Kirsanov, Yao-Yuan Yang, SueYeon Chung

TL;DR

This work systematically designs and tests network markers to probe structure-function links, identify prognostic indicators, and validate predictions in real-world settings. It demonstrates that representational geometry can expose hidden vulnerabilities, offering more robust guidance for model selection and AI interpretability.

Abstract

Generalization, the ability to perform well beyond the training context, is a hallmark of biological and artificial intelligence, yet anticipating unseen failures remains a central challenge. Conventional approaches often take a "bottom-up" mechanistic route, reverse-engineering interpretable features or circuits to build explanatory models. While insightful, these methods often struggle to provide the high-level, predictive signals needed to anticipate failure in real-world deployment. Here, we propose a "top-down" approach to studying generalization failures, inspired by medical biomarkers: identifying system-level measurements that serve as robust indicators of a model's future performance. Rather than mapping out detailed internal mechanisms, we systematically design and test network markers to probe structure-function links, identify prognostic indicators, and validate predictions in real-world settings. In image classification, we find that task-relevant geometric properties of in-distribution (ID) object manifolds consistently forecast poor out-of-distribution (OOD) generalization. In particular, reductions in two geometric measures, effective manifold dimensionality and utility, predict weaker OOD performance across diverse architectures, optimizers, and datasets. We then apply this finding to transfer learning with ImageNet-pretrained models, where the same geometric patterns consistently predict OOD transfer performance more reliably than ID accuracy. This work demonstrates that representational geometry can expose hidden vulnerabilities, offering more robust guidance for model selection and AI interpretability.

Paper Structure

This paper contains 63 sections, 25 equations, 23 figures, and 7 tables.

Figures (23)

  • Figure 1: A diagnostic, system-level paradigm for studying generalization failures in DNNs, with an example on image classification. See the introduction for an overview.
  • Figure 2: Object manifolds and task-relevant geometric measures. a, Object manifolds are the per-class point clouds in the feature space. b, Critical dimension $N_{\textsf{crit}}$ quantifies the degree of manifold untangling/separability in an average-case sense via random projection. c, Anchor point distribution gives higher weight to points that are more important for linear classification. d, The degree of manifold separation (quantified by the critical number of neurons $N_{\textsf{crit}}$) is analytically linked to three task-relevant geometric measures: effective dimension $D_{\textsf{eff}}$, radius $R_{\textsf{eff}}$, and utility $\Psi_{\textsf{eff}}$.
  • Figure 3: Prognostic discovery for OOD generalization. a, We consider the image classification problem with an ID dataset and an OOD dataset with disjoint image classes. b, We trained DNNs on the ID dataset and evaluated the OOD performance as linear probe accuracy. c, Conventional performance and statistical measures on the ID dataset are weakly predictive of OOD performance, while some task-relevant geometric measures can robustly predict failures in OOD generalization.
  • Figure 4: All results on models trained on CIFAR-10, showing correlations between markers (x-axis) and OOD performance across a hyperparameter sweep. Numbers indicate Pearson $r$; asterisks denote significance ($^{*}: p\le0.05$; $^{**}: p\le0.01$; $^{***}: p\le0.001$; $^{****}: p\le0.0001$).
  • Figure 5: Predicting OOD transfer performance of ImageNet-pretrained models via $D_{\textsf{eff}}$ and $\Psi_{\textsf{eff}}$. For the first block of models, our prognostic indicators predicted that v1 would outperform v2. For the second block of models, our prognostic indicators predicted the reverse.
  • ...and 18 more figures
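
The Figure 4 caption reports a Pearson $r$ between each marker and OOD performance, annotated with significance asterisks. A minimal sketch of that annotation is given below, using only the Python standard library. The asterisk thresholds come from the caption. Everything else is an assumption made for illustration: the function names, the toy marker and accuracy values, and the use of a permutation test for the p-value, which may differ from the test used in the paper.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the observed |r|.

    Assumption: a permutation test stands in for whatever significance
    test the paper uses. The smallest attainable value is 1 / (n_perm + 1).
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson_r(x, y_perm)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def significance_stars(p):
    """Map a p-value to the asterisk convention stated in the Figure 4 caption."""
    for threshold, stars in ((1e-4, "****"), (1e-3, "***"), (1e-2, "**"), (5e-2, "*")):
        if p <= threshold:
            return stars
    return ""


if __name__ == "__main__":
    # Hypothetical values: one marker measurement and one OOD accuracy per
    # trained model in a sweep. These numbers are made up for illustration.
    marker = [12.1, 14.3, 15.0, 17.8, 18.2, 20.5, 22.9, 24.0]
    ood_acc = [0.41, 0.44, 0.43, 0.49, 0.50, 0.53, 0.57, 0.58]
    r = pearson_r(marker, ood_acc)
    p = permutation_p_value(marker, ood_acc)
    print(f"r = {r:.2f}{significance_stars(p)} (p = {p:.4f})")
```

With 2000 permutations the p-value cannot fall below 1/2001, so this sketch can never emit four asterisks; raising `n_perm` or switching to an analytic test would be needed to resolve the smallest threshold.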