Characterizing Structural Regularities of Labeled Data in Overparameterized Models

Ziheng Jiang, Chiyuan Zhang, Kunal Talwar, Michael C. Mozer

TL;DR

This paper defines a Consistency Profile and a scalar C-score to quantify per-instance generalization as training data grows, revealing a continuum between densely regular modes and sparse, ambiguous regions in data distributions. It provides an empirical estimation framework, analyzes proxies, and applies the method to MNIST, CIFAR-10/100, and ImageNet, uncovering meaningful structure such as mislabeled or outlier instances at one end and well-supported regular examples at the other. The authors compare distance-based and learning-speed proxies, finding that learning-speed metrics correlate best with the C-score and offer scalable diagnostics. They demonstrate practical uses in data pruning, outlier detection, and studying optimizer dynamics, and release code and precomputed scores to enable broader adoption.

Abstract

Humans are accustomed to environments that contain both regularities and exceptions. For example, at most gas stations, one pays prior to pumping, but the occasional rural station does not accept payment in advance. Likewise, deep neural networks can generalize across instances that share common patterns or structures, yet have the capacity to memorize rare or irregular forms. We analyze how individual instances are treated by a model via a consistency score. The score characterizes the expected accuracy for a held-out instance given training sets of varying size sampled from the data distribution. We obtain empirical estimates of this score for individual instances in multiple data sets, and we show that the score identifies out-of-distribution and mislabeled examples at one end of the continuum and strongly regular examples at the other end. We identify computationally inexpensive proxies to the consistency score using statistics collected during training. We show examples of potential applications to the analysis of deep-learning systems.
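The held-out estimate described above can be illustrated with a small sketch. It is not the authors' implementation: the nearest-centroid learner stands in for a neural network, and the function names, subset ratios, number of runs, and toy data are all assumptions chosen for illustration. For each subset ratio, the sketch repeatedly trains on a random subset, records whether each held-out instance is classified correctly, and then averages over runs and ratios to obtain a per-instance score.

```python
import numpy as np


def nearest_centroid_predict(X_train, y_train, X_test, n_classes):
    """Stand-in learner (an assumption): predict the class with the closest mean."""
    centroids = np.full((n_classes, X_train.shape[1]), np.inf)
    for c in range(n_classes):
        mask = y_train == c
        if mask.any():  # classes absent from the subset keep an infinite centroid
            centroids[c] = X_train[mask].mean(axis=0)
    dists = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def estimate_c_scores(X, y, ratios=(0.1, 0.3, 0.5, 0.7, 0.9), n_runs=20, seed=0):
    """Return (profile, c_scores): held-out accuracy per ratio, and its mean over ratios."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_classes = int(y.max()) + 1
    profile = []
    for ratio in ratios:
        correct = np.zeros(n)
        counts = np.zeros(n)
        n_train = max(1, int(round(ratio * n)))
        for _ in range(n_runs):
            perm = rng.permutation(n)
            train_idx, held_idx = perm[:n_train], perm[n_train:]
            pred = nearest_centroid_predict(X[train_idx], y[train_idx], X[held_idx], n_classes)
            correct[held_idx] += pred == y[held_idx]
            counts[held_idx] += 1
        # Instances never held out at this ratio are left as NaN.
        acc = np.divide(correct, counts, out=np.full(n, np.nan), where=counts > 0)
        profile.append(acc)
    profile = np.stack(profile)  # shape: (len(ratios), n)
    return profile, np.nanmean(profile, axis=0)


def make_toy_data(n_per_class=40, seed=0):
    """Two well-separated 2-D blobs; instance 0 is deliberately mislabeled."""
    rng = np.random.default_rng(seed)
    X0 = rng.normal(loc=(-3.0, 0.0), scale=0.5, size=(n_per_class, 2))
    X1 = rng.normal(loc=(3.0, 0.0), scale=0.5, size=(n_per_class, 2))
    X = np.concatenate([X0, X1])
    y = np.concatenate([np.zeros(n_per_class, dtype=int), np.ones(n_per_class, dtype=int)])
    X[0] = (-3.0, 0.0)  # sits at the centre of class 0 ...
    y[0] = 1            # ... but carries the label of class 1
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    profile, c_scores = estimate_c_scores(X, y)
    print("score of the mislabeled instance:", c_scores[0])
    print("mean score of the other instances:", np.nanmean(c_scores[1:]))
```

In this toy setting the deliberately mislabeled instance should land at the low end of the score range and the regular instances at the high end, mirroring the continuum the abstract describes. Each row of `profile` plays the role of one point on an instance's consistency profile.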

Paper Structure

This paper contains 16 sections, 5 equations, 20 figures, 2 tables, 1 algorithm.

Figures (20)

  • Figure 1: (a) Regularities and exceptions in a binary chairs vs non-chairs problem. (b) Illustration of consistency profiles. (c) Regularities (high C-scores) and exceptions (low C-scores) in ImageNet.
  • Figure 2: Consistency profiles of training examples. Each curve in the figure corresponds to the average profile of a set of examples, partitioned according to the area under the profile curve of each example.
  • Figure 3: (a) Top ranked examples in CIFAR-10 and CIFAR-100. (b) Bottom ranked examples with annotations.
  • Figure 4: (a) Histogram of $\hat{C}_{\hat{\mathcal{D}},n}$ for each subset ratio on CIFAR-10. (b) Histogram of the C-score $\hat{C}_{\hat{\mathcal{D}}}$ averaged over all subset ratios on 3 different data sets.
  • Figure 5: (a) Rank correlation between the integral C-score and the C-score for a particular subset ratio, $s$. The peak of each curve indicates the training set size that best reveals generalization of the model. (b) Joint distribution of C-score per-class means and standard deviations on ImageNet. Samples from representative classes ($\star$'s) are shown in Figure \ref{fig:imagenet-per-class-egs}.
  • ...and 15 more figures