
Measures of Information Reflect Memorization Patterns

Rachit Bansal, Danish Pruthi, Yonatan Belinkov

TL;DR

The paper tackles generalization failures in neural networks caused by heuristic and example-level memorization. It introduces entropy $H$ and mutual information $I$ over neuron activations to quantify intra- and inter-neuron diversity as intrinsic indicators of memorization. Across semi-synthetic and natural NLP/vision tasks, networks displaying heuristic memorization show low $H$ and high $I$, while example-level memorization yields high $H$ and low $I$, with consistent patterns aiding model selection. This intrinsic, data-efficient framework provides a scalable tool for diagnosing memorization and guiding model selection without relying on curated OOD test sets.

Abstract

Neural networks are known to exploit spurious artifacts (or shortcuts) that co-occur with a target label, exhibiting heuristic memorization. On the other hand, networks have been shown to memorize training examples, resulting in example-level memorization. These kinds of memorization impede generalization of networks beyond their training distributions. Detecting such memorization could be challenging, often requiring researchers to curate tailored test sets. In this work, we hypothesize -- and subsequently show -- that the diversity in the activation patterns of different neurons is reflective of model generalization and memorization. We quantify the diversity in the neural activations through information-theoretic measures and find support for our hypothesis on experiments spanning several natural language and vision tasks. Importantly, we discover that information organization points to the two forms of memorization, even for neural activations computed on unlabelled in-distribution examples. Lastly, we demonstrate the utility of our findings for the problem of model selection. The associated code and other resources for this work are available at https://rachitbansal.github.io/information-measures.
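As a rough illustration of the two measures named above, the sketch below estimates per-neuron entropy (a proxy for intra-neuron diversity) and pairwise mutual information (a proxy for inter-neuron diversity) from a matrix of activations. It assumes equal-width histogram binning, which may differ from the estimator used in the paper; the function names `neuron_entropy` and `pairwise_mi` and the parameter `n_bins` are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def neuron_entropy(acts, n_bins=10):
    """Entropy (in bits) of each neuron's binned activations.

    acts: array of shape (n_examples, n_neurons).
    Assumption: equal-width bins per neuron; not necessarily the paper's choice.
    """
    n_examples, n_neurons = acts.shape
    entropies = np.zeros(n_neurons)
    for j in range(n_neurons):
        counts, _ = np.histogram(acts[:, j], bins=n_bins)
        p = counts[counts > 0] / n_examples  # drop empty bins to avoid log(0)
        entropies[j] = -np.sum(p * np.log2(p))
    return entropies


def pairwise_mi(x, y, n_bins=10):
    """Mutual information (in bits) between two neurons' binned activations."""
    joint, _, _ = np.histogram2d(x, y, bins=n_bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of x, shape (n_bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, n_bins)
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log2(pxy[mask] / (px @ py)[mask])))


# Hypothetical usage on random activations (stand-in for a real layer's outputs).
rng = np.random.default_rng(0)
activations = rng.normal(size=(2000, 4))
per_neuron_entropy = neuron_entropy(activations)
mi_first_pair = pairwise_mi(activations[:, 0], activations[:, 1])
```

Under the hypothesis in the abstract, one would compare the distribution of these values across networks: low entropy with high mutual information would suggest heuristic memorization, while high entropy with low mutual information would suggest example-level memorization. Binned estimates are sensitive to `n_bins` and sample size, so comparisons are only meaningful at matched settings.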

Paper Structure (45 sections, 3 equations, 10 figures, 7 tables, 3 algorithms)


Figures (10)

  • Figure 1: (a) A toy setup of separating concentric circles; (b) An additional feature spuriously simplifies the task, inciting heuristic memorization; (c) Shuffled target labels induce example-level memorization; (d) Neuron activations for a two-layered feed-forward network trained for the base task in (a); (e) Activation patterns for the network reflect low intra-neuron and inter-neuron diversity when trained on (b); (f) High intra-neuron and inter-neuron diversity is seen when the network is trained on (c); (g) Entropy acts as a proxy to intra-neuron diversity; (h) Mutual Information acts as a proxy to inter-neuron diversity. Distinguishable patterns for the three networks are seen in (g) and (h).
  • Figure 2: The relation between entropy of neural activations and heuristic memorization. For both setups, networks trained on higher $\alpha$ show higher heuristic memorization (as depicted by the dipping model accuracy line), accompanied by lower entropy values.
  • Figure 3: Distribution of mutual information (MI) of pairs of neurons for networks with varying heuristic memorization. For both settings, networks trained on training sets with larger amounts of spurious correlations ($\uparrow \alpha$) exhibit higher mutual information across their neuron pairs.
  • Figure 4: Distributions of entropy and MI across final layer activations of RoBERTa-base differentiate networks fine-tuned on original and de-biasing sets for Bias-in-Bios. Color of boxes and Gaussian plots corresponds to extractability of gender information in model representations as estimated through MDL probing (Voita and Titov, 2020); lighter colors indicate lower extractability (less bias).
  • Figure 5: Entropy and MI for ResNet-18 on the NICO$^{++}$ dataset. The two training sets (balanced and unbalanced) result in models that vary in their generalization to contextual features beyond those they were trained on. This distinction is reflected in the information measurements.
  • ...and 5 more figures