Laplace Sample Information: Data Informativeness Through a Bayesian Lens

Johannes Kaiser, Kristian Schwethelm, Daniel Rueckert, Georgios Kaissis

TL;DR

The paper addresses the challenge of quantifying per-sample informativeness in deep learning. It introduces Laplace Sample Information (LSI), which uses a Laplace-approximated posterior to compute a KL divergence between the full-data and leave-one-out parameter distributions, yielding a per-sample informativeness score. Empirically, LSI orders samples by typicality, detects mislabeled data, reveals class- and dataset-level informativeness patterns, and transfers well from probe models to larger architectures, all while remaining computationally efficient. This approach offers a principled, scalable tool for data-centric ML tasks, enabling more efficient training, dataset auditing, and deeper insights into learning dynamics across modalities and settings.
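Because a Laplace approximation makes both the full-data and the leave-one-out posteriors Gaussian, the KL divergence between them has a closed form. The block below writes it out under assumed notation: $\hat\theta$ and $H$ are the MAP weights and the Hessian of the negative log posterior on the full dataset, $\hat\theta_{-i}$ and $H_{-i}$ are the same quantities with sample $i$ removed, and $d$ is the number of parameters. The KL direction shown is an assumption; the paper's Definition 2 fixes the direction and any further approximations (for example, a diagonal or otherwise structured Hessian).

```latex
q(\theta) = \mathcal{N}\big(\hat\theta,\, H^{-1}\big), \qquad
q_{-i}(\theta) = \mathcal{N}\big(\hat\theta_{-i},\, H_{-i}^{-1}\big)

\mathrm{KL}\big(q \,\|\, q_{-i}\big)
  = \tfrac{1}{2}\Big[\operatorname{tr}\!\big(H_{-i} H^{-1}\big)
  + (\hat\theta_{-i} - \hat\theta)^\top H_{-i}\,(\hat\theta_{-i} - \hat\theta)
  - d + \ln\frac{\det H}{\det H_{-i}}\Big]
```

The Mahalanobis term captures how far removing sample $i$ moves the posterior mean, while the trace and log-determinant terms capture how much it changes the posterior's curvature.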

Abstract

Accurately estimating the informativeness of individual samples in a dataset is an important objective in deep learning, as it can guide sample selection, which in turn can improve model efficiency and accuracy by removing redundant or potentially harmful samples. We propose Laplace Sample Information (LSI), a measure of sample informativeness that is grounded in information theory and widely applicable across model architectures and learning settings. LSI leverages a Bayesian approximation to the weight posterior and the KL divergence to measure the change in the parameter distribution induced by a sample of interest from the dataset. We experimentally show that LSI is effective in ordering the data with respect to typicality, detecting mislabeled samples, measuring class-wise informativeness, and assessing dataset difficulty. We demonstrate these capabilities of LSI on image and text data in supervised and unsupervised settings. Moreover, we show that LSI can be computed efficiently through probes and transfers well to the training of large models.
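To make the leave-one-out construction concrete, here is a minimal brute-force sketch in Python with NumPy. It assumes a binary logistic-regression model with an isotropic Gaussian prior (precision `prior_prec`), a full-Hessian Laplace approximation around the MAP estimate found by Newton's method, and exact retraining for every left-out sample. All function names are hypothetical, and the KL direction (full-data posterior first) is an assumption. The paper reports computing LSI efficiently through small probe models; this loop refits the model n times and only illustrates the definition.

```python
import numpy as np


def fit_map(X, y, prior_prec=1.0, n_iter=25):
    """Hypothetical helper: MAP logistic-regression weights under a N(0, I/prior_prec)
    prior via Newton's method. Returns (weights, Hessian of the negative log posterior)."""
    d = X.shape[1]
    w = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (p - y) + prior_prec * w
        H = X.T @ (X * (p * (1 - p))[:, None]) + prior_prec * np.eye(d)
        w = w - np.linalg.solve(H, grad)
    # Hessian at the final weights = precision of the Laplace (Gaussian) posterior.
    p = 1.0 / (1.0 + np.exp(-X @ w))
    H = X.T @ (X * (p * (1 - p))[:, None]) + prior_prec * np.eye(d)
    return w, H


def gaussian_kl(mu0, H0, mu1, H1):
    """KL( N(mu0, H0^-1) || N(mu1, H1^-1) ), with H0 and H1 given as precision matrices."""
    d = mu0.shape[0]
    diff = mu1 - mu0
    trace_term = np.trace(H1 @ np.linalg.inv(H0))
    maha_term = diff @ H1 @ diff
    _, logdet0 = np.linalg.slogdet(H0)
    _, logdet1 = np.linalg.slogdet(H1)
    return 0.5 * (trace_term + maha_term - d + logdet0 - logdet1)


def laplace_sample_information(X, y, prior_prec=1.0):
    """Hypothetical per-sample score: KL between full-data and leave-one-out Laplace posteriors."""
    w_full, H_full = fit_map(X, y, prior_prec)
    n = len(y)
    scores = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # drop sample i and refit from scratch
        w_loo, H_loo = fit_map(X[keep], y[keep], prior_prec)
        scores[i] = gaussian_kl(w_full, H_full, w_loo, H_loo)
    return scores


if __name__ == "__main__":
    # Toy 1-D data with a bias column; the last point (x = -2.5) has a deliberately flipped label.
    xs = [-2, -1.5, -1, -0.5, 0.5, 1, 1.5, 2, -2.5]
    X = np.array([[x, 1.0] for x in xs])
    y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)
    print(laplace_sample_information(X, y).round(3))
```

The flipped-label point mirrors the mislabeled-sample setting: it has a large loss gradient, so removing it shifts the posterior mean the most and it receives the largest score. The intercept is regularized like any other weight here, which is a simplification; scores are only comparable within one model and dataset.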

Paper Structure

This paper contains 26 sections, 10 equations, 41 figures, 3 tables.

Figures (41)

  • Figure 1: $\mathsf{LSI}$ distribution across individual samples of the investigated datasets.
  • Figure 2: Selected images with low/high $\mathsf{LSI}$ in ImageNet and the medical dataset. Samples with low $\mathsf{LSI}$ are representative of their underlying class, whereas high $\mathsf{LSI}$ samples are often mislabeled/out-of-distribution (red dots) or atypical with respect to exposure, viewing angle, etc.
  • Figure 3: $\mathsf{LSI}$ in supervised text classification on the IMDb dataset using BERT.
  • Figure 4: Selected images with low/high $\mathsf{LSI}$ of contrastive learning on COCO.
  • Figure 5: $\mathsf{LSI}$ distribution on data with corrupted labels (mislabeled) vs. uncorrupted labels.
  • ...and 36 more figures

Theorems & Definitions (6)

  • Definition 1: Sample Information
  • Definition 2: Laplace Sample Information
  • Definition 3: $\left(\alpha, \epsilon\right)$-RDP
  • Definition 4: Rényi divergence
  • Definition 5: Global Sensitivity
  • Definition 6: Local Sensitivity
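For reference, the privacy-related definitions listed above are usually stated as follows in the differential-privacy literature (Rényi divergence and RDP in Mironov's formulation). These are the standard textbook forms, not copied from the paper, whose notation, choice of norm, and adjacency relation (assumed here: datasets $D \simeq D'$ differing in one sample) may differ.

```latex
D_\alpha(P \,\|\, Q) = \frac{1}{\alpha - 1} \ln \mathbb{E}_{x \sim Q}\!\left[\left(\frac{P(x)}{Q(x)}\right)^{\alpha}\right], \qquad \alpha > 1

\mathcal{M} \text{ is } (\alpha, \epsilon)\text{-RDP} \iff
D_\alpha\big(\mathcal{M}(D) \,\|\, \mathcal{M}(D')\big) \le \epsilon \quad \forall\, D \simeq D'

\Delta f = \max_{D \simeq D'} \lVert f(D) - f(D') \rVert, \qquad
LS_f(D) = \max_{D' :\, D' \simeq D} \lVert f(D) - f(D') \rVert
```

Global sensitivity takes the worst case over all pairs of adjacent datasets, whereas local sensitivity fixes the dataset $D$ and only maximizes over its neighbors, so $LS_f(D) \le \Delta f$ for every $D$.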