The Loss Kernel: A Geometric Probe for Deep Learning Interpretability

Maxwell Adam; Zach Furman; Jesse Hoogland

TL;DR

This work introduces the loss kernel, a covariance-based measure of functional similarity between inputs, derived from a localized, low-loss probe distribution around a trained neural network. Grounded in singular learning theory, the kernel captures how the losses of pairs of inputs respond jointly to weight perturbations that keep the network near its minimum, which makes it possible to organize and visualize a dataset as the model perceives it. The authors validate the method on a synthetic multitask problem, where the kernel cleanly separates independent subtasks, and apply it to Inception-v1 on ImageNet, where it reveals hierarchical structure that aligns with the WordNet taxonomy. The loss kernel is thus a practical, scalable tool for interpretability and data attribution, with potential to guide mechanistic investigations and developmental analyses of model learning.

Abstract

We introduce the loss kernel, an interpretability method for measuring similarity between data points according to a trained neural network. The kernel is the covariance matrix of per-sample losses computed under a distribution of low-loss-preserving parameter perturbations. We first validate our method on a synthetic multitask problem, showing it separates inputs by task as predicted by theory. We then apply this kernel to Inception-v1 to visualize the structure of ImageNet, and we show that the kernel's structure aligns with the WordNet semantic hierarchy. This establishes the loss kernel as a practical tool for interpretability and data attribution.
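
As a rough illustration of the definition above, the following sketch computes a covariance matrix of per-sample losses across perturbed parameter draws. It is a minimal example under stated assumptions, not the paper's implementation: the function names, the two-task toy model, and the use of isotropic Gaussian noise around the optimum as a stand-in for the low-loss probe distribution are all hypothetical choices made here for illustration.

```python
import numpy as np


def loss_kernel(per_sample_losses):
    """Covariance of per-sample losses across parameter draws.

    per_sample_losses has shape (num_draws, num_inputs); entry [t, i] is the
    loss of input i under the t-th perturbed parameter vector.
    Returns a symmetric (num_inputs, num_inputs) matrix.
    """
    losses = np.asarray(per_sample_losses, dtype=float)
    centered = losses - losses.mean(axis=0, keepdims=True)
    return centered.T @ centered / (losses.shape[0] - 1)


def kernel_to_correlation(kernel, eps=1e-12):
    """Normalize a kernel so that its diagonal is 1 (correlation form)."""
    std = np.sqrt(np.clip(np.diag(kernel), eps, None))
    return kernel / np.outer(std, std)


def toy_multitask_losses(num_draws=2000, scale=0.05, seed=0):
    """Hypothetical two-task toy model (not the paper's benchmark).

    Inputs 0-2 depend only on weight w[0]; inputs 3-5 depend only on w[1].
    """
    rng = np.random.default_rng(seed)
    w_star = np.array([1.0, -2.0])            # assumed trained weights
    x = np.array([0.5, 1.0, 1.5, 0.7, 1.2, 2.0])
    task = np.array([0, 0, 0, 1, 1, 1])       # which weight each input uses
    y = w_star[task] * x                      # targets fit exactly at w_star
    # Assumption: isotropic Gaussian noise stands in for the probe distribution.
    w = w_star + scale * rng.standard_normal((num_draws, 2))
    preds = w[:, task] * x                    # shape (num_draws, 6)
    return (preds - y) ** 2                   # squared-error loss per input
```

Calling `kernel_to_correlation(loss_kernel(toy_multitask_losses()))` should give a roughly block-structured matrix in this toy setting: inputs that share a weight have strongly correlated losses, while inputs from different tasks are close to uncorrelated, which mirrors the task separation described for the synthetic multitask experiment.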
