A Model of Understanding in Deep Learning Systems

David Peter Wallis Freeborn

Abstract

I propose a model of systematic understanding suitable for machine learning systems. On this account, an agent understands a property of a target system when it contains an adequate internal model that tracks real regularities, is coupled to the target by stable bridge principles, and supports reliable prediction. I argue that contemporary deep learning systems often can and do achieve such understanding. However, they generally fall short of the ideal of scientific understanding: the understanding is symbolically misaligned with the target system, not explicitly reductive, and only weakly unifying. I label this the Fractured Understanding Hypothesis.

Figures (6)

  • Figure 1: Comparison between a ground-truth surface (left) and the neural network's learned isosurface (right). Observe how the learned isosurface is stitched together from piecewise-linear patches: the ReLU activation functions make the network's output a continuous piecewise-linear function (a multivariate spline), approximating the smooth target curvature through a vast number of local planar regions (a short numerical sketch of this piecewise-linearity follows the list).
  • Figure 2: Schematics showing the bias–variance tradeoff and double-descent behavior.
  • Figure 3: Comparison between the ground-truth torus surface (left) and the neural network's learned isosurface (right). Observe how the learned isosurface is stitched together from piecewise-linear patches; nonetheless, it forms a coherent genus-one surface.
  • Figure 4: Training and test accuracy as a function of optimization step in the modular-addition experiment. Training accuracy rises rapidly to near-perfect performance on the seen input pairs, while test accuracy remains low for an extended period. After continued training, the model groks the target system: test accuracy suddenly increases sharply, approaching the training accuracy. The sharp spikes around steps 22,000 and 42,000 are consistent with the "slingshot" grokking mechanism identified by Thilak et al. (2022).
  • Figure 5: Average magnitude of the 2D discrete Fourier transform of the system's logits over the $(a,b)$ input grid (averaged over output classes). The learned input–output map exhibits strong concentration in Fourier space, consistent with a compact, structured dependence on the inputs (a minimal sketch of this computation follows the list).
  • ...and 1 more figure
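
The two sketches below make the figure-list claims concrete. First, a minimal numerical check of the piecewise-linearity described in Figures 1 and 3: a ReLU network restricted to any line through input space is piecewise linear, so its second finite difference vanishes away from the (finitely many) activation boundaries. The network here is random and untrained, and every size and name is an illustrative assumption, not the paper's actual code.

```python
import numpy as np

# A tiny two-layer ReLU network R^2 -> R with random (untrained) weights.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 2)), rng.normal(size=16)
W2, b2 = rng.normal(size=(1, 16)), rng.normal(size=1)

def mlp(x):
    h = np.maximum(0.0, x @ W1.T + b1)  # ReLU hidden layer
    return h @ W2.T + b2

# Evaluate along a straight line through input space. On each linear piece
# the second finite difference is ~0; it spikes only at activation boundaries.
t = np.linspace(-2.0, 2.0, 2001)[:, None]
y = mlp(np.hstack([t, 0.5 * t])).ravel()
d2 = np.abs(np.diff(y, n=2))
print("samples with nonzero curvature:", int((d2 > 1e-8).sum()), "of", d2.size)
```

Second, a sketch of the Figure 5 computation: take a network's logits over the full $(a,b)$ input grid, apply a 2D discrete Fourier transform along the two input axes, and average the magnitudes over output classes. Since the trained model is not reproduced here, the logits are synthesized with the few-frequency structure that grokked modular-addition networks are reported to learn; the modulus, the chosen frequencies, and the logit construction are all assumptions.

```python
import numpy as np

P = 97  # assumed modulus of the modular-addition task
a = np.arange(P)[:, None, None]
b = np.arange(P)[None, :, None]
c = np.arange(P)[None, None, :]

# Stand-in logits[a, b, c] built from a few dominant frequencies plus noise,
# mimicking the structured map a grokked network is reported to implement.
freqs = [3, 14, 41]  # arbitrary illustrative choice
logits = sum(np.cos(2 * np.pi * k * (a + b - c) / P) for k in freqs)
logits = logits + 0.05 * np.random.default_rng(0).normal(size=logits.shape)

# 2D DFT over the (a, b) grid, one transform per output class, then average
# the magnitudes across classes, as the Figure 5 caption describes.
avg_magnitude = np.abs(np.fft.fft2(logits, axes=(0, 1))).mean(axis=-1)

# Concentration check: the cells at the chosen frequencies dominate.
peak = np.unravel_index(np.argmax(avg_magnitude), avg_magnitude.shape)
print("strongest cell:", peak,
      "| ratio to median cell:", avg_magnitude.max() / np.median(avg_magnitude))
```

On such a map, spectral mass piles up in a handful of Fourier cells at the chosen frequencies, which is the kind of concentration the figure reports.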