Layerwise Change of Knowledge in Neural Networks
Xu Cheng, Lei Cheng, Zhaoran Peng, Yang Xu, Tian Han, Quanshi Zhang
TL;DR
The paper studies how deep networks acquire and forget knowledge across layers by formalizing knowledge as interaction primitives (AND and OR interactions) and extending this notion to intermediate-layer representations. It introduces a linear-probe framework that extracts a layer-specific signal $v^{(l)}(\boldsymbol{x})$ from each layer, so that this signal can be decomposed into interaction effects and tracked with a set of metrics for the emergence, forgetting, and sharing of interactions across layers. Key contributions include redefining interactions at the layer level, providing metrics for the emergence and loss of interactions (e.g., $\text{overlap}$, $\text{forget}$, $\text{new}$, along with $\text{completeness}$ and $\text{redundancy}$), and linking the layerwise change of interactions to the generalization capacity and stability of representations. Empirical results show that low-order interactions generalize better and remain more stable across models, while later layers tend to discard non-generalizable high-order interactions. This offers a principled lens for diagnosing and comparing learning dynamics across DNN architectures and tasks.
Abstract
This paper aims to explain how a deep neural network (DNN) gradually extracts new knowledge and forgets noisy features through layers in forward propagation. Although there is still no consensus on how to define the knowledge encoded by a DNN, previous studies have derived a body of mathematical evidence for taking interactions as the symbolic primitive inference patterns encoded by a DNN. We extend the definition of interactions and, for the first time, extract interactions encoded by intermediate layers. We quantify and track the newly emerged interactions and the forgotten interactions in each layer during forward propagation, which sheds new light on the learning behavior of DNNs. The layer-wise change of interactions also reveals the change in the generalization capacity and the instability of the feature representations of a DNN.
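To make the notion of an interaction concrete, here is a minimal sketch of the standard Harsanyi-style AND interaction, $I(S) = \sum_{T \subseteq S} (-1)^{|S|-|T|} v(\boldsymbol{x}_T)$, where $v(\boldsymbol{x}_T)$ is a scalar output computed with only the input variables in $T$ left unmasked. The function `toy_v` is a hypothetical stand-in for such an output, not the paper's layerwise signal, and the helper names are illustrative assumptions. The exact layerwise and OR definitions used in the paper are not reproduced here.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset, including the empty set."""
    items = tuple(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def and_interactions(v, n):
    """Compute I(S) = sum over T subset of S of (-1)^(|S|-|T|) * v(T) for all S.

    `v` maps a frozenset of unmasked variable indices to a scalar output.
    The cost grows exponentially in n, so this is only for tiny examples.
    """
    return {
        S: sum((-1) ** (len(S) - len(T)) * v(T) for T in subsets(S))
        for S in subsets(range(n))
    }


def toy_v(T):
    # Hypothetical model output, built from three AND patterns:
    # variable 0 alone, variables {0, 1} together, variables {1, 2} together.
    return 1.0 * (0 in T) + 2.0 * (0 in T and 1 in T) - 0.5 * (1 in T and 2 in T)


effects = and_interactions(toy_v, 3)

# Keep only the salient (non-zero) interactions.
salient = {tuple(sorted(S)): e for S, e in effects.items() if abs(e) > 1e-9}
print(salient)

# The interaction effects sum back to the output on the unmasked input.
print(sum(effects.values()), toy_v(frozenset(range(3))))
```

In this toy case the decomposition recovers exactly the three AND patterns that `toy_v` was built from, and the effects sum to the output on the full input. Tracking which sets $S$ carry non-zero effects at each layer is the kind of bookkeeping the emergence and forgetting analysis relies on.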
