Layerwise Change of Knowledge in Neural Networks

Xu Cheng; Lei Cheng; Zhaoran Peng; Yang Xu; Tian Han; Quanshi Zhang

TL;DR

The paper addresses the challenge of understanding how deep networks acquire and forget knowledge across layers by formalizing knowledge as interaction primitives (AND and OR interactions) and extending this notion to intermediate-layer representations. It introduces a linear-probe framework to extract layer-specific signals $v^{(l)}(\boldsymbol{x})$, enabling a decomposition of outputs into interaction effects and a set of metrics to track emergence, forgetting, and sharing of interactions across layers. Key contributions include redefining layerwise interactions, providing metrics for emergence and loss of interactions (e.g., $\text{overlap}$, $\text{forget}$, $\text{new}$, along with $\text{completeness}$ and $\text{redundancy}$), and linking the layerwise change of interactions to the generalization capacity and stability of representations. Empirical results show that low-order interactions generalize better and remain more stable across models, while later layers tend to discard non-generalizable high-order interactions; this offers a principled lens to diagnose and compare learning dynamics across DNN architectures and tasks.
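The linear-probe idea mentioned above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a binary task, uses synthetic 2-D vectors as a stand-in for frozen intermediate-layer features, and takes the probe's log-odds $\log\frac{p}{1-p}$ as the layer-specific score $v^{(l)}(\boldsymbol{x})$. The names `train_linear_probe` and `probe_score` are hypothetical.

```python
import math
import random


def train_linear_probe(features, labels, lr=0.5, epochs=300):
    """Fit a binary logistic-regression probe on frozen layer features."""
    dim = len(features[0])
    n = len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for f, y in zip(features, labels):
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            for i in range(dim):
                grad_w[i] += (p - y) * f[i]
            grad_b += p - y
        # Full-batch gradient step on the mean cross-entropy loss.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_score(w, b, feature):
    """Log-odds of the probe; used here as the assumed v^(l)(x)."""
    return sum(wi * fi for wi, fi in zip(w, feature)) + b


# Synthetic stand-in for intermediate features: two Gaussian clusters.
random.seed(0)
features = [
    [random.gauss(m, 0.5), random.gauss(-m, 0.5)]
    for m in (1.0, -1.0)
    for _ in range(50)
]
labels = [1] * 50 + [0] * 50

w, b = train_linear_probe(features, labels)
accuracy = sum(
    (probe_score(w, b, f) > 0) == (y == 1) for f, y in zip(features, labels)
) / len(labels)
```

In a real setting the feature vectors would come from a fixed layer of the trained DNN, and only the probe's weights would be updated, so the score reflects what that layer already encodes.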

Abstract

This paper aims to explain how a deep neural network (DNN) gradually extracts new knowledge and forgets noisy features through layers in forward propagation. Although there is still no consensus on how to define the knowledge encoded by a DNN, previous studies have derived a series of mathematical evidence for taking interactions as the symbolic primitive inference patterns encoded by a DNN. We extend the definition of interactions and, for the first time, extract interactions encoded by intermediate layers. We quantify and track the newly emerged interactions and the forgotten interactions in each layer during forward propagation, which sheds new light on the learning behavior of DNNs. The layer-wise change of interactions also reveals the change in the generalization capacity and the instability of the feature representations of a DNN.

Paper Structure (28 sections, 2 theorems, 21 equations, 13 figures)

Introduction
Literature in Explaining Knowledge in DNNs
Tracking Interactions through Layers
Preliminaries: using interactions to represent knowledge in DNNs
Tracking interactions through layers
Verifying the sparsity of interactions
Extracting interactions from intermediate layers
Analyzing the representation capacity of a DNN
Conclusion, Discussion and Future Challenges
Detailed Analysis for Previous Studies Using Knowledge to Explain DNNs
Comparison between Interaction-based Explanation and Attribution Interpretability Methods
Proving the OR Interaction Can Be Considered A Specific AND Interaction
Discussion on Techniques and Limitations of Classifier Probe
Discussion on the Bias Introduced by Masking Input Variables
Proof of Theorem 3.3
...and 13 more sections

Key Result

Theorem 3.3

(Proven in Appendix \ref{app_sec:proof_them1}) Given an input sample $\boldsymbol{x}\in\mathbb{R}^{n}$, the network output score $v(\boldsymbol{x}_T)$ on each of the masked input samples $\{\boldsymbol{x}_T\vert T\subseteq N\}$ can be decomposed into effects of AND interactions and OR interactions, subject to $I\dots$
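To make the decomposition idea concrete, here is a minimal sketch of the AND part only. It assumes the standard Harsanyi (Möbius) form $I_{\text{and}}(S\vert\boldsymbol{x})=\sum_{T\subseteq S}(-1)^{\vert S\vert-\vert T\vert}v(\boldsymbol{x}_T)$, which this excerpt does not spell out, and it omits OR interactions entirely. The function `toy_v` is a hypothetical stand-in for a network evaluated on masked inputs, with $T$ the set of unmasked variable indices.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = tuple(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def and_interactions(v, n):
    """Mobius transform: I(S) = sum over T in S of (-1)^(|S|-|T|) * v(T)."""
    return {
        S: sum((-1) ** (len(S) - len(T)) * v(T) for T in subsets(S))
        for S in subsets(range(n))
    }


def reconstruct(interactions, T):
    """Sum the effects of all AND interactions S contained in T."""
    return sum(effect for S, effect in interactions.items() if S <= T)


def toy_v(T):
    """Hypothetical model output on the input with only variables in T kept."""
    score = 0.5                # output when every variable is masked
    if 0 in T:
        score += 1.0           # effect of variable 0 alone
    if {1, 2} <= T:
        score += 2.0           # effect that needs variables 1 AND 2 together
    return score


interactions = and_interactions(toy_v, 3)
```

For this toy model the only non-zero effects are on $\emptyset$, $\{0\}$, and $\{1,2\}$, and `reconstruct(interactions, T)` returns `toy_v(T)` for every one of the $2^3$ masks, which is the matching property that the AND part of the theorem relies on. The enumeration costs $O(3^n)$ evaluations of the inner sum, so it is only practical for a small number of input variables.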

Figures (13)

Figure 1: Tracking interactions through layers in the DNN. In most DNNs, early and middle layers usually fit target interactions modeled by the entire network at the cost of encoding lots of redundant interactions, and later layers remove such redundant interactions.
Figure 2: Sparsity of interactions. We visualized the strength $\vert I(S \vert \boldsymbol{x})\vert$ of all AND-OR interactions extracted from different samples $\boldsymbol{x}$, w.r.t. different $S$ and $\boldsymbol{x}$, in descending order. Only about $21.8$ AND/OR interactions in each sample of the MNIST dataset and about $45.6$ AND/OR interactions in each sample of the CIFAR-10 dataset made salient effects on the network output.
Figure 3: (a) Tracking the change of the average strength of the overlapped ($\text{overlap}_{\text{and}}^{(l), m}$), forgotten ($\text{forget}_{\text{and}}^{(l), m}$), and newly emerged interactions ($\text{new}_{\text{and}}^{(l), m}$) through different layers. For each subfigure, the total length of the orange bar and the grey bar equals $\textit{all}_\text{and}^{(l), m}$, and the total length of the blue bar and the grey bar equals $\textit{all}_\text{and}^{(L), m}$. (b) Tracking the change of $\textit{completeness}^{(l), m}_{\text{and}}$ and $\textit{redundancy}^{(l), m}_{\text{and}}$ through different layers. We do not show interactions of the highest four orders, because almost no interactions of extremely high orders were learned. Please see Appendix \ref{app_sec:more_result} for results of OR interactions and results on tabular datasets.
Figure 4: Average IoU values of AND interactions extracted from two DNNs trained for the same task over different input samples. Low-order interactions usually exhibited higher IoU values and thus generalized better across DNNs. Please see Appendix \ref{app_sec:more_result} for results of OR interactions and Appendix \ref{app_sec:exp_detail_layer} for the selected intermediate layer.
Figure 5: The relative stability ($\textit{stability}^{(l),m}_{\text{and}}$) of AND interactions decreased as the order $m$ increased. Low-order interactions were more stable to inevitable noise in the data. See Appendix \ref{app_sec:more_result} for results of OR interactions and Appendix \ref{app_sec:exp_detail_layer} for the selected intermediate layer.
...and 8 more figures
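
The sparsity check described in the Figure 2 caption (sorting interaction strengths in descending order and counting the salient ones) can be sketched as follows. The salience rule used here, a strength above 5% of the largest strength, is an assumption for illustration and not necessarily the paper's criterion; the interaction values and the name `salient_strengths` are likewise made up.

```python
def salient_strengths(interactions, ratio=0.05):
    """Return strengths |I(S)| above ratio * max strength, in descending order."""
    strengths = sorted((abs(e) for e in interactions.values()), reverse=True)
    if not strengths or strengths[0] == 0:
        return []
    threshold = ratio * strengths[0]   # assumed salience threshold
    return [s for s in strengths if s > threshold]


# Hypothetical interaction effects for one sample, keyed by variable subset.
example = {
    (): 0.5,
    (0,): 1.0,
    (1,): -0.01,
    (2,): 0.02,
    (0, 1): 0.0,
    (1, 2): -2.0,
    (0, 1, 2): 0.03,
}

salient = salient_strengths(example)
num_salient = len(salient)
```

Here three of the seven interactions pass the threshold, so most of the output is carried by a small subset of interactions, which is the kind of pattern the caption reports for MNIST and CIFAR-10.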

Theorems & Definitions (6)

Definition 3.1: AND interactions
Definition 3.2: OR interactions
Theorem 3.3
Lemma 3.4
proof
proof
