Revisiting Generalization Power of a DNN in Terms of Symbolic Interactions

Lei Cheng, Junpeng Zhang, Qihan Ren, Quanshi Zhang

TL;DR

This work reframes DNN generalization around primitive inference patterns (AND-OR interactions), showing that a DNN's generalization power emerges from a mix of decay-shaped generalizable interactions and spindle-shaped non-generalizable interactions across interaction orders. It provides a universal matching guarantee for interaction representations, introduces two perspectives on learning dynamics, and models the two distributions with analytic forms, including a decay-based suppression of high-order interactions caused by data/parameter uncertainty. A disentangling procedure separates and quantifies the two interaction families in real networks, and experiments across CNNs and NLP models validate both the theory and the fidelity of the disentangling. The results offer an interaction-centric lens on why and how DNNs generalize, with practical implications for diagnosing overfitting and controlling non-generalizable representations.

Abstract

This paper analyzes the generalization power of deep neural networks (DNNs) from the perspective of interactions. Unlike previous analyses of a DNN's generalization power in a high-dimensional feature space, we find that the generalization power of a DNN can be explained as the generalization power of its interactions. We find that generalizable interactions follow a decay-shaped distribution, while non-generalizable interactions follow a spindle-shaped distribution. Furthermore, our theory can effectively disentangle these two types of interactions from a DNN. Experiments verify that our theory closely matches the real interactions in a DNN.

Paper Structure

This paper contains 28 sections, 2 theorems, 12 equations, 9 figures, 1 table.

Key Result

Theorem 2.1

(Universal matching property, proved by chen2024defining and in Appendix proof:universal-matching) Given an input sample $\boldsymbol{x}$, if we set all weights $\forall S\subseteq N$, $I_S^{\text{AND}}= \sum\nolimits_{T\subseteq S} (-1)^{\vert S \vert - \vert T \vert}\cdot u^{\text{AND}}_T$ and $I_S^{\text{OR}}$ [...] where $\boldsymbol{x}_T$ is the masked sample only containing the input variables in $T$. All other [...]
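
The AND half of the theorem statement can be illustrated with a small sketch. It assumes only that $u^{\text{AND}}_T$ is a scalar attached to each masked sample $\boldsymbol{x}_T$; the toy values and the helper names (`subsets`, `and_interactions`, `reconstruct`, `toy_u`) are hypothetical and not taken from the paper. The sketch computes $I_S^{\text{AND}}$ with the formula above and checks the standard Möbius-inversion identity that summing $I_S^{\text{AND}}$ over all $S\subseteq T$ recovers $u^{\text{AND}}_T$ on every one of the $2^n$ masked states.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset (2^n subsets in total)."""
    items = tuple(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def and_interactions(u_and, variables):
    """I_S = sum over T subseteq S of (-1)^(|S|-|T|) * u_and[T], for every S subseteq N."""
    return {
        S: sum((-1) ** (len(S) - len(T)) * u_and[T] for T in subsets(S))
        for S in subsets(variables)
    }


def reconstruct(interactions, T):
    """Sum the interactions of all S subseteq T (the AND part of a logical model)."""
    return sum(value for S, value in interactions.items() if S <= T)


def toy_u(T):
    """Hypothetical stand-in for the AND component of a network output on x_T."""
    return (
        2.0 * (0 in T)
        + 1.5 * (0 in T and 1 in T)
        - 0.5 * (frozenset({0, 1, 2}) <= T)
    )


N = (0, 1, 2)  # indices of the input variables
u_and = {T: toy_u(T) for T in subsets(N)}
I_and = and_interactions(u_and, N)

# The summed interactions match u_and on all 2^n masked states.
max_error = max(abs(reconstruct(I_and, T) - u_and[T]) for T in subsets(N))
print(max_error)
```

Enumerating all subsets costs $O(3^n)$ here, so this only works for a handful of input variables; it is meant to make the formula concrete, not to scale.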

Figures (9)

  • Figure 1: (Left) It is proven that there exists a logical model consisting of AND-OR interactions that can accurately predict all of the DNN's outputs when the input is augmented by enumerating all $2^n$ of its masked states. (Right) Interactions in a DNN can be decomposed into a set of generalizable interactions following a decay-shaped distribution and a set of non-generalizable interactions following a spindle-shaped distribution.
  • Figure 2: The two-stage dynamics of interactions in the learning of a DNN. In the first stage, noise interactions generated (at Timepoint A) by randomly initialized parameters were gradually removed (at Timepoint B), and the DNN mainly encoded interactions of a decay-shaped distribution. In the second stage, the newly emerged/learned interactions $\Delta\textbf{A}^{(m), +}$ and $\Delta\textbf{A}^{(m), -}$ follow a spindle-shaped distribution (see Timepoints C and D).
  • Figure 3: The distributions of newly emerged interactions ($\Delta \textbf{A}^{(m), +}, \Delta \textbf{A}^{(m), -}$). All newly emerged interactions followed a spindle-shaped distribution. The magnitude of the newly emerged interactions grew as more non-generalizable representations were injected. The theoretically estimated distributions of interactions ($\textbf{A}^{(m),+}_{\text{spindle}}, \textbf{A}^{(m),-}_{\text{spindle}}$) in Equation (\ref{eq:spindle}) closely matched the true distributions of newly emerged interactions.
  • Figure 4: The decay-shaped distribution of generalizable interactions ($\textbf{A}^{(m),+}_{\text{decay}},\textbf{A}^{(m),-}_{\text{decay}}$) and the spindle-shaped distribution of non-generalizable interactions ($\textbf{A}^{(m),+}_{\text{spindle}}, \textbf{A}^{(m),-}_{\text{spindle}}$) disentangled by our method. The significance of interactions in the spindle-shaped distribution increased when we injected more non-generalizable representations into the DNN, whereas the significance of interactions in the decay-shaped distribution was largely unaffected. This verifies the faithfulness of our method. Please see Appendix \ref{sec:more-results-fig4} for more results on the values of $\sigma$.
  • Figure 5: The distributions of generalizable interactions ($\textbf{A}^{(m),+}_{\text{decay}}, \textbf{A}^{(m),-}_{\text{decay}}$) and of non-generalizable interactions ($\textbf{A}^{(m),+}_{\text{spindle}}, \textbf{A}^{(m),-}_{\text{spindle}}$) extracted from real DNNs at different timepoints of the training process. In the normal learning phase, the DNN mainly penalized interactions in a spindle-shaped distribution and learned interactions in a decay-shaped distribution. In the overfitting phase, the DNN further learned interactions in a spindle-shaped distribution. Please see Appendix \ref{sec:more-results-fig5} for results on two other DNNs.
  • ...and 4 more figures
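
The captions above describe distributions of interactions over the order $m$, but this excerpt does not define $\textbf{A}^{(m),+}$ and $\textbf{A}^{(m),-}$. As a rough sketch, assume they denote the summed strength of all positive and all negative interactions of order $m=\vert S\vert$. Under that assumption (and with hypothetical toy values and a hypothetical helper name, `strength_by_order`), the per-order profile could be computed as follows.

```python
from collections import defaultdict


def strength_by_order(interactions):
    """Aggregate interaction values by order m = |S|.

    Returns (pos, neg), where pos[m] is the sum of positive interactions of
    order m and neg[m] is the sum of negative ones. This aggregation is an
    assumed reading of A^(m),+ and A^(m),-, not a definition from the excerpt.
    """
    pos = defaultdict(float)
    neg = defaultdict(float)
    for S, value in interactions.items():
        order = len(S)
        if value > 0:
            pos[order] += value
        elif value < 0:
            neg[order] += value
    return dict(pos), dict(neg)


# Hypothetical interaction values keyed by the set S of input variables.
toy_interactions = {
    frozenset({0}): 2.0,
    frozenset({1}): -1.0,
    frozenset({0, 1}): 1.5,
    frozenset({1, 2}): -0.25,
    frozenset({0, 1, 2}): -0.5,
}

pos, neg = strength_by_order(toy_interactions)
for m in sorted(set(pos) | set(neg)):
    print(m, pos.get(m, 0.0), neg.get(m, 0.0))
```

Plotting `pos[m]` and `neg[m]` against `m` gives the kind of per-order profile that the captions call decay-shaped or spindle-shaped; the toy values here are not meant to reproduce either shape.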

Theorems & Definitions (5)

  • Theorem 2.1
  • Definition 2.2
  • Claim 1: Generalization power of interactions of different orders.
  • Theorem 2.3
  • proof