The Interaction Bottleneck of Deep Neural Networks: Discovery, Proof, and Modulation

Huiqi Deng, Qihan Ren, Zhuofan Chen, Zhenyuan Cui, Wen Shen, Peng Zhang, Hongbin Pei, Quanshi Zhang

TL;DR

This work treats interactions as fundamental units of deep network representations and uses multi-order interactions based on the Shapley interaction index to connect microscopic cooperation patterns with macroscopic capacity. It uncovers a universal interaction bottleneck in which mid-order interactions are consistently underrepresented, and provides a theoretical mechanism based on contextual variability that explains the learning dynamics. The authors propose two modulation losses to steer networks toward learning particular interaction orders, and demonstrate clear micro–macro links: low-order emphasis improves generalization and robustness, high-order emphasis enhances structural modeling and fitting, while mid-order emphasis yields intermediate trade-offs. Together, these findings offer a principled lens for interpreting and guiding deep representations across architectures and tasks.

Abstract

Understanding what kinds of cooperative structures deep neural networks (DNNs) can represent remains a fundamental yet insufficiently understood problem. In this work, we treat interactions as the fundamental units of such structure and investigate a largely unexplored question: how DNNs encode interactions under different levels of contextual complexity, and how these microscopic interaction patterns shape macroscopic representation capacity. To quantify this complexity, we use multi-order interactions [57], where each order reflects the amount of contextual information required to evaluate the joint interaction utility of a variable pair. This formulation enables a stratified analysis of cooperative patterns learned by DNNs. Building on this formulation, we develop a comprehensive study of interaction structure in DNNs. (i) We empirically discover a universal interaction bottleneck: across architectures and tasks, DNNs easily learn low-order and high-order interactions but consistently under-represent mid-order ones. (ii) We theoretically explain this bottleneck by proving that mid-order interactions incur the highest contextual variability, yielding large gradient variance and making them intrinsically difficult to learn. (iii) We further modulate the bottleneck by introducing losses that steer models toward emphasizing interactions of selected orders. Finally, we connect microscopic interaction structures with macroscopic representational behavior: low-order-emphasized models exhibit stronger generalization and robustness, whereas high-order-emphasized models demonstrate greater structural modeling and fitting capability. Together, these results uncover an inherent representational bias in modern DNNs and establish interaction order as a powerful lens for interpreting and guiding deep representations.
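
As a rough illustration of the stratification described above, the sketch below computes an order-$m$ interaction by brute force on a toy game. It assumes the standard form of the pairwise interaction utility, $\Delta v(i,j,S) = v(S \cup \{i,j\}) - v(S \cup \{i\}) - v(S \cup \{j\}) + v(S)$, and averages it over all contexts $S$ of size $m$ drawn from the remaining $n-2$ variables. The function names and the toy value function `toy_v` are hypothetical and are not taken from the paper. For a real DNN, $v$ would be the network output on a masked input, and the average would typically have to be estimated by sampling contexts rather than enumerating them.

```python
from itertools import combinations
from statistics import mean


def delta_v(v, i, j, S):
    """Interaction utility of the pair (i, j) under context S (assumed standard form)."""
    S = frozenset(S)
    return v(S | {i, j}) - v(S | {i}) - v(S | {j}) + v(S)


def multi_order_interaction(v, n, i, j, m):
    """Average delta_v over every context of exactly m variables, excluding i and j."""
    others = [k for k in range(n) if k not in (i, j)]
    return mean(delta_v(v, i, j, S) for S in combinations(others, m))


def toy_v(S):
    """Hypothetical game: variables 0 and 1 cooperate unless variable 2 is present."""
    return 1.0 if {0, 1} <= S and 2 not in S else 0.0


n = 5
# One value per order m = 0, ..., n-2 for the pair (0, 1).
orders = [multi_order_interaction(toy_v, n, 0, 1, m) for m in range(n - 1)]
print(orders)
```

In this toy game the interaction between variables 0 and 1 weakens as the context grows, because larger contexts are more likely to contain variable 2. A purely additive game such as `lambda S: float(len(S))` gives zero interaction at every order.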

Paper Structure

This paper contains 25 sections, 3 theorems, 20 equations, 12 figures, 4 tables.

Key Result

Theorem 1

(Proof in supplementary materials.) Assume that for any order $m$, $\mathbb{E}_{i,j} \mathbb{E}_{|S|=m}\left[\frac{\partial \Delta v(i,j,S)}{\partial W}\right] = \bm{0}$. (The zero-mean assumption is discussed and empirically validated in the supplementary materials.) Let $\sigma^2$ denote the variance of each co [...] where $C = \sqrt{K \eta \frac{\partial L}{\partial v(N)}}$ is a term independent of the order $m$, [...]

Figures (12)

  • Figure 1: (a) Shapley interaction index and multi-order interactions. The bivariate interaction $I(i,j)$ quantifies the influence of the presence/absence of variable $j$ on the importance of variable $i$ across varying contexts. By decomposing $I(i,j)$ by the contextual complexity, the $m$-th order interaction $I^{(m)}(i,j)$ represents the average interaction utility within contexts containing exactly $m$ variables ($|S| = m$). (b) Illustration of low-order and high-order interactions: low-order interactions arise in simple, local contexts, whereas high-order interactions emerge when many contextual regions jointly influence the relationship between $i$ and $j$.
  • Figure 2: Overview. (a) Discovery: Across datasets and architectures, DNNs consistently exhibit a universal interaction bottleneck—strong low-order and high-order interactions but weak mid-order interactions. (b) Proof: Our theoretical analysis characterizes how the learning strength $F^{(m)}$ of $m$-order interactions varies with contextual variability $\tbinom{n-2}{m}$, producing a curve that closely matches the empirical interaction distribution $J^{(m)}$. (c) Modulation: The proposed encouraging/suppressing losses offer explicit modulation of interaction orders, enabling DNNs to preferentially learn low-order, mid-order, or high-order interactions. (d) Micro–Macro Link: Micro-level interaction order shapes macro-level representation capability: high-order DNNs excel in structural modeling, low-order DNNs show superior generalization and robustness, and mid-order DNNs lie in between.
  • Figure 3: Illustration of low-order, mid-order, and high-order interactions. The red bounding box marks the variable pair $(i,j)$, and the unmasked patches are treated as contextual variables. Low-order interactions $I^{(m)}(i,j)$ use only a small context (e.g., $|S|=2$), mid-order interactions arise from intermediate context sizes (e.g., $|S|=10$), and high-order interactions rely on near-global context (e.g., $|S|=20$).
  • Figure 4: Distributions of interaction strength $J^{(m)}$ across a wide range of DNNs, datasets, and architectures. All curves consistently exhibit a characteristic interaction bottleneck: DNNs emphasize low-order and high-order interactions while systematically downweighting mid-order ones.
  • Figure 5: The distribution of interaction strength $J^{(m)}$ across different training epochs, showing that the interaction bottleneck persists throughout training.
  • ...and 7 more figures
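
The caption of Figure 2 ties contextual variability to the binomial term $\tbinom{n-2}{m}$, the number of distinct contexts of size $m$ that can surround a pair $(i,j)$ among $n$ input variables. The short sketch below only tabulates that count to show that it peaks at mid orders. It does not reproduce the paper's learning-strength curve $F^{(m)}$, whose exact form is not given here, and the helper name `context_counts` and the choice $n = 12$ are illustrative assumptions.

```python
from math import comb


def context_counts(n):
    """Number of size-m contexts for a fixed pair, for every order m = 0, ..., n-2."""
    return [comb(n - 2, m) for m in range(n - 1)]


counts = context_counts(12)
# Order with the largest number of distinct contexts.
peak_order = max(range(len(counts)), key=counts.__getitem__)

for m, c in enumerate(counts):
    print(f"m={m:2d}  contexts={c}")
print("peak order:", peak_order)
```

With $n = 12$ there is a single context at $m = 0$ and at $m = 10$, but 252 at $m = 5$, so mid-order interactions are averaged over by far the most varied set of contexts.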

Theorems & Definitions (3)

  • Theorem 1
  • Corollary 1
  • Theorem 2