Revisiting Generalization Power of a DNN in Terms of Symbolic Interactions
Lei Cheng, Junpeng Zhang, Qihan Ren, Quanshi Zhang
TL;DR
This work reframes DNN generalization around primitive inference patterns (AND-OR interactions), showing that a DNN's generalization power emerges from a mix of decay-shaped generalizable interactions and spindle-shaped non-generalizable interactions across interaction orders. It provides universal matching guarantees for interaction representations, introduces two learning-dynamics perspectives, and models the two distributions with analytic forms, including a decay-based suppression of high-order interactions via data/parameter uncertainty. A disentangling procedure is proposed to separate and quantify the two interaction families in real networks, and experiments across CNNs and NLP models validate the theory and the fidelity of the disentangling. The results offer a principled, interaction-centric lens on why and how DNNs generalize, with practical implications for diagnosing overfitting and controlling non-generalizable representations.
Abstract
This paper analyzes the generalization power of deep neural networks (DNNs) from the perspective of interactions. Unlike previous analyses of a DNN's generalization power in a high-dimensional feature space, we find that the generalization power of a DNN can be explained as the generalization power of its interactions. We find that generalizable interactions follow a decay-shaped distribution, while non-generalizable interactions follow a spindle-shaped distribution. Furthermore, our theory can effectively disentangle these two types of interactions from a DNN. Experiments verify that our theory closely matches the real interactions in a DNN.
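To make the notion of an interaction and its order concrete, the following is a minimal sketch and not the authors' implementation. It assumes AND interactions are defined as Harsanyi dividends, I(S) = sum over T ⊆ S of (-1)^(|S|-|T|) v(T), where v(T) is the model output when only the input variables in T are left unmasked. The names `toy_model`, `and_interactions`, and `reconstruct` are hypothetical, and the three-variable scalar function stands in for a real DNN.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset, smallest first."""
    items = tuple(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def and_interactions(v, n):
    """Compute I(S) = sum_{T subset of S} (-1)^(|S|-|T|) * v(T) for all S.

    Assumption: this Harsanyi-dividend form is used as the AND interaction.
    Cost is exponential in n, so this is only practical for small n.
    """
    return {
        S: sum((-1) ** (len(S) - len(T)) * v(T) for T in subsets(S))
        for S in subsets(range(n))
    }


def toy_model(T):
    """Hypothetical scalar output when only the variables in T are unmasked."""
    x = [1.0 if i in T else 0.0 for i in range(3)]
    return 2.0 * x[0] + 1.5 * x[0] * x[1] - 0.5 * x[0] * x[1] * x[2]


interactions = and_interactions(toy_model, 3)


def reconstruct(T):
    """Matching property: v(T) equals the sum of I(S) over all S inside T."""
    return sum(val for S, val in interactions.items() if S <= T)


# Total interaction strength per order |S|, the quantity whose distribution
# over orders is described as decay-shaped or spindle-shaped.
strength_by_order = {}
for S, val in interactions.items():
    strength_by_order[len(S)] = strength_by_order.get(len(S), 0.0) + abs(val)
```

In this toy case the interactions recover the coefficients of the three product terms, and `reconstruct(T)` reproduces `toy_model(T)` on every masked input. For a real network, v(T) would instead come from forward passes on masked samples, and the number of subsets grows as 2^n.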
