Randomness of Low-Layer Parameters Determines Confusing Samples in Terms of Interaction Representations of a DNN
Junpeng Zhang, Lei Cheng, Qing Li, Liang Lin, Quanshi Zhang
TL;DR
The paper examines how DNN generalization relates to the complexity of the interactions the network encodes. It uses a theoretically grounded framework of AND-OR interactions to explain a DNN's inference. The framework's universal matching property ensures that a sparse surrogate model built from these interactions can replicate the network's outputs across masked versions of an input. Within this framework, confusing samples are defined as samples whose inference is driven by non-generalizable interactions. Through extensive experiments, the paper shows two things. First, high-order, complex interactions emerge mainly when the network overfits a small subset of samples. Second, different networks, even those with similar performance, have largely different sets of confusing samples. Crucially, the composition of confusing samples is governed by the randomness of low-layer parameters. This supports an extended view of the lottery ticket hypothesis in which low-layer initialization largely determines representation power, while high-layer parameters and architecture play a comparatively smaller role. These findings offer a new lens on generalization. They suggest that addressing low-layer randomness could curb overfitting more effectively and improve the interpretability of DNN decisions, with implications for targeted regularization and model design.
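The universal matching property can be illustrated on a toy function. The following is a minimal sketch that assumes the AND-only Harsanyi interaction, I(T) = sum over L subset of T of (-1)^(|T|-|L|) v(x_L), as a simplified stand-in for the paper's AND-OR decomposition, which is not reproduced here. `toy_v` is a hypothetical scalar output over 3 input variables, not a real DNN. The sketch checks that summing the interactions of all subsets of S reproduces v on every masked input. It also reads off the order |T| of each non-zero interaction, which is the sense in which interactions are called "high-order".

```python
from itertools import combinations


def all_subsets(items):
    """Yield every subset of `items` as a frozenset, including the empty set."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def harsanyi_interactions(v, n):
    """Compute I(T) = sum_{L subset of T} (-1)^(|T|-|L|) * v(L) for every T.

    `v` maps a frozenset of kept (unmasked) input variables to a scalar output.
    """
    return {
        T: sum((-1) ** (len(T) - len(L)) * v(L) for L in all_subsets(T))
        for T in all_subsets(range(n))
    }


def surrogate_output(interactions, S):
    """AND-style surrogate: sum the interactions of all subsets T of S."""
    return sum(val for T, val in interactions.items() if T <= S)


# Hypothetical toy "network": its output depends only on which of the
# 3 input variables are kept in the masked input S.
def toy_v(S):
    return (1.0 * (0 in S)
            + 2.0 * (0 in S and 1 in S)
            - 0.5 * (0 in S and 1 in S and 2 in S))


n = 3
interactions = harsanyi_interactions(toy_v, n)

# Universal matching: the surrogate matches v on all 2^n masked inputs.
for S in all_subsets(range(n)):
    assert abs(surrogate_output(interactions, S) - toy_v(S)) < 1e-9

# Keep only non-zero interactions and report their orders |T|.
salient = {T: val for T, val in interactions.items() if abs(val) > 1e-9}
for T, val in sorted(salient.items(), key=lambda kv: len(kv[0])):
    print(f"order {len(T)}: T={sorted(T)}, I(T)={val}")
```

Because the Harsanyi interaction is a Möbius transform of v, the matching holds exactly for any v; the toy function only makes the sparse set of salient interactions easy to inspect.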
Abstract
In this paper, we find that the complexity of interactions encoded by a deep neural network (DNN) can explain its generalization power. We also discover that the confusing samples of a DNN, which are represented by non-generalizable interactions, are determined by its low-layer parameters. In comparison, other factors, such as high-layer parameters and network architecture, have much less impact on the composition of confusing samples. Two DNNs with different low-layer parameters usually have fully different sets of confusing samples, even when they achieve similar performance. This finding extends the understanding of the lottery ticket hypothesis and explains well the distinctive representation power of different DNNs.
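One hypothetical way to make "fully different sets of confusing samples" concrete is to measure the overlap between two DNNs' confusing-sample sets with the Jaccard index (intersection over union). This is a sketch under that assumption. The paper's own overlap metric may differ, and the sample indices below are invented placeholders, not results from the paper.

```python
def jaccard(a, b):
    """Intersection-over-union of two sets; defined as 1.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


# Hypothetical indices of samples flagged as confusing by two DNNs trained
# with different low-layer initializations (placeholders, not real data).
confusing_net_a = {3, 17, 42, 88, 105}
confusing_net_b = {5, 17, 61, 90, 204}

overlap = jaccard(confusing_net_a, confusing_net_b)
print(f"Jaccard overlap: {overlap:.3f}")
```

An overlap near 0 would indicate largely disjoint confusing-sample sets, while a value near 1 would indicate that the two networks struggle with the same samples.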
