A Combinatorial Theory of Dropout: Subnetworks, Graph Geometry, and Generalization
Sahil Rajesh Dhayalkar
TL;DR
The paper introduces a combinatorial, graph-based theory of dropout: the space of dropout subnetworks is treated as a $d$-dimensional hypercube, and dropout as a random walk over this subnetwork graph. It defines a subnetwork contribution score $C(f)$ that quantifies generalization and proves that well-generalizing subnetworks form dense, low-resistance clusters whose number grows exponentially with network width. The theoretical results cover mean-field behavior in linear regimes, ensemble compression, smoothness of the generalization landscape, PAC-Bayes generalization bounds, and a connection between generalization and graph resistance. Extensive experiments on MNIST and CIFAR-10 support the claims, showing that dropout implicitly samples from a structured, robust ensemble of subnetworks, and point to mask-guided regularization and subnetwork-aware optimization as future directions.
Abstract
We propose a combinatorial and graph-theoretic theory of dropout by modeling training as a random walk over a high-dimensional graph of binary subnetworks. Each node represents a masked version of the network, and dropout induces stochastic traversal across this space. We define a subnetwork contribution score that quantifies generalization and show that it varies smoothly over the graph. Using tools from spectral graph theory, PAC-Bayes analysis, and combinatorics, we prove that generalizing subnetworks form large, connected, low-resistance clusters, and that their number grows exponentially with network width. This reveals dropout as a mechanism for sampling from a robust, structured ensemble of well-generalizing subnetworks with built-in redundancy. Extensive experiments validate every theoretical claim across diverse architectures. Together, our results offer a unified foundation for understanding dropout and suggest new directions for mask-guided regularization and subnetwork optimization.
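To make the graph picture concrete, the following is a minimal sketch of the objects the abstract refers to, under stated assumptions rather than the paper's own implementation. It assumes masks are binary tuples over $d$ droppable units, that two masks are adjacent when they differ in exactly one unit, and that every edge has unit resistance; the paper's exact resistance definition and the score $C(f)$ are not given in this section, so neither is reproduced here. All function names (`hypercube_masks`, `neighbors`, `sample_mask`, `effective_resistance`) are hypothetical.

```python
from fractions import Fraction
from itertools import product
import random


def hypercube_masks(d):
    """All 2**d binary dropout masks over d droppable units (assumed encoding)."""
    return list(product((0, 1), repeat=d))


def neighbors(mask):
    """Masks at Hamming distance 1: toggle exactly one unit."""
    return [mask[:i] + (1 - mask[i],) + mask[i + 1:] for i in range(len(mask))]


def sample_mask(d, keep_prob, rng):
    """Standard dropout sampling: keep each unit independently with keep_prob."""
    return tuple(int(rng.random() < keep_prob) for _ in range(d))


def effective_resistance(d, u, v):
    """Exact effective resistance between masks u and v on the d-cube.

    Assumption: every edge is a unit resistor. Node v is grounded, one unit
    of current is injected at u, and the reduced Laplacian system is solved
    by Gauss-Jordan elimination; the potential at u is the resistance.
    """
    if u == v:
        return Fraction(0)
    nodes = [m for m in hypercube_masks(d) if m != v]
    index = {m: i for i, m in enumerate(nodes)}
    n = len(nodes)
    # Augmented matrix [L_reduced | e_u] with exact rational entries.
    aug = [[Fraction(0)] * (n + 1) for _ in range(n)]
    for m, i in index.items():
        aug[i][i] = Fraction(d)  # each mask has degree d
        for nb in neighbors(m):
            if nb != v:
                aug[i][index[nb]] = Fraction(-1)
    aug[index[u]][n] = Fraction(1)
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = 1 / aug[col][col]
        aug[col] = [x * inv for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return aug[index[u]][n]
```

For $d = 3$ under the unit-resistance assumption, `effective_resistance(3, (0, 0, 0), (1, 0, 0))` evaluates to $7/12$ and `effective_resistance(3, (0, 0, 0), (1, 1, 1))` to $5/6$, the classical resistor-cube values. The exhaustive enumeration is only feasible for small $d$, since the graph has $2^d$ nodes; it is meant to illustrate the geometry, not to scale to real networks.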
