Storage capacity of perceptron with variable selection
Yingying Xu, Masayuki Ohzeki, Yoshiyuki Kabashima
TL;DR
This work analyzes how restricting a perceptron to a sparse subset of input features alters its storage capacity for random patterns. Using the nonrigorous replica method, the authors quantify the typical number of feasible feature subsets and derive the capacity α_VS under optimal variable selection, showing that it can exceed the classical Cover–Gardner bound α_CG = 2ρ. The study identifies the regions where replica symmetry holds and those where de Almeida–Thouless (AT) instabilities imply replica symmetry breaking. Numerical simulations with BIHT-based methods qualitatively corroborate the gains from variable selection beyond α_CG, with implications for sparse associative memories and resource-constrained learning. The results provide a principled criterion for deciding when a learned feature set reflects genuine structure rather than chance correlations, and they connect statistical mechanics with modern sparse learning theory.
Abstract
A central challenge in machine learning is to distinguish genuine structure from chance correlations in high-dimensional data. In this work, we address this issue for the perceptron, a foundational model of neural computation. Specifically, we investigate the relationship between the pattern load $α$ and the variable selection ratio $ρ$ for which a simple perceptron can perfectly classify $P = αN$ random patterns by optimally selecting $M = ρN$ out of $N$ variables. The Cover--Gardner theory establishes that a random subset of $ρN$ dimensions can separate $αN$ random patterns if and only if $α < 2ρ$. We develop a method, based on the replica method from statistical mechanics, for enumerating the combinations of variables that enable perfect pattern classification, and use it to demonstrate that optimal variable selection can surpass this bound. This not only provides a quantitative criterion for distinguishing true structure in the data from spurious regularities, but also yields the storage capacity of associative memory models with sparse asymmetric couplings.
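The random-subset baseline $α < 2ρ$ can be checked at finite size with Cover's function-counting theorem, which gives the fraction of the $2^P$ labelings of $P$ points in general position in $M$ dimensions that a bias-free perceptron can realize. The following is a minimal sketch of that baseline only, not of the replica calculation developed in this work; the function names and the rounding of $αN$ and $ρN$ to integers are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb


def cover_separable_fraction(P, M):
    """Fraction of the 2**P labelings of P points in general position in R^M
    that are linearly separable through the origin (Cover's counting theorem)."""
    count = 2 * sum(comb(P - 1, k) for k in range(M))
    return Fraction(count, 2 ** P)


def random_subset_fraction(alpha, rho, N):
    """Separable fraction for P = alpha*N patterns restricted to a fixed
    (randomly chosen) subset of M = rho*N variables. Rounding to integers
    is an illustrative choice."""
    P = round(alpha * N)
    M = round(rho * N)
    return float(cover_separable_fraction(P, M))


if __name__ == "__main__":
    N, rho = 400, 0.5
    for alpha in (0.8, 1.0, 1.2):
        print(alpha, random_subset_fraction(alpha, rho, N))
```

At $α = 2ρ$ the fraction is exactly $1/2$ for any $N$, and the transition around that point sharpens as $N$ grows, which is the threshold that optimal variable selection is claimed to surpass.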
