Storage capacity of perceptron with variable selection

Yingying Xu, Masayuki Ohzeki, Yoshiyuki Kabashima

TL;DR

This work analyzes how restricting a perceptron to a sparse subset of input features alters its storage capacity for random patterns. Using the nonrigorous replica method, the authors count the typical number of feasible feature subsets and derive the capacity α_VS under optimal variable selection, showing that it can exceed the classical Cover–Gardner bound α_CG = 2ρ. The study identifies regions where replica symmetry (RS) holds and regions where the de Almeida–Thouless (AT) instability signals replica symmetry breaking. Numerical experiments with a greedy BIHT-based method qualitatively support gains from variable selection beyond α_CG, with implications for sparse associative memories and resource-constrained learning. The results provide a quantitative criterion for when selected feature sets reflect genuine structure rather than chance correlations.

Abstract

A central challenge in machine learning is to distinguish genuine structure from chance correlations in high-dimensional data. In this work, we address this issue for the perceptron, a foundational model of neural computation. Specifically, we investigate the relationship between the pattern load $α$ and the variable selection ratio $ρ$ for which a simple perceptron can perfectly classify $P = αN$ random patterns by optimally selecting $M = ρN$ out of $N$ variables. While the Cover--Gardner theory establishes that a random subset of $ρN$ dimensions can separate $αN$ random patterns if and only if $α < 2ρ$, we demonstrate that optimal variable selection can surpass this bound. To do so, we develop a method, based on the replica method from statistical mechanics, for enumerating the combinations of variables that enable perfect pattern classification. This not only provides a quantitative criterion for distinguishing true structure in the data from spurious regularities, but also yields the storage capacity of associative memory models with sparse asymmetric couplings.
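
The Cover--Gardner threshold quoted above can be illustrated at finite size with Cover's counting function. The sketch below is not taken from the paper: it assumes patterns in general position and a zero-threshold perceptron, and the function name `cover_separable_fraction` and the sizes in the demo are illustrative choices. For $P$ points in $M$ dimensions, the fraction of the $2^P$ label assignments that are linearly separable is $2^{1-P}\sum_{k=0}^{M-1}\binom{P-1}{k}$, which drops from near 1 to near 0 around $P = 2M$, i.e. $α = 2ρ$.

```python
from math import comb


def cover_separable_fraction(num_patterns: int, dim: int) -> float:
    """Fraction of the 2**P labelings of P points in general position in
    R^dim that a zero-threshold perceptron can realize (Cover's counting
    function). Assumption: general position, homogeneous separating plane."""
    if num_patterns <= dim:
        # With at most `dim` points in general position, every labeling works.
        return 1.0
    separable = 2 * sum(comb(num_patterns - 1, k) for k in range(dim))
    return separable / 2 ** num_patterns


if __name__ == "__main__":
    # Illustrative sizes (not from the paper): N = 200, rho = 0.5, so M = 100.
    n_total, rho = 200, 0.5
    m_selected = int(rho * n_total)
    for alpha in (0.8, 1.0, 1.2):
        p = int(alpha * n_total)
        frac = cover_separable_fraction(p, m_selected)
        print(f"alpha = {alpha:.1f}: separable fraction = {frac:.4f}")
```

At exactly $P = 2M$ the fraction equals 1/2 for any $M$, and the transition sharpens as $N$ grows, which is why a random subset of $ρN$ variables fails for $α > 2ρ$ in the large-$N$ limit.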

Paper Structure

This paper contains 15 sections, 46 equations, 3 figures, 1 algorithm.

Figures (3)

  • Figure 1: Schematic illustration of how the capacity is determined for a fixed variable selection ratio $\rho$. The vector $\bm{c}$ specifies a cluster defined by a particular choice of selected variables, drawn as a dotted circle. Shaded regions indicate the feasible regions compatible with $\xi^{P}$. For $\alpha < \alpha_{\rm VC} = \rho$, corresponding to the Vapnik--Chervonenkis bound [Vapnik1971], all clusters possess feasible regions of finite volume for typical random datasets $\xi^{P}$. For $\alpha_{\rm VC} < \alpha < \alpha_{\rm CG} = 2\rho$, a small fraction of clusters lose their feasible regions, but typical clusters still retain feasible regions of finite volume. For $\alpha_{\rm CG} < \alpha < \alpha_{\rm VS}$, the feasible regions of typical clusters vanish, yet an exponential number of atypical clusters continue to have nonzero feasible volumes. For $\alpha > \alpha_{\rm VS}$, the feasible region disappears in all clusters. The goal of the present work is to evaluate $\alpha_{\rm VS}$.
  • Figure 2: Profiles of the RS solutions for $\rho = 0.5$ are shown in (a) $q_{1}$ and $q_{0}$, (b) $\chi$, (c) $\Sigma$, and (d) the AT stability condition (\ref{eq:AT}) as functions of $\alpha$.
  • Figure 3: Perceptron capacity as a function of the variable selection ratio $\rho$. The solid blue line represents the capacity $\alpha_{\rm VS}$ predicted by the replica symmetric (RS) analysis under optimal variable selection, while the solid orange line shows the classical Cover--Gardner capacity $\alpha_{\rm CG} = 2\rho$. The green, red, and purple markers correspond to the averaged results of the greedy-BIHT experiments for system sizes $N = 64, 128, 256$, respectively.
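
The regime $\alpha_{\rm CG} < \alpha < \alpha_{\rm VS}$ in Figure 1, where typical clusters are infeasible but exponentially many atypical ones survive, can be bracketed from above by a simple first-moment (annealed) count. The sketch below is not the paper's replica calculation and does not reproduce $\alpha_{\rm VS}$. It assumes patterns in general position and combines the number of subsets, $\binom{N}{\rho N}$, with Cover's separability probability for one subset. For $\alpha > 2\rho$ this gives $\frac{1}{N}\ln \mathbb{E}[\#\text{feasible subsets}] \to H(\rho) + \alpha H(\rho/\alpha) - \alpha \ln 2$, with $H$ the binary entropy in nats. By Markov's inequality, the value of $\alpha$ at which this expression crosses zero is an upper bound on the capacity. All function names are illustrative.

```python
from math import log


def binary_entropy(x: float) -> float:
    """Binary entropy in nats, with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log(x) - (1.0 - x) * log(1.0 - x)


def annealed_log_count(alpha: float, rho: float) -> float:
    """(1/N) ln E[number of feasible subsets], valid for alpha > 2 * rho.
    Assumption: first-moment estimate built from Cover's counting function."""
    return (binary_entropy(rho)
            + alpha * binary_entropy(rho / alpha)
            - alpha * log(2.0))


def annealed_upper_bound(rho: float, tol: float = 1e-10) -> float:
    """Bisection for the zero crossing of annealed_log_count in alpha.
    The function is decreasing in alpha for alpha > 2 * rho."""
    if not 0.0 < rho <= 1.0:
        raise ValueError("rho must lie in (0, 1]")
    lo = 2.0 * rho          # Cover-Gardner capacity; log count is >= 0 here
    hi = 2.0 * lo
    while annealed_log_count(hi, rho) > 0.0:
        hi *= 2.0           # expand until the expected count decays
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if annealed_log_count(mid, rho) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for rho in (0.25, 0.5, 0.75, 1.0):
        bound = annealed_upper_bound(rho)
        print(f"rho = {rho:.2f}: 2*rho = {2 * rho:.2f}, annealed bound = {bound:.3f}")
```

Because the first moment can be dominated by rare datasets, this bound is generally loose. The quenched count that the replica analysis targets is the relevant quantity for typical datasets.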