Function Forms of Simple ReLU Networks with Random Hidden Weights
Ka Long Keith Ho, Yoshinari Takeishi, Junichi Takeuchi
TL;DR
The paper studies which functions wide two-layer ReLU networks with random, fixed hidden weights realize along the leading approximate eigenvectors of the Fisher information matrix (FIM). By deriving the asymptotic limits of the functions f_v(x) = X^T v for four groups of eigenvectors, it obtains the limiting forms F_0 ∝ ||x||, F_l ∝ x_l/2, F_{γγ} involving quadratic terms, and F_{αβ} ∝ x_α x_β/||x||, and shows that these functions are approximately orthogonal under the FIM-induced inner product. Simulations across varying dimensions and widths support the theoretical approximations, and the results indicate that gradient descent prioritizes these directions in function space. The work links parameter space and function space through the Fisher metric and suggests extensions to the NTK and to deeper or random-feature settings, with potential implications for initialization and architecture design.
Abstract
We investigate the function space dynamics of a two-layer ReLU neural network in the infinite-width limit, highlighting the role of the Fisher information matrix (FIM) in steering learning. Extending seminal works on approximate eigendecomposition of the FIM, we derive the asymptotic behavior of the basis functions $f_v(x) = X^{\top} v$ for four groups of approximate eigenvectors, showing their convergence to distinct function forms. These functions, prioritized by gradient descent, exhibit FIM-induced inner products that approximate orthogonality in the function space, forging a novel connection between parameter and function spaces. Simulations validate the accuracy of these theoretical approximations, confirming their practical relevance. By refining the role of the function space inner product, we advance the theoretical framework for ReLU networks, illuminating their optimization and expressivity. Overall, this work offers a robust foundation for understanding wide neural networks and enhances insights into scalable deep learning architectures, paving the way for improved design and analysis of neural networks.
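The convergence of $f_v(x) = X^{\top} v$ to simple function forms can be checked numerically for the first two groups. The following is a minimal sketch under assumptions not stated above: $X$ is taken to be the hidden-layer output $\mathrm{ReLU}(Wx)$, the rows of $W$ are drawn i.i.d. from $N(0, I_d)$, and the two directions `v0` (constant vector) and `vl` (the $l$-th column of $W$), together with the $1/m$ normalization, are illustrative stand-ins for the approximate eigenvectors rather than the exact definitions used in the paper. Under these assumptions, standard Gaussian identities give $E[\mathrm{ReLU}(w^{\top}x)] = \|x\|/\sqrt{2\pi}$ and $E[w_l\,\mathrm{ReLU}(w^{\top}x)] = x_l/2$, which agree with the forms $F_0 \propto \|x\|$ and $F_l \propto x_l/2$.

```python
import numpy as np


def relu_features(W, x):
    """Hidden-layer output X = ReLU(W x) for hidden weights W of shape (m, d)."""
    return np.maximum(W @ x, 0.0)


def f_v(W, x, v):
    """Network function f_v(x) = X^T v for output-layer direction v of shape (m,)."""
    return relu_features(W, x) @ v


rng = np.random.default_rng(0)
d, m = 5, 200_000                  # input dimension and hidden width (illustrative values)
W = rng.standard_normal((m, d))    # random, fixed hidden weights (assumed N(0, I_d) rows)
x = rng.standard_normal(d)         # an arbitrary test input

# Assumed direction for the first group: the constant vector, scaled by 1/m.
v0 = np.ones(m) / m
f0 = f_v(W, x, v0)
f0_limit = np.linalg.norm(x) / np.sqrt(2.0 * np.pi)   # E[ReLU(w.x)] for Gaussian w

# Assumed direction for the second group: the ell-th column of W, scaled by 1/m.
ell = 2
vl = W[:, ell] / m
fl = f_v(W, x, vl)
fl_limit = x[ell] / 2.0                                # E[w_ell ReLU(w.x)] for Gaussian w

print(f"f_v0(x) = {f0:.4f}, ||x||/sqrt(2*pi) = {f0_limit:.4f}")
print(f"f_vl(x) = {fl:.4f}, x_l/2           = {fl_limit:.4f}")
```

Both empirical values should approach their limits as the width `m` grows, with Monte Carlo error of order $1/\sqrt{m}$; the quadratic and cross-term groups $F_{\gamma\gamma}$ and $F_{\alpha\beta}$ are not covered by this sketch.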
