Nonlinear dynamics of localization in neural receptive fields

Leon Lufkin, Andrew M. Saxe, Erin Grant

TL;DR

This work addresses why neural receptive fields localize in early processing stages by deriving analytical learning dynamics for a minimal nonlinear neuron trained on naturalistic inputs. The authors develop an early-time gradient-flow model in which a localization amplifier depends on the marginal statistics of the input, and show that sufficiently negative excess kurtosis in the marginals promotes localized receptive fields, while high positive kurtosis suppresses localization. They further show that elliptical distributions produce nonlocalized, sinusoidal weight states, highlighting the limits of non-Gaussianity as a localization driver. Extending to multi-neuron networks and ICA, they demonstrate both the generality and the constraints of their mechanism, illustrating a data-statistics-driven route to localization that does not require explicit efficiency constraints.
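
As a rough illustration of the statistic this summary refers to, the sketch below estimates excess kurtosis (the fourth standardized moment minus 3) from samples. The helper name `excess_kurtosis` and the four example distributions are illustrative assumptions rather than the paper's data models; they are chosen only because their theoretical excess kurtosis is known, ranging from the most negative possible value (a two-valued ±1 variable) to a heavy-tailed positive case (Laplace).

```python
import random


def excess_kurtosis(samples):
    """Sample excess kurtosis: fourth central moment / variance^2, minus 3."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    return m4 / (m2 * m2) - 3.0


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 100_000

# Hypothetical example marginals (not the paper's Ising / NLGP / Kur models).
marginals = {
    # Two-valued +/-1 variable: theoretical excess kurtosis -2.
    "binary +/-1": [rng.choice((-1.0, 1.0)) for _ in range(n)],
    # Uniform on [-1, 1]: theoretical excess kurtosis -1.2.
    "uniform": [rng.uniform(-1.0, 1.0) for _ in range(n)],
    # Gaussian: theoretical excess kurtosis 0.
    "gaussian": [rng.gauss(0.0, 1.0) for _ in range(n)],
    # Laplace (random sign times an exponential): theoretical excess kurtosis 3.
    "laplace": [rng.choice((-1.0, 1.0)) * rng.expovariate(1.0) for _ in range(n)],
}

for name, xs in marginals.items():
    print(f"{name:12s} excess kurtosis = {excess_kurtosis(xs):+.2f}")
```

Under the summary's claim, marginals toward the negative end of this range would be the ones expected to favor localization, while the positive end would suppress it; the sketch only computes the statistic and does not simulate learning.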

Abstract

Localized receptive fields -- neurons that are selective for certain contiguous spatiotemporal features of their input -- populate early sensory regions of the mammalian brain. Unsupervised learning algorithms that optimize explicit sparsity or independence criteria replicate features of these localized receptive fields, but fail to explain directly how localization arises through learning without efficient coding, as occurs in early layers of deep neural networks and might occur in early sensory regions of biological systems. We consider an alternative model in which localized receptive fields emerge without explicit top-down efficiency constraints -- a feedforward neural network trained on a data model inspired by the structure of natural images. Previous work identified the importance of non-Gaussian statistics to localization in this setting but left open questions about the mechanisms driving dynamical emergence. We address these questions by deriving the effective learning dynamics for a single nonlinear neuron, making precise how higher-order statistical properties of the input data drive emergent localization, and we demonstrate that the predictions of these effective dynamics extend to the many-neuron setting. Our analysis provides an alternative explanation for the ubiquity of localization as resulting from the nonlinear dynamics of learning in neural circuits.

Paper Structure

This paper contains 33 sections, 3 theorems, 39 equations, 8 figures.

Key Result

Lemma 3.1

Under Assumptions \ref{item:mean-assumption}, \ref{item:covariance-assumption}, the gradient flow for the single ReLU neuron in \ref{item:single-neuron-model}, early in training with $y \in \{0, 1\}$ and trained using MSE loss, is […], where $o_N(1)$ vanishes as $N \to \infty$, and where $\varphi : (-1, 1) \to \mathbb{R}$ is defined as […] and $\operatorname{alg}^{-1}(x) = x/\sqrt{1 - x^2}$, the inverse of the alg…
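
The closed form for $\operatorname{alg}^{-1}$ can be sanity-checked numerically. The sketch below assumes that "alg" denotes the algebraic sigmoid $x/\sqrt{1 + x^2}$; the statement above is truncated before it names the function, so this identification is an assumption, as are the function names `alg` and `alg_inv`.

```python
import math


def alg(t):
    """Algebraic sigmoid (assumed meaning of 'alg'): maps R onto (-1, 1)."""
    return t / math.sqrt(1.0 + t * t)


def alg_inv(x):
    """Closed form from the lemma: x / sqrt(1 - x^2), defined on (-1, 1)."""
    return x / math.sqrt(1.0 - x * x)


# Round trip on the open interval (-1, 1), the domain of phi in the lemma.
for x in (-0.9, -0.5, -0.1, 0.0, 0.3, 0.7, 0.99):
    assert abs(alg(alg_inv(x)) - x) < 1e-12

# Round trip on the real line.
for t in (-5.0, -1.0, 0.0, 0.25, 2.0, 5.0):
    assert abs(alg_inv(alg(t)) - t) < 1e-9

print("alg_inv inverts alg on the tested points")
```

Because `alg_inv` diverges as $|x| \to 1$, any function built from it on $(-1, 1)$ should be expected to be sensitive near the endpoints of that interval.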

Figures (8)

  • Figure 1: (Left) Localization in spatial receptive fields (RFs) measured from non-human primate (NHP) primary visual cortex [ringach2002spatial] and in spatiotemporal RFs measured from NHP [decharms1998optimizing] and ferret [singer2018sensory] primary auditory cortex. (Center) Half-slice of the localized first-layer kernels of AlexNet trained for ImageNet classification [krizhevsky2012imagenet]. (Right) Localized receptive fields learned from the task of \ref{sec:task} in 2-D using ICA [hyvarinen2000independent] and the soft committee machine (SCM; \ref{item:many-neuron-model} with fixed second-layer weights) of \ref{sec:model}. Localization, meaning spatial and/or temporal selectivity, appears across settings, as measured by response maximization in biological systems (left) and by inspecting linear filters in artificial systems (center, right).
  • Figure 2: From left: long- and short-lengthscale samples $\mathbf{x}$, covariances $\Sigma$ for one lengthscale, and marginals $p(X_i)$ for the data models described in \ref{sec:task}: Ising (with $J=1.2, 0.3$ for the left and right samples), the nonlinear Gaussian process (NLGP) [ingrosso2022data], and the controllable kurtosis model, Kur (with $\xi=5, 1$ for the left and right samples). Each model generates samples that are centered about zero and whose covariances can be constrained to be similar, but whose higher-order statistics differ, as can be seen from the dimension-wise marginals.
  • Figure 3: From left, for the same Ising, NLGP, and Kur data models as in Figure 2: the marginals $p(X_i)$; the amplifier $\varphi$ defined in Lemma 3.1 and the kurtosis $\kappa$; the evolution of simulated receptive fields for the single-neuron model (\ref{item:single-neuron-model}) trained on each model's data; and the receptive field obtained by numerically integrating \ref{eq:gradient_flow_early} with $\varphi$ expanded to a third-order Taylor approximation for the same data. Training or evolution time is indicated by line color (blue for early time, red for late time). See \ref{sec:theory-validation} for exposition.
  • Figure 4: Evolution of receptive fields learned by the single-neuron model (\ref{item:single-neuron-model}), along with sinusoids fit to the final states (red dashes), when trained on data from three elliptical distributions: $t_{40}(\nu=3)$ (left), the surface of an ellipse (center), and a custom elliptical distribution that places its mass near the outside of an ellipse (right). In all cases, the learned receptive field is oscillatory (a sinusoid), as predicted by Proposition 3.3. The $\ell_2$ distances between the fitted oscillatory weights and the empirical RFs, as a fraction of the $\ell_2$ norm of the empirical RFs, are 9.77% (left), 3.75% (center), and 4.14% (right). See \ref{sec:elliptical-experiments} for exposition.
  • Figure 5: IPR vs. excess kurtosis for the NLGP and Kur data models, with mean and standard deviation across 30 re-initializations for the single-neuron model (\ref{item:single-neuron-model}); error bars are small and may not be visible.
  • ...and 3 more figures
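
The Figure 5 caption uses IPR as a localization measure without defining it here. The sketch below assumes that IPR stands for the inverse participation ratio in its common form, $\mathrm{IPR}(\mathbf{w}) = \sum_i w_i^4 / (\sum_i w_i^2)^2$; both the expansion of the abbreviation and the formula are assumptions, and the four weight vectors (including the length $N = 40$) are hypothetical examples, not receptive fields from the paper.

```python
import math


def ipr(w):
    """Inverse participation ratio: sum(w_i^4) / (sum(w_i^2))^2."""
    s2 = sum(v * v for v in w)
    s4 = sum(v ** 4 for v in w)
    return s4 / (s2 * s2)


N = 40  # hypothetical input dimension

# All weight on a single input: maximally localized, IPR = 1.
one_hot = [1.0 if i == N // 2 else 0.0 for i in range(N)]
# Equal weight everywhere: maximally spread, IPR = 1/N.
flat = [1.0] * N
# Sinusoid with a whole number of periods: IPR = 3 / (2N).
sinusoid = [math.sin(2 * math.pi * 3 * i / N) for i in range(N)]
# Gaussian bump of width 2 centered in the window: an intermediate case.
bump = [math.exp(-((i - N // 2) ** 2) / (2 * 2.0 ** 2)) for i in range(N)]

for name, w in [("flat", flat), ("sinusoid", sinusoid),
                ("bump", bump), ("one_hot", one_hot)]:
    print(f"{name:9s} IPR = {ipr(w):.4f}")
```

Under this definition the measure is invariant to rescaling the weights and grows as mass concentrates on fewer inputs, so an oscillatory weight vector scores close to the flat baseline while a localized bump scores well above it.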

Theorems & Definitions (4)

  • Lemma 3.1
  • Claim 3.2
  • Proposition 3.3
  • Lemma B.1