Neural Empirical Bayes
Saeed Saremi, Aapo Hyvarinen
TL;DR
This work unifies kernel density estimation and empirical Bayes within a high-dimensional, concentration-of-measure framework. It introduces a neural energy φ whose gradient approximates the negative score of the smoothed density, which enables end-to-end learning without explicit nonparametric density estimation. It develops NEBULA, a Hopfield-like associative memory driven by the gradient flow of φ, and a walk-jump sampling scheme in which Langevin dynamics samples from the smoothed distribution (walks) and Robbins-style empirical Bayes estimates denoise those samples (jumps). The paper analyzes manifold disintegration-expansion under Gaussian smoothing, introduces i-sphere interactions as a geometric mechanism for learning and memory, and reports creative memories arising as attractors for highly overlapping spheres. Taken together, the approach provides a scalable, geometry-aware method for unsupervised learning, sampling, and memory-like computation in high dimensions.
Abstract
We unify $\textit{kernel density estimation}$ and $\textit{empirical Bayes}$ and address a set of problems in unsupervised learning with a geometric interpretation of those methods, rooted in the $\textit{concentration of measure}$ phenomenon. Kernel density estimation is viewed symbolically as $X\rightharpoonup Y$ where the random variable $X$ is smoothed to $Y= X+N(0,σ^2 I_d)$, and empirical Bayes is the machinery to denoise in a least-squares sense, which we express as $X \leftharpoondown Y$. A learning objective is derived by combining these two, symbolically captured by $X \rightleftharpoons Y$. Crucially, instead of using the original nonparametric estimators, we parametrize $\textit{the energy function}$ with a neural network denoted by $φ$; at optimality, $\nabla φ\approx -\nabla \log f$ where $f$ is the density of $Y$. The optimization problem is abstracted as interactions of high-dimensional spheres which emerge due to the concentration of isotropic Gaussians. We introduce two algorithmic frameworks based on this machinery: (i) a "walk-jump" sampling scheme that combines Langevin MCMC (walks) and empirical Bayes (jumps), and (ii) a probabilistic framework for $\textit{associative memory}$, called NEBULA, defined à la Hopfield by the $\textit{gradient flow}$ of the learned energy to a set of attractors. We finish the paper by reporting the emergence of very rich "creative memories" as attractors of NEBULA for highly overlapping spheres.
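The walk-jump idea can be illustrated with a minimal sketch, under assumptions that are not part of the paper. Here the learned $\nabla φ$ is replaced by the analytic score of a Gaussian kernel density estimate over a small set of points, so that $\nabla \log f(y)$ is available in closed form. The jump uses the standard least-squares (empirical Bayes) estimator $\hat{x}(y) = y + σ^2 \nabla \log f(y)$, which with a learned energy would read $y - σ^2 \nabla φ(y)$. The function names `kde_score`, `walk`, and `jump`, the two-point toy data, the step size, and the number of steps are all hypothetical choices made for illustration.

```python
import numpy as np

def kde_score(y, centers, sigma):
    """Score grad log f(y) of the Gaussian-smoothed empirical density f.

    Assumption: f is a uniform mixture of N(x_k, sigma^2 I) over `centers`,
    used here as a stand-in for the learned -grad phi.
    """
    diffs = centers - y                                  # shape (K, d)
    logits = -np.sum(diffs**2, axis=1) / (2 * sigma**2)
    logits -= logits.max()                               # numerical stability
    w = np.exp(logits)
    w /= w.sum()                                         # posterior weights over centers
    return (w @ diffs) / sigma**2

def walk(y0, centers, sigma, step, n_steps, rng):
    """Unadjusted Langevin updates targeting the smoothed density f (the walk)."""
    y = y0.copy()
    for _ in range(n_steps):
        noise = rng.standard_normal(y.shape)
        y = y + step * kde_score(y, centers, sigma) + np.sqrt(2 * step) * noise
    return y

def jump(y, centers, sigma):
    """Least-squares denoiser x_hat(y) = y + sigma^2 * grad log f(y) (the jump)."""
    return y + sigma**2 * kde_score(y, centers, sigma)

rng = np.random.default_rng(0)
centers = np.array([[-2.0, 0.0], [2.0, 0.0]])            # toy data (hypothetical)
sigma = 0.5

y0 = centers[0] + sigma * rng.standard_normal(2)         # start from a noisy sample
y = walk(y0, centers, sigma, step=0.01, n_steps=500, rng=rng)
x_hat = jump(y, centers, sigma)                          # denoised estimate of X
```

In this toy setting the jump reduces to a posterior-weighted average of the data points, so `x_hat` always lies in their convex hull. With a neural energy, `kde_score` would be replaced by the negative gradient of the network.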
