On the number of modes of Gaussian kernel density estimators
Borjan Geshkovski, Philippe Rigollet, Yihang Sun
TL;DR
This work determines the asymptotic growth of the expected number of modes of a Gaussian kernel density estimator (KDE) on the real line with bandwidth $h=\beta^{-1/2}$, built from $n$ iid $N(0,1)$ samples. By combining a Gaussian-process approximation of the KDE derivatives with the Kac-Rice formula and an Edgeworth expansion, the authors prove that the expected mode count scales as $\Theta\left(\sqrt{\beta\log\beta}\right)$ as $\beta,n\to\infty$, provided $n^c\lesssim \beta\lesssim n^{2-c}$ for some constant $c>0$. The analysis identifies two belt regions where almost all modes concentrate and shows that the tails outside these belts contribute negligibly to the count. The methods blend probabilistic (Kac-Rice) and analytic (Edgeworth) tools to count modes on the entire real line, extending prior fixed-interval results. The findings bear on clustering phenomena and metastable behavior in the small-bandwidth (large-$\beta$) regime, with a motivating connection to Transformer self-attention dynamics.
Abstract
We consider the Gaussian kernel density estimator with bandwidth $\beta^{-\frac12}$ of $n$ iid Gaussian samples. Using the Kac-Rice formula and an Edgeworth expansion, we prove that the expected number of modes on the real line scales as $\Theta(\sqrt{\beta\log\beta})$ as $\beta,n\to\infty$ provided $n^c\lesssim \beta\lesssim n^{2-c}$ for some constant $c>0$. An impetus behind this investigation is to determine the number of clusters to which Transformers are drawn in a metastable state.
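For intuition, the mode count can be explored numerically. The following is a minimal sketch and not the authors' method: it evaluates the estimator $\hat p(x)=\frac1n\sum_{i=1}^n\sqrt{\beta/(2\pi)}\,e^{-\beta(x-X_i)^2/2}$ on a fine grid and counts strict local maxima. The function name `count_kde_modes`, the grid size, the padding, and the chunk size are illustrative assumptions. A grid that is too coarse relative to the bandwidth $\beta^{-1/2}$ can miss or merge modes.

```python
import numpy as np


def count_kde_modes(samples, beta, grid_size=20001, pad=1.0, chunk=64):
    """Count strict local maxima of a Gaussian KDE with bandwidth beta**-0.5.

    Hypothetical helper for illustration: the KDE is evaluated on a uniform
    grid covering [min(samples) - pad, max(samples) + pad].
    """
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    grid = np.linspace(samples.min() - pad, samples.max() + pad, grid_size)

    # Accumulate the kernel sum in chunks of samples to keep memory bounded.
    density = np.zeros(grid_size)
    n_chunks = max(1, n // chunk)
    for part in np.array_split(samples, n_chunks):
        diff = grid[:, None] - part[None, :]
        density += np.exp(-0.5 * beta * diff**2).sum(axis=1)
    density *= np.sqrt(beta / (2.0 * np.pi)) / n

    # A grid point is a mode if it strictly exceeds both neighbours.
    interior = density[1:-1]
    is_peak = (interior > density[:-2]) & (interior > density[2:])
    return int(is_peak.sum())


# Example: one draw of n = 500 standard Gaussian samples, several values of beta.
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
for beta in (25.0, 100.0, 400.0):
    modes = count_kde_modes(x, beta)
    print(f"beta={beta:6.1f}  modes={modes}  sqrt(beta*log(beta))={np.sqrt(beta * np.log(beta)):.1f}")
```

A single run gives one realization of the count, whereas the result above concerns its expectation; averaging over many independent draws would be needed to compare against the $\sqrt{\beta\log\beta}$ rate, and the constants hidden in $\Theta(\cdot)$ are not specified here.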
