On the number of modes of Gaussian kernel density estimators

Borjan Geshkovski, Philippe Rigollet, Yihang Sun

TL;DR

This work determines the asymptotic growth of the expected number of modes of a Gaussian kernel density estimator (KDE) on the real line with bandwidth $h=\beta^{-1/2}$, built from $n$ iid $N(0,1)$ samples. By combining a Gaussian-process approximation of the KDE derivatives with the Kac-Rice formula and a careful Edgeworth expansion, the authors prove that the expected mode count scales as $\Theta\left(\sqrt{\beta\log\beta}\right)$ when $n^c\lesssim \beta\lesssim n^{2-c}$ for some constant $c>0$. The analysis identifies two belt regions where almost all modes concentrate, and it controls the error terms precisely enough to show that the tails outside these belts contribute negligibly. The methods blend probabilistic (Kac-Rice) and analytic (Edgeworth) tools to count modes on the entire real line, extending prior fixed-interval results. The findings have implications for understanding clustering phenomena and metastable behavior in small-bandwidth (large-$\beta$) regimes, with a motivating connection to Transformer self-attention dynamics.

Abstract

We consider the Gaussian kernel density estimator with bandwidth $\beta^{-\frac12}$ of $n$ iid Gaussian samples. Using the Kac-Rice formula and an Edgeworth expansion, we prove that the expected number of modes on the real line scales as $\Theta(\sqrt{\beta\log\beta})$ as $\beta,n\to\infty$ provided $n^c\lesssim \beta\lesssim n^{2-c}$ for some constant $c>0$. An impetus behind this investigation is to determine the number of clusters to which Transformers are drawn in a metastable state.
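
To make the object being counted concrete, here is a minimal numerical sketch, not the authors' code. It evaluates a Gaussian KDE with bandwidth $h=\beta^{-1/2}$ on a fine grid and counts its local maxima. The function names (`gaussian_kde_on_grid`, `count_modes`), the grid resolution, and the padding are illustrative assumptions; the grid count is only a proxy for the true number of modes.

```python
import numpy as np


def gaussian_kde_on_grid(samples, beta, grid, chunk=512):
    """Evaluate the Gaussian KDE with bandwidth h = beta**-0.5 on `grid`.

    Hypothetical helper: the grid is processed in chunks to bound memory use.
    """
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    out = np.empty_like(grid)
    for start in range(0, grid.size, chunk):
        # Differences between each grid point in the chunk and each sample.
        diff = grid[start:start + chunk, None] - samples[None, :]
        # Unnormalized Gaussian kernel with variance 1/beta, averaged over samples.
        out[start:start + chunk] = np.exp(-0.5 * beta * diff ** 2).mean(axis=1)
    return np.sqrt(beta / (2.0 * np.pi)) * out


def count_modes(samples, beta, points_per_bandwidth=20, pad_bandwidths=5.0):
    """Count local maxima of the KDE on a grid (a numerical proxy for the mode count).

    Assumptions: the grid spans the sample range padded by a few bandwidths,
    with `points_per_bandwidth` grid points per bandwidth h.
    """
    samples = np.asarray(samples, dtype=float)
    h = beta ** -0.5
    lo = samples.min() - pad_bandwidths * h
    hi = samples.max() + pad_bandwidths * h
    num = int(np.ceil((hi - lo) / h * points_per_bandwidth)) + 1
    grid = np.linspace(lo, hi, num)
    density = gaussian_kde_on_grid(samples, beta, grid)
    # A mode shows up as a finite difference that is positive, then non-positive.
    d = np.diff(density)
    return int(np.count_nonzero((d[:-1] > 0) & (d[1:] <= 0)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)      # fixed seed, chosen arbitrarily
    x = rng.standard_normal(2000)       # n = 2000 iid N(0, 1) samples
    print(count_modes(x, beta=100.0))   # one realization of the mode count
```

Averaging `count_modes` over many independent draws gives a Monte Carlo estimate of the expected mode count that the abstract describes. A finer grid reduces the chance of missing narrow peaks, at higher cost.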

Paper Structure

This paper contains 22 sections, 17 theorems, 83 equations, 5 figures.

Key Result

Theorem 1.1

Let $\widehat{P}_n$ be the Gaussian KDE defined in eq:gkde, with bandwidth $h\coloneqq\beta^{-\frac{1}{2}}>0$, of $X_1, \dots, X_n \stackrel{\text{iid}}{\sim} N(0, 1)$. Asymptotically as $n\to\infty$, the expected number $N$ of modes of $\widehat{P}_n$ in a fixed interval $[a, b]$ is

Figures (5)

  • Figure 1: A realization of the kernel density estimator $\widehat{P}_n$ in \ref{eq:gkde} for $n=10^4$, with $\beta=100$ (left) and $\beta=300$ (right). Larger $\beta$ narrows the Gaussian kernel, which sharpens $\widehat{P}_n$ and reveals more small peaks on the shoulders, while the central peak remains single. \ref{thm:main-result} later quantifies where and how many such peaks appear.
  • Figure 2: (Left) Plot of the average number of modes as a function of $\beta$ for $n=10^3$ (top) and $n=10^4$ (bottom). (Right) Log-log plot for $n=10^3$ (top) and $n=10^4$ (bottom); the predicted linear regression line (red) corroborates a power-law of the form $\text{average \# of modes} \approx 0.179 \cdot \beta^{0.504}$, in line with \ref{thm:main-result}.
  • Figure 3: Metastability of self-attention dynamics at temperature $\beta=81$ initialized with $n$ iid uniform points on the circle, with $n=200$ (top) and $n=1000$ (bottom). The number of clusters appears to be of the correct order $\sim\sqrt{\beta}$. (Code available at https://github.com/borjanG/2023-transformers-rotf.)
  • Figure 4: $n=10^5$ is fixed throughout. (Left) Empirical distribution of the modes of $\widehat{P}_n$ over $T$ for $\beta=100$ (top) and $\beta=300$ (bottom). (Right) The function $t\mapsto \sqrt{\beta}\exp(-A_t)$ for $\beta=100$ (top) and $\beta=300$ (bottom), which, due to the Kac-Rice formula, is an approximation for the distribution of the number of modes of $\widehat{P}_n$ in $T$. Shaded in grey is the interval $T$. (Code available at https://github.com/KimiSun18/2024-gauss-kde-attention.)
  • Figure 5: An estimate of the density $p_t=p_t(x,y)$ of $(F_n(t), F_n'(t))$ for $t=0, 1, 2, 3$ (clockwise from top left), where $\beta=81$ and $n=6500$, so that $\sqrt{2\log n -\log \beta}\approx 3$. (Code available at https://github.com/KimiSun18/2024-gauss-kde-attention.)
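
The log-log regression described in the Figure 2 caption can be sketched as follows. This is a hedged illustration and not the authors' script: `fit_power_law` is a hypothetical helper that fits $\log(\text{count}) = \log C + \alpha\log\beta$ by ordinary least squares. The input in the usage lines is synthetic data generated from the fit reported in the caption, used only to check that the helper recovers a known power law.

```python
import numpy as np


def fit_power_law(betas, mean_mode_counts):
    """Least-squares fit of log(count) = log(C) + alpha * log(beta).

    Returns the pair (C, alpha). Both inputs must be strictly positive.
    """
    log_beta = np.log(np.asarray(betas, dtype=float))
    log_count = np.log(np.asarray(mean_mode_counts, dtype=float))
    # np.polyfit returns coefficients from highest degree down: slope, intercept.
    alpha, log_c = np.polyfit(log_beta, log_count, deg=1)
    return float(np.exp(log_c)), float(alpha)


if __name__ == "__main__":
    # Synthetic, noise-free data following the power law quoted in the caption.
    betas = np.array([50.0, 100.0, 200.0, 400.0, 800.0])
    counts = 0.179 * betas ** 0.504
    prefactor, exponent = fit_power_law(betas, counts)
    print(prefactor, exponent)
```

Over a limited range of $\beta$, a pure power-law fit cannot separate a logarithmic factor from the exponent, so a fitted exponent is best read as a rough consistency check on the $\Theta(\sqrt{\beta\log\beta})$ rate, not as a measurement of it.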

Theorems & Definitions (31)

  • Theorem 1.1: mammen95
  • Theorem 1.2
  • Remark 1.3
  • Remark 1.4
  • Remark 1.5: Belt width
  • Remark 1.6
  • Proposition 1.7
  • Proposition 1.8
  • Theorem 2.1: Kac-Rice, azais2009level, adler2009random
  • Lemma 2.2
  • ...and 21 more