
Optimal Demixing of Nonparametric Densities

Jianqing Fan, Zheng Tracy Ke, Zhaoyang Shi

Abstract

Motivated by applications in statistics and machine learning, we consider the problem of unmixing convex combinations of nonparametric densities. Suppose we observe $n$ groups of samples, where the $i$th group consists of $N_i$ independent samples from a $d$-variate density $f_i(x)=\sum_{k=1}^K \pi_i(k)g_k(x)$. Here, each $g_k(x)$ is a nonparametric density, and each $\pi_i$ is a $K$-dimensional mixed membership vector. We aim to estimate $g_1(x), \ldots, g_K(x)$. This problem generalizes topic modeling from discrete to continuous variables and finds applications in large language models (LLMs) with word embeddings. In this paper, we propose an estimator for this problem, which modifies the classical kernel density estimator by assigning group-specific weights that are computed by topic modeling on histogram vectors and de-biased by U-statistics. For any $\beta>0$, assuming that each $g_k(x)$ is in the Nikol'ski class with smoothness parameter $\beta$, we show that the sum of integrated squared errors of the constructed estimators has a convergence rate that depends on $n$, $K$, $d$, and the per-group sample size $N$. We also provide a matching lower bound, which suggests that our estimator is rate-optimal.
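To make the model and the weighted-KDE idea concrete, here is a minimal sketch of the setting in one dimension. It simulates $n$ groups from $f_i = \sum_k \pi_i(k) g_k$ with two hypothetical Beta components, and forms component estimates as weighted combinations of per-group kernel density estimates. For simplicity the sketch is an oracle version: it uses the pseudo-inverse of the known membership matrix $\Pi$ as the group weights, whereas the paper estimates these weights by topic modeling on histogram vectors and de-biases them with U-statistics.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N, K = 100, 200, 2

# Hypothetical component densities g_k: Beta(2,5) and Beta(5,2) on [0,1].
def sample_component(k, size):
    a, b = [(2, 5), (5, 2)][k]
    return rng.beta(a, b, size=size)

# Mixed-membership vectors pi_i, drawn from a Dirichlet (illustrative choice).
Pi = rng.dirichlet(np.ones(K), size=n)              # n x K

# Group i: each sample picks a component with probabilities pi_i,
# then draws from the chosen g_k.
X = np.empty((n, N))
for i in range(n):
    labels = rng.choice(K, size=N, p=Pi[i])
    for k in range(K):
        idx = labels == k
        X[i, idx] = sample_component(k, idx.sum())

# Oracle group weights: rows of pinv(Pi), so that sum_i W[k,i] * f_i = g_k.
# (The paper instead estimates these weights from the data.)
W = np.linalg.pinv(Pi)                              # K x n

def gaussian_kde_at(x, samples, h):
    """Standard Gaussian kernel density estimate evaluated at points x."""
    z = (x[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(samples) * h * np.sqrt(2 * np.pi))

grid = np.linspace(0, 1, 201)
h = 0.05
fhat = np.stack([gaussian_kde_at(grid, X[i], h) for i in range(n)])  # n x 201
ghat = W @ fhat                                     # K estimated components

print(ghat.shape)
```

Since the rows of $\Pi$ sum to one, the oracle weights satisfy $\sum_i W_{ki} = 1$, so each estimated component integrates to approximately one (up to kernel mass leaking past the boundaries of $[0,1]$).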

Paper Structure

This paper contains 53 sections, 39 theorems, 378 equations, and 3 figures.

Key Result

Lemma 2.1

Let $\mathbb{R}^d =\cup_{m=1}^M {\cal B}_m$ be the bins and $\widehat{Q}$ be as in (def-hat-Q). Write $U_{ijm}=1\{X_{ij}\in {\cal B}_m\}$, for all $1\leq i\leq n$, $1\leq j\leq N_i$, and $1\leq m\leq M$. Then, the plug-in estimator in (naiveEstimate) satisfies that $\widehat{\boldsymbol g}^{\text{plug-in}}$ …
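The lemma is stated in terms of the bin indicators $U_{ijm}=1\{X_{ij}\in {\cal B}_m\}$, whose per-group averages are the histogram vectors fed into the topic-modeling step. A minimal sketch of computing these quantities, using toy 1-d data and evenly spaced bins (both illustrative assumptions, not the paper's binning scheme):

```python
import numpy as np

rng = np.random.default_rng(1)
n, N, M = 5, 1000, 20                 # groups, samples per group, bins

# Toy 1-d data on [0, 1]; the bins B_1, ..., B_M partition the sample space.
X = rng.uniform(size=(n, N))
edges = np.linspace(0.0, 1.0, M + 1)

# Bin index of each sample X_{ij}; clip handles the right edge X = 1.
bin_idx = np.clip(np.searchsorted(edges, X, side="right") - 1, 0, M - 1)

# U[i, j, m] = 1{X_{ij} in B_m}, encoded as one-hot vectors over the M bins.
U = np.eye(M)[bin_idx]                # shape (n, N, M)

# Histogram vector of group i: the average of its indicators over j.
H = U.mean(axis=1)                    # n x M, each row sums to 1

print(H.shape)
```

Each row of `H` is an empirical probability vector over the $M$ bins, which is what makes discrete topic-modeling machinery applicable to the binned continuous data.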

Figures (3)

  • Figure 1: MISE and estimated densities for different $n,N$ and $K$ (Experiment 1). Black: true density; red: estimated density from one realization. Only the estimates for the first two components are presented; the others look similar.
  • Figure 2: Effect of the bandwidth $h$ and the number of bins $M$ on MISE for $n=100$, $N=100$ and $K=3$ (Experiment 2).
  • Figure 3: Comparison of our estimator, which corresponds to the first and third plots, with the one in austern2025poisson, which corresponds to the second and fourth plots (Experiment 3).

Theorems & Definitions (39)

  • Lemma 2.1
  • Theorem 4.1
  • Corollary 4.1
  • Theorem 4.2
  • Theorem 4.3
  • Theorem 4.4
  • Theorem 4.5
  • Lemma 6.1: Integrated variance bound for incomplete $U$-processes
  • Lemma 6.2: Bernstein inequality for incomplete $U$-statistics
  • Lemma 6.3
  • ...and 29 more