Model-free Estimation of Latent Structure via Multiscale Nonparametric Maximum Likelihood

Bryon Aragam, Ruiyi Yang

TL;DR

This work introduces a model-free framework for uncovering latent discrete structure in arbitrary high-dimensional densities by constructing a multiscale latent representation via the nonparametric maximum likelihood estimator (NPMLE). At each scale $\sigma$, the density $p_\sigma$ arises as $p_\sigma = \varphi_\sigma * G_\sigma$, with $G_\sigma$ capturing latent structure; as $\sigma$ varies, one obtains a coarse-to-fine view that reveals clusters, subclusters, and hierarchical relations without assuming a parametric form. The authors prove that the NPMLE converges to the KL projection $p_\sigma$ and that the latent components are strongly consistent estimates of the true multiscale structure, enabling a practical clustering algorithm based on model selection and a Bayes partition. The approach yields interpretable qualitative objects such as dendrograms and class-conditional densities, and demonstrates superior clustering performance on simulations and benchmark datasets compared to standard methods. Overall, the paper provides a rigorous, scalable, and flexible pathway to extract meaningful latent structure from complex densities with broad applicability in clustering and density-based inference.

Abstract

Multivariate distributions often carry latent structures that are difficult to identify and estimate, and which better reflect the data generating mechanism than extrinsic structures exhibited simply by the raw data. In this paper, we propose a model-free approach for estimating such latent structures whenever they are present, without assuming they exist a priori. Given an arbitrary density $p_0$, we construct a multiscale representation of the density and propose data-driven methods for selecting representative models that capture meaningful discrete structure. Our approach uses a nonparametric maximum likelihood estimator to estimate the latent structure at different scales, and we further characterize the asymptotic limits of these estimates. By carrying out such a multiscale analysis, we obtain coarse-to-fine structures inherent in the original distribution, which are integrated via a model selection procedure to yield an interpretable discrete representation of it. As an application, we design a clustering algorithm based on the proposed procedure and demonstrate its effectiveness in capturing a wide range of latent structures.
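To make the idea of estimating a mixing measure at several scales concrete, here is a minimal one-dimensional sketch. It is not the authors' algorithm: it assumes a Gaussian kernel for $\varphi_\sigma$, approximates the NPMLE by restricting the mixing measure to a fixed grid of atoms, and updates the weights with the standard EM fixed-point iteration. The names `gaussian_kernel` and `grid_npmle`, the grid size, the iteration count, and the toy data are all illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(x, atoms, sigma):
    """Return K with K[i, j] = N(x_i; atoms_j, sigma^2) for 1-D data."""
    z = (x[:, None] - atoms[None, :]) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))

def grid_npmle(x, sigma, n_atoms=200, n_iter=500):
    """Fixed-grid approximation to the NPMLE of the mixing measure.

    The atoms are fixed on an evenly spaced grid over the data range
    (an assumption of this sketch); only the weights are optimized.
    """
    atoms = np.linspace(x.min(), x.max(), n_atoms)
    K = gaussian_kernel(x, atoms, sigma)
    w = np.full(n_atoms, 1.0 / n_atoms)
    for _ in range(n_iter):
        # Mixture density at each sample, floored to avoid division by zero.
        mix = np.maximum(K @ w, 1e-300)
        # EM update for mixture weights; it keeps the weights on the simplex.
        w = w * (K.T @ (1.0 / mix)) / len(x)
    return atoms, w

# Toy data (hypothetical): two well-separated clusters.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 0.5, 150), rng.normal(2.0, 0.5, 150)])

# Fit at two scales and report how many atoms carry non-negligible weight.
for sigma in (0.4, 0.8):
    atoms, w = grid_npmle(x, sigma)
    print(f"sigma={sigma}: {int((w > 1e-3).sum())} atoms with weight > 1e-3")
```

Varying `sigma` changes how much of the spread in the data is attributed to the kernel rather than to the mixing measure, which is the coarse-to-fine behaviour the abstract describes. The model selection and Bayes partition steps of the paper are not part of this sketch.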

Paper Structure

This paper contains 34 sections, 10 theorems, 93 equations, 53 figures, 1 algorithm.

Key Result

Proposition 1

Let $\Theta$ be a compact set and $p_0$ be any density. There exists a unique $G_\sigma\in \mathcal{P}(\Theta)$ such that $p_\sigma = \varphi_\sigma \ast G_\sigma$ solves the KL projection problem. Furthermore, for each $r\in[1,\infty)$, the NPMLE of the mixing measure converges to $G_\sigma$ in $W_r$ almost surely, where $W_r$ is the Wasserstein distance of order $r$.
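For orientation, the KL projection and the Wasserstein distance that Proposition 1 relies on can be written in their standard textbook forms. The paper's own displayed definitions are not reproduced on this page, so the notation below (in particular $\Pi(G, G')$ for the set of couplings of $G$ and $G'$) is an assumption and may differ from the authors':

```latex
G_\sigma \in \operatorname*{arg\,min}_{G \in \mathcal{P}(\Theta)}
  \mathrm{KL}\left(p_0 \,\|\, \varphi_\sigma \ast G\right),
\qquad
W_r(G, G') = \left( \inf_{\gamma \in \Pi(G, G')}
  \int \|\theta - \theta'\|^r \, d\gamma(\theta, \theta') \right)^{1/r}.
```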

Figures (53)

  • Figure 1: Density plot
  • Figure 2: Contour plot
  • Figure 3: Samples
  • Figure 5: $\sigma=0.4$
  • Figure 6: $\sigma=0.8$
  • ...and 48 more figures

Theorems & Definitions (27)

  • Proposition 1
  • Proof
  • Remark 2
  • Proposition 3
  • Proof
  • Remark 4
  • Remark 5
  • Definition 6
  • Remark 7
  • Theorem 8
  • ...and 17 more