Granulometric Smoothing on Manifolds

Diego Bolón, Rosa M. Crujeiras, Alberto Rodríguez-Casal

TL;DR

This paper addresses highest density region (HDR) estimation for data supported on Riemannian manifolds by extending granulometric smoothing to the manifold setting. It introduces a practical estimator of the HDR L(λ), written as a union of balls via a Minkowski/opening-based construction that combines a pilot density estimator f_n with a data-driven radius r_n(λ). The authors establish uniform consistency and convergence rates for the estimator L_n(λ) in terms of the density estimation error D_n and a geometric rate, and extend the methodology to HDRs defined by probability content γ, including a data-driven estimator for the corresponding level λ_γ. A novel radius selector r_n(λ) is proposed (with a shrinkage correction) to guarantee consistency and keep computation simple. Real-data illustrations on spherical and toroidal manifolds demonstrate the method’s computational feasibility and robustness relative to plug-in HDR approaches, highlighting its utility for non-Euclidean data analysis.

Abstract

Given a random sample from a density function supported on a manifold $M$, a new method for estimating highest density regions of the underlying population is introduced. The new proposal is based on the empirical version of the opening operator from mathematical morphology combined with a preliminary estimator of the density function. This results in an estimator that is easy to compute, since it simply consists of a list of centers and a radius $r$ that are adequately selected from the data. The new estimator is shown to be consistent, and its convergence rates in terms of the Hausdorff distance are provided. All consistency results are established uniformly in the level of the set and for any Riemannian manifold $M$ satisfying mild assumptions. The applicability of the procedure is shown by some illustrative examples.
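To make the "list of centers and a radius" description concrete, here is a minimal sketch on the unit sphere. It is not the authors' implementation. The selection rule (keep high-density sample points whose ball contains no low-density sample point) is an assumption modeled on classical Euclidean granulometric smoothing, the function names `geodesic_dist`, `hdr_centers` and `in_hdr` are hypothetical, the pilot density values are supplied by the caller, and the radius is a fixed input rather than the paper's selector $r_n(\lambda)$.

```python
import numpy as np

def geodesic_dist(a, b):
    """Great-circle distances between rows of a (m, 3) and rows of b (k, 3).

    Both inputs are assumed to hold unit vectors on the sphere.
    """
    return np.arccos(np.clip(a @ b.T, -1.0, 1.0))

def hdr_centers(points, density_values, level, radius):
    """Pick ball centers for a union-of-balls HDR estimate (assumed rule).

    points         : (n, 3) sample on the unit sphere
    density_values : (n,) pilot density estimate evaluated at each sample point
    level          : density level lambda
    radius         : geodesic radius of the balls (fixed here by the caller)
    """
    high = points[density_values >= level]   # plays the role of the '+' subsample
    low = points[density_values < level]     # plays the role of the 'o' subsample
    if len(high) == 0 or len(low) == 0:
        return high
    # Keep a high-density point only if no low-density point lies within `radius`.
    keep = geodesic_dist(high, low).min(axis=1) > radius
    return high[keep]

def in_hdr(query, centers, radius):
    """Membership test: is each query point inside some ball B(center, radius)?"""
    if len(centers) == 0:
        return np.zeros(len(query), dtype=bool)
    return geodesic_dist(query, centers).min(axis=1) <= radius

# Toy usage with hand-picked points and made-up density values.
sample = np.array([
    [0.0, 0.0, 1.0],                     # north pole
    [np.sin(0.2), 0.0, np.cos(0.2)],     # close to the north pole
    [np.sin(1.5), 0.0, np.cos(1.5)],     # high density but next to a low point
    [1.0, 0.0, 0.0],                     # low-density point on the equator
])
pilot = np.array([1.0, 0.9, 0.8, 0.1])
centers = hdr_centers(sample, pilot, level=0.5, radius=0.3)
```

The estimate is stored as `centers` plus one radius, so evaluating membership only needs geodesic distances to the retained centers. On another manifold, `geodesic_dist` would be replaced by that manifold's own distance.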

Paper Structure (30 sections, 41 theorems, 285 equations, 13 figures)

Key Result

Proposition 4.1

Suppose that $M$ is a $d$-dimensional Riemannian manifold satisfying Assumption \ref{ass:M2}. Given a bounded subset $A \subset M$, define: Then, there exists a constant $K$ such that $D(\varepsilon, A) \leq K \varepsilon^{-d}$ for all $\varepsilon \leq \rho$, where $\rho > 0$ is the same constant as in Assumption \ref{ass:M2}.

Figures (13)

  • Figure 1: Comet orbits data represented on the sphere using orthogonal projections centered on the north pole (left) and south pole (right).
  • Figure 2: Comet orbits data represented in spherical coordinates. $\varphi$ is the longitude, whereas $\psi$ is the latitude. This representation distorts areas and distances: points with $\psi$ close to $\pm \pi/2$ appear further apart than they really are.
  • Figure 3: HDR estimation of a mixture of two von Mises-Fisher distributions. All three graphs show the boundary of the estimator $L_n (\lambda)$ with $\lambda = 0.45$ for a mixture of the distributions $\mathrm{M}_2 (\mu_1, 10)$ and $\mathrm{M}_2 (\mu_2, 10)$ with equal weights (see equation \ref{eq:mus} for a definition of $\mu_1$ and $\mu_2$). Three sample sizes are considered, one for each graph: $n = 400$ (a), $n = 800$ (b) and $n = 1600$ (c). The boundary of $L_n (\lambda)$ is plotted in red in all graphs, whereas the boundary of the true HDR $L (\lambda)$ is depicted as a blue line. For reference, the points of the subsamples $\mathcal{X}^{+}_n ( \lambda )$ and $\mathcal{X}^{-}_n ( \lambda )$ are shown in the graphs as '$+$' and '$\circ$', respectively. In all three cases, $r_n (\lambda) = 0.05$ and $f_n$ is the kernel density estimator with von Mises-Fisher kernel (GarciaPortugues2013) and concentration parameter chosen via cross-validation.
  • Figure 4: Example of an HDR $L(\lambda)$ satisfying (left) and not satisfying (right) Assumptions \ref{ass:A1} and \ref{ass:A2}.
  • Figure 5: HDR estimation of a mixture of two von Mises-Fisher distributions. All three graphs show the boundary of the estimator $L_n (\lambda)$ with $\lambda = 0.45$ for a mixture of the distributions $\mathrm{M}_2 (\mu_1, 10)$ and $\mathrm{M}_2 (\mu_2, 10)$ with equal weights (see equation \ref{eq:mus} for a definition of $\mu_1$ and $\mu_2$). Three sample sizes are considered, one for each graph: $n = 400$ (a), $n = 800$ (b) and $n = 1600$ (c). The boundary of $L_n (\lambda)$ is plotted in red in all graphs, whereas the boundary of the true HDR $L (\lambda)$ is depicted as a blue line. For reference, the points of the subsamples $\mathcal{X}^{+}_n ( \lambda )$ and $\mathcal{X}^{-}_n ( \lambda )$ are shown in the graphs as '$+$' and '$\circ$', respectively. In all three cases, the radius is chosen as $0.99r_n (\lambda)$ with $h_n = 1/\log (n)$ and $f_n$ is the kernel density estimator with von Mises-Fisher kernel (GarciaPortugues2013) and concentration parameter selected via cross-validation.
  • ...and 8 more figures

Theorems & Definitions (84)

  • Proposition 4.1
  • Theorem 4.1
  • Remark 4.1
  • Proposition 4.2
  • Proposition 4.3
  • Theorem 4.2
  • Remark 4.2
  • Theorem 5.1
  • Proposition 5.1
  • Remark 5.1
  • ...and 74 more