Design a Metric Robust to Complicated High Dimensional Noise for Efficient Manifold Denoising

Hau-Tieng Wu

TL;DR

This work addresses the denoising of data that lie on a low-dimensional manifold embedded in a high-dimensional space and are corrupted by colored, dependent noise with a separable covariance structure. It introduces ROSDOS, a robust manifold denoiser that combines the diffusion-map (DM) geometry of ROSELAND with optimal shrinkage (via eOptShrink) and a local refinement step to recover clean samples pointwise; the method scales to large datasets through landmark diffusion. The approach relies on a global DM-based metric when conditioning is challenging and supplements it with local shrinkage to preserve the local geometry. It achieves better denoising performance than existing methods on both synthetic data and semi-real LFP-DBS data. The results demonstrate that ROSDOS is robust to a high ambient dimension, complex noise structure and nonstationary artifacts, with practical impact on high-dimensional biomedical signals and other manifold-structured data.
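Landmark diffusion is the ingredient that lets ROSELAND scale to large datasets. The following is a minimal sketch of a generic landmark diffusion embedding, not the authors' implementation. It assumes that landmarks are drawn uniformly at random from the data, that the affinity is a Gaussian kernel with a median-heuristic bandwidth, and that the embedding comes from the SVD of the degree-normalized data-to-landmark affinity. The function name `landmark_diffusion_embedding` and its parameters are hypothetical, and the robustness ingredients of ROSDOS (eOptShrink and the local refinement) are omitted.

```python
import numpy as np


def landmark_diffusion_embedding(X, n_landmarks=100, dim=3, t=1, eps=None, seed=0):
    """Hypothetical sketch of a landmark diffusion embedding.

    X has shape (n, p); rows are samples. Returns an (n, dim) embedding and
    the eigenvalues of the landmark-mediated diffusion operator D^{-1} W W^T.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Assumption: landmarks are a uniform random subset of the data.
    Z = X[rng.choice(n, size=n_landmarks, replace=False)]

    # Squared Euclidean distances between data points and landmarks, shape (n, m).
    sq = (X**2).sum(1)[:, None] + (Z**2).sum(1)[None, :] - 2.0 * X @ Z.T
    sq = np.maximum(sq, 0.0)  # clip tiny negatives caused by round-off
    if eps is None:
        eps = np.median(sq)  # assumption: median heuristic for the bandwidth
    W = np.exp(-sq / eps)  # data-to-landmark Gaussian affinity

    # Degrees of W W^T, computed without forming the n x n matrix.
    deg = W @ (W.T @ np.ones(n))
    d_inv_sqrt = 1.0 / np.sqrt(deg)

    # SVD of D^{-1/2} W: its left singular vectors and squared singular values
    # are the eigenpairs of the symmetric matrix D^{-1/2} W W^T D^{-1/2}.
    U, s, _ = np.linalg.svd(d_inv_sqrt[:, None] * W, full_matrices=False)
    lam = s**2
    psi = d_inv_sqrt[:, None] * U  # eigenvectors of D^{-1} W W^T

    # Skip the trivial (constant) eigenvector and weight by lam^t (diffusion time t).
    return psi[:, 1:dim + 1] * lam[1:dim + 1] ** t, lam
```

For example, for 500 points on a circle embedded in $\mathbb{R}^{10}$, `landmark_diffusion_embedding(X, n_landmarks=50, dim=2)` returns a $500\times 2$ embedding whose leading eigenvalue is $1$ up to round-off, since $D^{-1}WW^\top$ is row-stochastic. Because only the $n\times m$ affinity is formed, the cost grows linearly in $n$ for a fixed number of landmarks $m$.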

Abstract

In this manuscript, we propose an efficient manifold denoiser based on landmark diffusion and optimal shrinkage for a setup with complicated high-dimensional noise and a compact manifold. It is flexible enough to handle several setups, including a high ambient space dimension, a manifold embedding that occupies a subspace of either high or low dimension, and noise that may be colored and dependent. A systematic comparison with other existing algorithms on both simulated and real datasets is provided. This manuscript is mainly algorithmic; we report several existing tools and numerical results. Theoretical guarantees and more comparisons will be reported in the official paper of this manuscript.
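Optimal shrinkage of singular values is the second building block named in the abstract. eOptShrink targets colored and dependent noise and its details are not given here, so the sketch below instead uses the simpler, well-known Frobenius-loss optimal shrinker of Gavish and Donoho for white noise of known level $\sigma$. It assumes a $p\times n$ data matrix with $p\le n$ whose noise entries are i.i.d. $N(0,\sigma^2/n)$; the function name is hypothetical. Singular values below the noise bulk edge $\sigma(1+\sqrt{p/n})$ are set to zero and the remaining ones are shrunk.

```python
import numpy as np


def frobenius_optimal_shrinkage(Y, sigma):
    """Estimate S from Y = S + Xi by shrinking the singular values of Y.

    Assumes Y is p x n with p <= n and that Xi has i.i.d. N(0, sigma^2 / n)
    entries (white noise of known level). Uses the Frobenius-loss optimal
    shrinker of Gavish and Donoho; this is NOT eOptShrink.
    """
    p, n = Y.shape
    if p > n:
        raise ValueError("this sketch assumes p <= n")
    beta = p / n
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    y = s / sigma                 # singular values in units of the noise level
    edge = 1.0 + np.sqrt(beta)    # bulk edge of the pure-noise singular values
    eta = np.zeros_like(y)
    keep = y > edge               # only values outside the noise bulk survive
    eta[keep] = np.sqrt((y[keep] ** 2 - beta - 1.0) ** 2 - 4.0 * beta) / y[keep]
    return (U * (sigma * eta)) @ Vt  # rebuild the matrix with shrunk singular values
```

Under colored or dependent noise, both the bulk edge and the optimal shrinker change, which is the setting the manuscript addresses with eOptShrink.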

Paper Structure

This paper contains 15 sections, 33 equations, 8 figures.

Figures (8)

  • Figure 1: The empirical spectral density of $\boldsymbol\Xi_1\boldsymbol\Xi_1^\top$. Left: $p=400$ and $n=10000$; middle: $p=400$ and $n=2000$; right: $p=400$ and $n=600$.
  • Figure 2: A summary of the denoising efficiency of different algorithms, in terms of NRMSE, over $12$ different simulated databases with Gaussian noise. The distributions of NRMSE of ROSDOS, GOS, MMLS, MrGap, NRPCA, mFPM and FPM are shown in yellow. From left to right, the columns correspond to the manifolds $M_1$, $M_2$ and $M_3$; the first to third rows correspond to $\alpha=1, 1/2$ and $1/3$. For each method, the distribution of NRMSE is estimated by kernel density estimation with a Gaussian kernel and the optimal bandwidth, and is shown as a violin plot. The gray horizontal line is the median of $\{\|\xi_i\|_2/\|s_i\|_2\}$. The red bar indicates the median and the black bar indicates the mean. To enhance the visualization, the upper bound of the y-axis is set to $1.4$ times the median of $\{\|\xi_i\|_2/\|s_i\|_2\}$. A dagger (respectively, circle) indicates that ROSDOS performs better (respectively, worse) than the algorithm under comparison.
  • Figure 3: A summary of the denoising efficiency of different algorithms, in terms of NRMSE, over $12$ different simulated databases with noise with a separable covariance structure. The distributions of NRMSE of ROSDOS, GOS, MMLS, MrGap, NRPCA, mFPM and FPM are shown in yellow. From left to right, the columns correspond to the manifolds $M_1$, $M_2$ and $M_3$; the first to third rows correspond to $\alpha=1, 1/2$ and $1/3$. For each method, the distribution of NRMSE is estimated by kernel density estimation with a Gaussian kernel and the optimal bandwidth, and is shown as a violin plot. The gray horizontal line is the median of $\{\|\xi_i\|_2/\|s_i\|_2\}$. The red bar indicates the median and the black bar indicates the mean. To enhance the visualization, the upper bound of the y-axis is set to $1.4$ times the median of $\{\|\xi_i\|_2/\|s_i\|_2\}$. A dagger (respectively, circle) indicates that ROSDOS performs better (respectively, worse) than the manifold denoiser under comparison.
  • Figure 4: A visualization of different manifold denoisers, using $M_1$ contaminated by noise with a separable covariance structure as an example, where we show the first and tenth axes of the high-dimensional dataset. From left to right, the columns show the noisy data (red circles), the ROSDOS result (blue circles), the mFPM result (magenta circles) and the MMLS result (green circles). From top to bottom, the rows correspond to an mSNR of 26.48 dB, 3.5 dB and -4.2 dB. The circle size indicates the root mean squared error.
  • Figure 5: A summary of the computational time of different algorithms over $18$ different simulated databases. On the x-axis, a leading G means Gaussian noise and a leading S means noise with the separable covariance structure; $1$, $1/2$ and $1/3$ in the middle denote $\alpha$; and M1, M2 and M3 at the end denote the three simulated manifolds. Time is shown in seconds on a $\log_{10}$ scale.
  • ...and 3 more figures
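
Figures 1 to 3 rely on noise with a separable covariance structure, its empirical spectral density, and the per-sample NRMSE. The sketch below illustrates these quantities under stated assumptions, and all function names are hypothetical. The noise is generated as $\Xi = A^{1/2} Z B^{1/2}/\sqrt{n}$ with diagonal $A$ and $B$ (a simplification; a non-diagonal $B$ would introduce dependence across samples), samples are stored as columns, and the NRMSE of an estimate $\hat s_i$ is assumed to be $\|\hat s_i - s_i\|_2/\|s_i\|_2$, so that the noisy data themselves have NRMSE $\|\xi_i\|_2/\|s_i\|_2$, matching the gray reference line in Figures 2 and 3.

```python
import numpy as np


def separable_noise(p, n, a_diag, b_diag, rng):
    """Return a p x n noise matrix Xi = A^{1/2} Z B^{1/2} / sqrt(n).

    A = diag(a_diag) colors the noise across coordinates and
    B = diag(b_diag) rescales it across samples (columns). Diagonal A and B
    are a simplifying assumption of this sketch.
    """
    Z = rng.standard_normal((p, n))
    return np.sqrt(a_diag)[:, None] * Z * np.sqrt(b_diag)[None, :] / np.sqrt(n)


def noise_spectrum(Xi):
    """Eigenvalues of Xi Xi^T; their histogram is the empirical spectral density."""
    return np.linalg.eigvalsh(Xi @ Xi.T)


def nrmse(S_hat, S):
    """Per-sample relative error ||s_hat_i - s_i||_2 / ||s_i||_2 (columns are samples)."""
    return np.linalg.norm(S_hat - S, axis=0) / np.linalg.norm(S, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = 400
    for n in (10000, 2000, 600):  # the three (p, n) settings listed for Figure 1
        ev = noise_spectrum(separable_noise(p, n, np.ones(p), np.ones(n), rng))
        edge = (1 + np.sqrt(p / n)) ** 2  # Marchenko-Pastur upper edge for white noise
        print(f"n={n}: largest eigenvalue {ev.max():.3f}, MP edge {edge:.3f}")

    # Colored noise: a hypothetical decaying variance profile across coordinates.
    a = np.linspace(2.0, 0.5, p)
    S = rng.standard_normal((p, 500)) + 3.0  # stand-in "clean" samples
    Xi = separable_noise(p, 500, a, np.ones(500), rng)
    baseline = nrmse(S + Xi, S)  # equals ||xi_i|| / ||s_i|| for each sample
    print("median noisy NRMSE:", np.median(baseline))
    print("y-axis upper bound:", 1.4 * np.median(baseline))
```

With white noise ($A$ and $B$ equal to the identity), the largest eigenvalue of $\Xi\Xi^\top$ should lie close to the Marchenko-Pastur edge $(1+\sqrt{p/n})^2$; a non-constant `a_diag` deforms the spectral density away from that law.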

Theorems & Definitions (1)

  • Definition 1