Asymptotic analysis of the Gaussian kernel matrix for partially noisy data in high dimensions
Kensuke Aishima
TL;DR
This work analyzes the Gaussian kernel matrix in high dimensions under partial noise, building on Karoui’s result that the eigenvectors are strongly consistent while the eigenvalues are inconsistent. It combines this asymptotic structure with constrained low-rank approximation to construct strongly consistent estimators: a correction based on the smallest eigenvalue for rank-deficient settings, and a block-elimination-based estimator for partially noisy data. The key contributions are: (i) a precise asymptotic relation for the eigenvalues, together with a proof of strong consistency of the eigenvectors; (ii) a simple, robust estimator that recovers the limiting kernel matrix in rank-deficient settings; (iii) an extended noise model with a structured estimator that remains consistent for partially noisy data. The results support robust reconstruction and spectral analysis of kernel matrices in high-dimensional, noisy regimes, with potential applications to kernel methods in data science and scientific computing.
Abstract
The Gaussian kernel is one of the most important kernels, with applications in many research fields, including scientific computing and data science. In this paper, we present an asymptotic analysis of the Gaussian kernel matrix in high dimensions under a statistical model of noisy data. The main result combines Karoui's asymptotic analysis with procedures for constrained low-rank matrix approximation. More specifically, Karoui clarified an important asymptotic structure of the Gaussian kernel matrix, which leads to strong consistency of the eigenvectors, although the eigenvalues are inconsistent. Building on these results, this paper presents a consistent estimator that uses the smallest eigenvalue and applies whenever the target kernel matrix tends to low rank in the asymptotic regime. Importantly, the asymptotic analysis is also carried out under a statistical model representing partial noise. Although a naive estimator is inconsistent in this setting, applying an optimization method for constrained low-rank approximation overcomes the difficulty caused by the inconsistency and yields a new estimator with strong consistency in rank-deficient cases.
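
The role of the smallest eigenvalue can be illustrated with a small numerical experiment. The abstract does not state the estimator's formula, so the sketch below is only one reading of it, under assumptions that are ours: every data point carries isotropic Gaussian noise with total variance tau^2 spread over p coordinates, the kernel is exp(-||x - y||^2 / (2 sigma^2)), and the clean kernel matrix K is made exactly singular by repeating signal points. Under these assumptions the noisy kernel matrix behaves like c K + (1 - c) I with c = exp(-tau^2 / sigma^2), so the eigenvectors are preserved, the eigenvalues are shifted, and the smallest eigenvalue approximates 1 - c. The names `gaussian_kernel`, `K_hat`, and all parameter values are hypothetical.

```python
import numpy as np


def gaussian_kernel(X, sigma):
    """Gaussian kernel matrix exp(-||x_i - x_j||^2 / (2 sigma^2)) for rows of X."""
    sq = np.sum(X**2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared distances
    np.maximum(D, 0.0, out=D)                        # clip rounding noise
    K = np.exp(-D / (2.0 * sigma**2))
    np.fill_diagonal(K, 1.0)                         # exact diagonal
    return K


# Toy setup (all values are illustrative assumptions, not from the paper).
p, sigma, tau = 100_000, 1.0, 1.0
rng = np.random.default_rng(0)

# Three distinct signal points, each repeated twice, so the clean
# kernel matrix has rank 3 and smallest eigenvalue 0.
base = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.5]])
n = 2 * base.shape[0]
S = np.zeros((n, p))
S[:, :2] = np.repeat(base, 2, axis=0)

# Isotropic noise whose squared norm concentrates around tau^2.
E = rng.normal(scale=tau / np.sqrt(p), size=(n, p))

K_clean = gaussian_kernel(S, sigma)
K_noisy = gaussian_kernel(S + E, sigma)

# Naive estimator: use the noisy kernel matrix as is.
err_naive = np.max(np.abs(K_noisy - K_clean))

# Correction with the smallest eigenvalue: lam_min estimates 1 - c,
# so subtracting lam_min * I and rescaling undoes c K + (1 - c) I.
lam_min = np.linalg.eigvalsh(K_noisy)[0]
K_hat = (K_noisy - lam_min * np.eye(n)) / (1.0 - lam_min)
err_corrected = np.max(np.abs(K_hat - K_clean))

c = np.exp(-tau**2 / sigma**2)
print("smallest eigenvalue:", lam_min, "expected about", 1.0 - c)
print("max error, naive:", err_naive, "corrected:", err_corrected)
```

In this toy setting the naive error stays bounded away from zero as p grows, because off-diagonal entries are shrunk by the factor c, while the corrected matrix approaches the clean one. The sketch covers only the fully noisy, rank-deficient case; the partially noisy model and the constrained low-rank optimization mentioned in the abstract are not reproduced here.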
