Semi-supervised Classification for Functional Data with Application to Astronomical Spectra Analysis

Ruoxu Tan, Mingjie Jian, Yiming Zang

Abstract

Despite its extensive development for multivariate data, semi-supervised learning remains underdeveloped for functional data. To address this challenge, we extend the Fermat distance, a density-sensitive metric aligning with the semi-supervised setting, to the functional domain. Leveraging the Fermat distance, we propose novel semi-supervised classifiers, including the weighted $k$-nearest neighbors (NN) classifier and multidimensional scaling (MDS)-induced classifiers. To accommodate massive datasets commonly seen in semi-supervised applications, we design a computationally efficient estimation procedure tailored for discrete and noisy functional observations. Theoretically, we establish exponentially decaying convergence rates of the $k$-NN classifier and the consistency of the estimated Fermat distance. Crucially, our results reveal a phenomenon unique to error-contaminated functional data: Incorporating unlabeled data leads to improved classification accuracy only when the individual sampling rate grows sufficiently fast. Applying our framework to simulated data and a large-scale dataset of Gaia astronomical spectra, we demonstrate that our proposed semi-supervised classifiers uniformly outperform existing supervised benchmarks.
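
As a rough illustration of the idea summarized above, the sketch below computes the standard sample Fermat distance from the multivariate literature (shortest paths over the sample, with edge costs equal to pairwise distances raised to a power $\alpha$) using $L^2$ distances between curves, then applies a distance-weighted $k$-NN vote. It is not the paper's estimation procedure. It assumes curves observed on a common grid with no smoothing step, and the function names, the choice $\alpha = 2$, the Gaussian weights, and the bandwidth rule are all hypothetical choices made for illustration.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path


def l2_distances(curves, grid):
    """Pairwise L2 distances between curves sampled on a common grid."""
    # Trapezoidal quadrature weights for the grid.
    dt = np.diff(grid)
    w = np.zeros(grid.size)
    w[:-1] += dt / 2.0
    w[1:] += dt / 2.0
    diffs = curves[:, None, :] - curves[None, :, :]
    return np.sqrt((diffs ** 2) @ w)


def sample_fermat_distances(dist, alpha=2.0):
    """Shortest-path costs on the complete graph with edge weights dist**alpha."""
    # Note: zero entries of a dense input are treated as missing edges.
    return shortest_path(dist ** alpha, directed=False)


def weighted_knn_predict(fermat, labeled_idx, labels, query_idx, k=3):
    """Weighted k-NN vote among labeled curves, using Fermat distances."""
    classes = np.unique(labels)
    preds = []
    for q in query_idx:
        d = fermat[q, labeled_idx]
        nn = np.argsort(d)[:k]
        # Assumption: bandwidth set to the largest of the k neighbor distances.
        sigma = d[nn].max() + 1e-12
        weights = np.exp(-(d[nn] / sigma) ** 2)
        scores = [weights[labels[nn] == c].sum() for c in classes]
        preds.append(classes[int(np.argmax(scores))])
    return np.array(preds)


# Toy data: two well-separated classes of noisy sine curves (illustrative only).
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 50)
n_per_class = 30
truth = np.repeat([0, 1], n_per_class)
amps = rng.uniform(0.8, 1.2, size=truth.size)
curves = (
    amps[:, None] * np.sin(2 * np.pi * grid)[None, :]
    + 3.0 * truth[:, None]
    + 0.05 * rng.standard_normal((truth.size, grid.size))
)

# Five labeled curves per class; the rest are unlabeled but still shape the graph.
labeled_idx = np.concatenate([np.arange(5), n_per_class + np.arange(5)])
query_idx = np.setdiff1d(np.arange(truth.size), labeled_idx)

dist = l2_distances(curves, grid)
fermat = sample_fermat_distances(dist, alpha=2.0)
pred = weighted_knn_predict(fermat, labeled_idx, truth[labeled_idx], query_idx, k=3)
accuracy = float(np.mean(pred == truth[query_idx]))
print(f"accuracy on unlabeled curves: {accuracy:.2f}")
```

The unlabeled curves enter only through the shortest-path step: paths may pass through any sample curve, so dense regions of the sample shorten distances within a cluster. This is the sense in which the metric is density-sensitive.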

Paper Structure

This paper contains 19 sections, 3 theorems, 16 equations, 6 figures.

Key Result

Theorem 1

Under Assumptions CA1 to CA3, set $k\asymp [ n_\ell/\log(n_\ell) ]$ and $\sigma\asymp (k/n_\ell)^{1/d}$. Then there exist a constant $C>0$ and $N_{\ell0} \in \mathbb{N}^+$ such that for all $n_\ell>N_{\ell0}$,

Figures (6)

  • Figure 1: The average rates of classification accuracy of all classifiers under simulation models (i) to (iv) with different labeled sample sizes.
  • Figure 2: The average rates of classification accuracy of semi-supervised classifiers with $J\equiv 50$ (first row) and $J=(50,100,200,400)$ (second row) and different sample sizes under models (i) to (iii).
  • Figure 3: The average rates of classification accuracy of SVM-related classifiers under simulation models (i) to (iii) with different labeled sample sizes.
  • Figure 4: Left: The boxplot of the best class scores of each class; Right: The smoothed spectra of each class on the window normalized to $[0,1]$. The flux values are vertically shifted according to different classes.
  • Figure 5: Average rate of classification accuracy of all classifiers applied on the astronomical spectral data under different values of $\theta$.
  • ...and 1 more figure

Theorems & Definitions (4)

  • Definition 1: Cluster
  • Theorem 1
  • Theorem 2
  • Theorem 3