Stochastic tensor space feature theory with applications to robust machine learning

Julio Enrique Castrillon-Candas, Dingning Liu, Sicheng Yang, Xiaoling Zhang, Mark Kon

TL;DR

The paper addresses robust binary classification in high-dimensional, noisy settings where traditional deep learning models struggle with interpretability and data scarcity. It introduces a stochastic tensor space approach based on a Bochner-space formulation and a Karhunen-Loève expansion to build Multilevel Orthogonal Subspaces (MOS) that isolate nominal signal components from anomalies; the resulting projection coefficients form MOS-KL features used by an SVM classifier. The authors establish KL expansion optimality and propose a multilevel construction with anomaly-detecting residual spaces, demonstrating dramatic accuracy gains on ADNI plasma proteomics data and strong performance on cancer gene-expression datasets, especially under unbalanced conditions. This framework offers a robust, interpretable augmentation to existing ML pipelines and is extensible to complex topologies and multi-class problems, with potential integration into deep learning paradigms.

Abstract

In this paper we develop a Multilevel Orthogonal Subspace (MOS) Karhunen-Loève feature theory based on stochastic tensor spaces, for the construction of robust machine learning features. Training data is treated as instances of a random field within a relevant Bochner space. Our key observation is that separate machine learning classes can reside predominantly in mostly distinct subspaces. Using the Karhunen-Loève expansion and a hierarchical expansion of the first (nominal) class, a MOS is constructed to detect anomalous signal components, treating the second class as an outlier of the first. The projection coefficients of the input data onto these subspaces are then used to train a Machine Learning (ML) classifier. These coefficients become new features from which much clearer separation surfaces can arise for the underlying classes. Tests on the blood plasma dataset (Alzheimer's Disease Neuroimaging Initiative) show dramatic increases in accuracy, in contrast to popular ML methods such as Gradient Boosting, RUS Boost, Random Forest and (Convolutional) Neural Networks.

Paper Structure

This paper contains 5 sections, 5 theorems, 19 equations, 10 figures, 1 table.

Key Result

Theorem 1

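If $v \in L^{2}(\Omega;L^{2}(U))$, then the random field $v$ can be represented in terms of the Karhunen-Loève (KL) tensor product expansion as

$$v(\mathbf{x},\omega) = {\mathbb E}\left[ v \right](\mathbf{x}) + \sum_{k \in \mathbb{N}} \sqrt{\lambda_k}\, \phi_k(\mathbf{x})\, Y_k(\omega),$$

where $(\lambda_k, \phi_k)$ are the eigenvalue-eigenfunction pairs of the covariance of $v$, ${\mathbb E} \left[ Y_k Y_l \right] = \delta_{kl}$ and ${\mathbb E} \left[ Y_k \right] = 0$ for all $k,l \in \mathbb{N}$.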
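
As a minimal sketch of what the KL expansion looks like on sampled data, the snippet below builds a discrete analogue with NumPy: the covariance operator is replaced by a sample covariance matrix, its eigenpairs play the role of $(\lambda_k, \phi_k)$, and the coefficients $Y_k$ are obtained by projecting the centered samples onto the eigenvectors. The function name `kl_expansion`, the synthetic field, and all parameter values are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def kl_expansion(samples, num_terms):
    """Empirical (discrete) KL expansion of the rows of `samples`."""
    mean = samples.mean(axis=0)
    centered = samples - mean
    # Sample covariance matrix: a discrete stand-in for the covariance operator.
    cov = centered.T @ centered / samples.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:num_terms]
    lam, phi = eigvals[order], eigvecs[:, order]
    # Y_k = <v - E[v], phi_k> / sqrt(lambda_k)
    Y = centered @ phi / np.sqrt(lam)
    return mean, lam, phi, Y

rng = np.random.default_rng(0)
n_samples, n_points = 500, 40
x = np.linspace(0.0, 1.0, n_points)

# Hypothetical random field: a constant offset plus a random combination
# of three smooth modes, so three KL terms capture it completely.
modes = np.stack([np.sin(np.pi * x), np.sin(2 * np.pi * x), np.sin(3 * np.pi * x)])
coeffs = rng.normal(size=(n_samples, 3)) * np.array([3.0, 1.5, 0.5])
samples = 1.0 + coeffs @ modes

mean, lam, phi, Y = kl_expansion(samples, num_terms=3)

# Rebuild each sample as mean + sum_k sqrt(lambda_k) * phi_k * Y_k.
reconstruction = mean + (Y * np.sqrt(lam)) @ phi.T
max_error = np.abs(reconstruction - samples).max()

# Empirical counterpart of E[Y_k Y_l] = delta_kl.
gram = Y.T @ Y / n_samples
```

In this discrete setting the empirical second-moment matrix `gram` is the identity up to rounding and the columns of `Y` have zero sample mean, which mirrors the two conditions on $Y_k$ stated in the theorem.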

Figures (10)

  • Figure 1: Illustrative example of binary classification, with the two classes denoted by blue and orange dots. (a) The data are well separated, with blue dots forming the first class and orange dots the second, so it is in principle easy to construct a decision boundary. (b) The classes are mixed, leading to complex decision surfaces that are hard to build and yield low accuracy. Furthermore, the data can be noisy and diffuse in high dimensions, leading to unstable decision surfaces. (c) After an appropriate stochastic coordinate transformation the classes separate, leading to stable decision surfaces.
  • Figure 2: A coordinate transformation reveals the frequency components of the signal $f(t)$, making it easier to classify and distinguish. These plots were created in TikZ by modifying the LaTeX code from [Neutelings2021a, Neutelings2021b].
  • Figure 3: Class separation in appropriate Hilbert spaces. (a) Given the right basis for the Bochner space $L^{2}_{\mathbb{P}}(\Omega;L^{2}(U))$, it is possible to find a separation between the classes. (b) Construction of a subspace $W \subset P_0^{\perp}$ with which external anomalous signals $u^{\mathbf{B}}$ can be detected.
  • Figure 4: Illustrative example of the separation between the projection coefficients of the nominal class and large anomalous signals, based on the coefficients $d^l_k$. (a) The orange dots (nominal class) and blue dots (signal anomaly of the alternative class) correspond to the original data in the feature space. These observation points are mixed with each other, which makes it hard to build a decision surface. (b) After applying the MOS filter, the orange dots correspond to coefficients $d^l_k$ that are subject to the null hypothesis $H_0$ (nominal class). Thus, from Theorem 3, the coefficients are centered around the origin with high probability. The larger the number of KL eigenfunctions (given by the parameter $M$) used to build the multilevel basis, the more likely the coefficients are to concentrate around the origin. Conversely, under the alternative hypothesis $H_A$ (signal anomaly), the coefficients $d^l_k$ (blue dots) are unlikely to concentrate around zero. This makes it easier to build a separation surface for the two classes.
  • Figure 5: MOS-KL training framework for binary classification using an SVM. With a slight abuse of notation, the map $\Phi: L^{2}(U) \rightarrow \bigoplus_{k \in \mathbb{N}_{0}} S_{k}$ corresponds to the transformation of the signal $u(\mathbf{x},\omega)$ into the spaces $\bigoplus_{k \in \mathbb{N}_{0}} S_{k}$ and so provides the projection coefficients. The MOS are built from the class for which more data is available, in this case class $\mathbf{A}$: $N_T < m_1$ samples ($\mathbf{u}^{\mathbf{A}}_1,\dots,\mathbf{u}^{\mathbf{A}}_{N_T}$) are chosen to estimate the covariance function (matrix) and thus the $M$ eigenvalues and eigenfunctions. The multilevel filter for $\bigoplus_{k \in \mathbb{N}_{0}} S_{k}$ is built from these eigenfunctions, the map $\Phi$ is applied to the data $\mathbf{u}^{\mathbf{A}}_{N_T+1},\dots,\mathbf{u}^{\mathbf{A}}_{m_1}$ and $\mathbf{u}^{\mathbf{B}}_1,\dots,\mathbf{u}^{\mathbf{B}}_{m_2}$, and the SVM classifier is trained on the resulting coefficients.
  • ...and 5 more figures
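
The training flow described in the Figure 5 caption can be sketched roughly as follows. This is a simplified, single-level illustration under stated assumptions, not the authors' multilevel construction: the eigenvectors of a covariance matrix estimated from class A are split into $M$ leading directions and a residual space, the map `phi_map` (a hypothetical name) returns coefficients in that residual space, and a plain threshold on the residual energy stands in for the SVM used in the paper. The synthetic data, dimensions, and noise levels are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, M, N_T = 30, 3, 200  # ambient dimension, retained eigenvectors, fit samples

# Hypothetical synthetic data: class A lies near an M-dimensional subspace,
# class B carries an extra component orthogonal to that subspace.
basis, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
nominal, anomaly_dir = basis[:, :M], basis[:, M]

def sample_a(n):
    signal = (rng.normal(size=(n, M)) * np.array([4.0, 2.0, 1.0])) @ nominal.T
    return signal + 0.05 * rng.normal(size=(n, dim))

def sample_b(n):
    amplitude = rng.normal(loc=2.0, scale=0.2, size=(n, 1))
    return sample_a(n) + amplitude * anomaly_dir

a_all, b_all = sample_a(400), sample_b(100)  # unbalanced classes
a_fit, a_rest = a_all[:N_T], a_all[N_T:]

# Estimate the covariance from class A only and split its eigenvectors into
# the M leading ones (nominal space) and the remainder (residual space).
mean = a_fit.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(a_fit, rowvar=False))  # ascending
residual_basis = eigvecs[:, :-M]

def phi_map(u):
    """Coefficients of centered samples in the residual space."""
    return (u - mean) @ residual_basis

feat_a, feat_b = phi_map(a_rest), phi_map(b_all)
energy_a = np.linalg.norm(feat_a, axis=1)
energy_b = np.linalg.norm(feat_b, axis=1)

# Stand-in classifier (NOT the SVM of the paper): a threshold on residual
# energy, fitted on the first half of each class, evaluated on the second.
half_a, half_b = len(energy_a) // 2, len(energy_b) // 2
threshold = 0.5 * (energy_a[:half_a].mean() + energy_b[:half_b].mean())
correct = (energy_a[half_a:] < threshold).sum() + (energy_b[half_b:] >= threshold).sum()
accuracy = correct / (len(energy_a) - half_a + len(energy_b) - half_b)
print(f"held-out accuracy: {accuracy:.3f}")
```

The point of the sketch is the data split: the first `N_T` class A samples are used only to estimate the subspaces, and the classifier sees the remaining class A samples and the class B samples only through their residual-space coefficients. In practice the features `feat_a` and `feat_b` would be passed to an SVM rather than reduced to a single energy value.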

Theorems & Definitions (16)

  • Definition 1
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Remark 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Lemma 1
  • Theorem 4
  • ...and 6 more