Stochastic tensor space feature theory with applications to robust machine learning
Julio Enrique Castrillon-Candas, Dingning Liu, Sicheng Yang, Xiaoling Zhang, Mark Kon
TL;DR
The paper addresses robust binary classification in high-dimensional, noisy settings where traditional deep learning models struggle with interpretability and data scarcity. It introduces a stochastic tensor space approach based on a Bochner-space formulation and a Karhunen-Loève expansion to build Multilevel Orthogonal Subspaces (MOS) that isolate nominal signal components from anomalies; the resulting projection coefficients form MOS-KL features used by an SVM classifier. The authors establish KL expansion optimality and propose a multilevel construction with anomaly-detecting residual spaces, demonstrating dramatic accuracy gains on ADNI plasma proteomics data and strong performance on cancer gene-expression datasets, especially under unbalanced conditions. This framework offers a robust, interpretable augmentation to existing ML pipelines and is extensible to complex topologies and multi-class problems, with potential integration into deep learning paradigms.
Abstract
In this paper we develop a Multilevel Orthogonal Subspace (MOS) Karhunen-Loève feature theory based on stochastic tensor spaces, for the construction of robust machine learning features. Training data is treated as instances of a random field within a relevant Bochner space. Our key observation is that separate machine learning classes can reside predominantly in distinct subspaces. Using the Karhunen-Loève expansion and a hierarchical expansion of the first (nominal) class, a MOS is constructed to detect anomalous signal components, treating the second class as an outlier of the first. The projection coefficients of the input data onto these subspaces are then used to train a Machine Learning (ML) classifier. These coefficients become new features from which much clearer separation surfaces can arise for the underlying classes. Tests on the blood plasma dataset from the Alzheimer's Disease Neuroimaging Initiative (ADNI) show dramatic increases in accuracy, in contrast to popular ML methods such as Gradient Boosting, RUSBoost, Random Forest and (Convolutional) Neural Networks.
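As a rough illustration of the idea described in the abstract, the following Python sketch builds a truncated discrete Karhunen-Loève (PCA) basis from nominal-class samples and uses the projection coefficients, plus the norm of the residual outside that subspace, as features. It is a simplified single-level stand-in and not the paper's multilevel construction: the function names `fit_nominal_kl` and `kl_features`, the synthetic data, and the choice of a single residual-norm feature are all assumptions made for this example. In practice the resulting feature matrix would be passed to a classifier such as an SVM.

```python
import numpy as np

def fit_nominal_kl(X_nominal, n_components):
    """Estimate a truncated discrete KL (PCA) basis from nominal-class samples.

    Hypothetical helper: a single-level simplification, not the paper's MOS.
    """
    mean = X_nominal.mean(axis=0)
    centered = X_nominal - mean
    # Rows of vt are orthonormal eigenvectors of the sample covariance,
    # ordered by decreasing eigenvalue.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def kl_features(X, mean, basis):
    """Projection coefficients onto the nominal basis plus the residual norm."""
    centered = X - mean
    coeffs = centered @ basis.T            # coordinates in the nominal subspace
    residual = centered - coeffs @ basis   # component orthogonal to that subspace
    resid_norm = np.linalg.norm(residual, axis=1, keepdims=True)
    return np.hstack([coeffs, resid_norm])

# Synthetic data (assumed for illustration): the nominal class lies near a
# k-dimensional subspace; the second class has an extra orthogonal component.
rng = np.random.default_rng(0)
d, k = 20, 3
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
nominal_dirs, anomaly_dir = Q[:, :k], Q[:, k]

def sample_nominal(n):
    signal = 3.0 * (rng.normal(size=(n, k)) @ nominal_dirs.T)
    return signal + 0.1 * rng.normal(size=(n, d))

X_train = sample_nominal(200)
X_nom_test = sample_nominal(50)
X_anom_test = sample_nominal(50) + 2.0 * anomaly_dir

mean, basis = fit_nominal_kl(X_train, k)
F_nom = kl_features(X_nom_test, mean, basis)
F_anom = kl_features(X_anom_test, mean, basis)

# The last column (residual norm) is the feature that separates the classes here.
print("mean residual norm, nominal:  ", F_nom[:, -1].mean())
print("mean residual norm, anomalous:", F_anom[:, -1].mean())
```

In this toy setting the second class is visible mainly through the residual feature, which mirrors the abstract's point that the second class can be treated as an outlier of the first; the multilevel, hierarchical decomposition of the residual space described in the paper is not reproduced here.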
