Minimally Supervised Learning using Topological Projections in Self-Organizing Maps

Zimeng Lyu; Alexander Ororbia; Rui Li; Travis Desell

TL;DR

The paper addresses parameter prediction when labeled data are scarce by introducing a minimally supervised framework based on self-organizing maps (SOMs). It trains SOMs on large unlabeled datasets, maps a small labeled set to best matching units (BMUs), and predicts unseen data using topological distances and neighbor-based projections, notably a weighted-average approach computed via $e_p = \frac{\sum_{n} p_n \, \frac{1}{d(BMU,n)}}{\sum_{n} \frac{1}{d(BMU,n)}}$, where $p_n$ is the value of parameter $p$ at labeled neighbor $n$ and $d(BMU,n)$ is that neighbor's topological distance from the query point's BMU. Across coal spectra and appliance-energy datasets, the SOM-based topological projection, especially the weighted-average variant, consistently outperforms classical regression, Gaussian process regression, deep neural networks, KNNs, and DBSCAN, particularly under scarce labeling. The method performs strongly in high-dimensional settings, provides a visualizable topology through the U-Matrix, and demonstrates practical value for domains where labeling is expensive, suggesting broad applicability and avenues for explainability and further topology-based projections.
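The weighted-average projection named in the TL;DR is an inverse-distance weighted mean: each labeled neighbor contributes its parameter value scaled by one over its topological distance from the query's BMU. A minimal Python sketch of that arithmetic follows. The function name, the `(value, distance)` input layout, and the handling of zero distances are illustrative assumptions, not taken from the paper.

```python
def weighted_average_projection(labeled_neighbors, eps=1e-12):
    """Inverse-distance weighted average of labeled neighbor values.

    labeled_neighbors: list of (value, distance) pairs, where distance is the
    topological distance d(BMU, n) from the query's BMU to labeled node n.
    """
    # Assumption (not from the paper): if the query's BMU coincides with one
    # or more labeled nodes (distance ~ 0), return the mean of those labels
    # instead of dividing by zero.
    exact = [value for value, dist in labeled_neighbors if dist <= eps]
    if exact:
        return sum(exact) / len(exact)

    # Weight each neighbor by 1 / d(BMU, n), then normalize by the weight sum.
    weights = [1.0 / dist for _, dist in labeled_neighbors]
    weighted_sum = sum(w * value for w, (value, _) in zip(weights, labeled_neighbors))
    return weighted_sum / sum(weights)


# Two labeled neighbors: value 10 at distance 1, value 20 at distance 2.
estimate = weighted_average_projection([(10.0, 1.0), (20.0, 2.0)])
```

In this toy call the closer neighbor gets weight 1 and the farther one weight 0.5, so the estimate is (10 + 10) / 1.5, pulled toward the closer label.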

Abstract

Parameter prediction is essential for many applications, facilitating insightful interpretation and decision-making. However, in many real-life domains, such as power systems, medicine, and engineering, it can be very expensive to acquire ground truth labels for certain datasets, as they may require extensive and costly laboratory testing. In this work, we introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs), which significantly reduces the number of labeled data points required to perform parameter prediction by exploiting the information contained in large unlabeled datasets. Our proposed method first trains SOMs on unlabeled data; a minimal number of available labeled data points is then assigned to key best matching units (BMUs). The values estimated for newly encountered data points are computed using the average of the $n$ closest labeled data points in the SOM's U-matrix, in tandem with a topological shortest path distance calculation scheme. Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques, including linear and polynomial regression, Gaussian process regression, and K-nearest neighbors, as well as deep neural network models and related clustering schemes.
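The prediction step described in the abstract (find the BMU of a new point, measure shortest-path distances over the SOM grid, average the $n$ closest labeled units) can be sketched in plain Python. This is a sketch under stated assumptions, not the authors' implementation: the SOM is taken as already trained and stored as a grid of weight vectors, the grid is 4-connected, and the cost of moving between adjacent units is the Euclidean distance between their weight vectors (the quantity a U-matrix visualizes). All function names are hypothetical.

```python
import heapq
import math


def find_bmu(weights, x):
    """Return (row, col) of the unit whose weight vector is closest to x."""
    best, best_dist = None, math.inf
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            dist = math.dist(w, x)
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def topological_distances(weights, start):
    """Dijkstra shortest paths from `start` over the 4-connected SOM grid.

    Assumption: the cost of a step between adjacent units is the Euclidean
    distance between their weight vectors.
    """
    rows, cols = len(weights), len(weights[0])
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist[(r, c)]:
            continue  # stale heap entry, a shorter path was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + math.dist(weights[r][c], weights[nr][nc])
                if nd < dist.get((nr, nc), math.inf):
                    dist[(nr, nc)] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return dist


def predict(weights, labeled, x, n=2):
    """Average the labels of the n labeled units closest to x's BMU.

    labeled: dict mapping the (row, col) of a BMU to its assigned label value.
    """
    dist = topological_distances(weights, find_bmu(weights, x))
    nearest = sorted(labeled, key=lambda node: dist[node])[:n]
    return sum(labeled[node] for node in nearest) / len(nearest)


# Toy 1x4 "trained" SOM with 1-D weight vectors and two labeled units.
weights = [[(0.0,), (1.0,), (2.0,), (10.0,)]]
labeled = {(0, 0): 5.0, (0, 3): 50.0}
nearest_one = predict(weights, labeled, (1.2,), n=1)
nearest_two = predict(weights, labeled, (1.2,), n=2)
```

In the toy map the query's BMU is unit (0, 1). The labeled unit at (0, 0) is one step away, while the one at (0, 3) lies across a large weight gap, so with `n=1` only the nearby label is used. Measuring distance along the grid rather than in input space is what lets such gaps, visible as ridges in a U-matrix, separate dissimilar regions.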

Paper Structure (29 sections, 1 equation, 7 figures, 14 tables)

This paper contains 29 sections, 1 equation, 7 figures, 14 tables.

Introduction
Related Work
SOM Topological Projections
Weighted Average Projection
Regression Projection
Experimental Setup
Dataset
Experimental Setting and Preparation
Results
SOM Hyperparameters
Coal Dataset Results
Topological Regression vs Regression Methods
Classical Methods
Gaussian Process Regression (GPR)
Deep Neural Network (DNNs)
...and 14 more sections

Figures (7)

Figure 1: The proposed minimally supervised SOM modeling framework.
Figure 2: An example topological projection of an unlabeled data point to a trained SOM topology with mapped labeled data.
Figure 3: Labeled data point mappings in SOMs trained on the unlabeled coal data with different grid sizes.
Figure 4: DBSCAN with PCA.
Figure 5: SOM vs GPR estimated coal properties.
...and 2 more figures
