Random Forest-Supervised Manifold Alignment

Jake S. Rhodes, Adam G. Rustad

TL;DR

This paper enhances two recently developed graph-based alignment methods by integrating class labels through geometry-preserving proximities derived from random forests. The results suggest that random forest proximities offer a practical solution for tasks requiring multimodal data alignment.

Abstract

Manifold alignment is a type of data fusion technique that creates a shared low-dimensional representation of data collected from multiple domains, enabling cross-domain learning and improved performance in downstream tasks. This paper presents an approach to manifold alignment that uses random forests as a foundation for semi-supervised alignment algorithms, leveraging the model's inherent strengths. We focus on enhancing two recently developed graph-based alignment methods by integrating class labels through geometry-preserving proximities derived from random forests. These proximities serve as a supervised initialization for constructing cross-domain relationships that maintain local neighborhood structures, thereby facilitating alignment. Our approach addresses a common limitation in manifold alignment: existing methods often fail to generate embeddings that capture sufficient information for downstream classification. By contrast, we find that alignment models that use random forest proximities or class-label information achieve improved accuracy on downstream classification tasks, outperforming single-domain baselines. Experiments across multiple datasets show that our method typically enhances cross-domain feature integration and predictive performance, suggesting that random forest proximities offer a practical solution for tasks requiring multimodal data alignment.
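To make the idea of a random forest proximity concrete, the following is a minimal sketch of the classic leaf co-occurrence proximity: the fraction of trees in which two samples fall in the same leaf. It is not the geometry-preserving proximity used in the paper, whose exact definition is not given in this summary. The function name `leaf_proximities` and the toy leaf assignments are hypothetical; in practice, a leaf-index matrix of this shape could be obtained from a fitted scikit-learn forest via `RandomForestClassifier.apply(X)`.

```python
import numpy as np


def leaf_proximities(leaves):
    """Classic random forest proximity (illustrative, not the paper's variant).

    leaves: integer array of shape (n_samples, n_trees); entry [i, t] is the
    index of the leaf that sample i reaches in tree t.
    Returns an (n_samples, n_samples) matrix whose [i, j] entry is the
    fraction of trees in which samples i and j share a leaf.
    """
    leaves = np.asarray(leaves)
    # same_leaf[i, j, t] is True when samples i and j share a leaf in tree t.
    same_leaf = leaves[:, None, :] == leaves[None, :, :]
    return same_leaf.mean(axis=2)


# Hypothetical leaf assignments for 4 samples in a 3-tree forest.
toy_leaves = np.array([
    [0, 1, 2],
    [0, 1, 3],
    [4, 1, 3],
    [5, 6, 7],
])
prox = leaf_proximities(toy_leaves)
print(np.round(prox, 2))
```

Because the forest is trained on class labels, samples that share many leaves tend to be similar with respect to the supervised task, which is what makes such a matrix a candidate supervised initialization for a graph-based alignment method.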

Paper Structure

This paper contains 7 sections, 1 equation, 3 figures.

Figures (3)

  • Figure 1: We trained RF and $k$-NN models on each split separately across 5 split types, 16 publicly available datasets, and 3 repetitions each. The same models were then trained on the aligned embedding from each method, and we tracked the proportion of times the classification accuracy exceeded that of one or both individual models. Overall, RF-MASH performed best at exceeding both baselines for random forest models. In all runs, 30% of data points served as anchors.
  • Figure 2: The same datasets and configurations as in Figure 1 are compared here. In this figure, we distinguish between the different split types used to simulate multiple domains. The RF-initialized methods tend to have more well-rounded results across all split types, though each method fails to exceed 50% of the baselines for the random and even splits. KEMA performed best on the rotation split but underperformed RF-MASH on all feature-level splits. DTA does well on the distort and rotate splits, but not on the feature-level splits.
  • Figure 3: Here we compare all methods that use label information in the alignment process. The combined metric (CE - FOSCTTM) [rhodes2024mashspud] is computed and averaged across all datasets. Generally, RF-SPUD and RF-MASH have the best performance, with the exception of KEMA on the rotation split.
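
The FOSCTTM term in the combined metric of Figure 3 is the "fraction of samples closer than the true match", where lower values indicate better alignment. The sketch below shows one common way to compute it, assuming two embeddings with a known one-to-one correspondence between rows. The function name `foscttm` and the toy inputs are hypothetical, normalization conventions vary between implementations, and the CE term is not reproduced because its definition is not given in this summary.

```python
import numpy as np


def foscttm(emb_a, emb_b):
    """Fraction of samples closer than the true match (illustrative sketch).

    emb_a, emb_b: arrays of shape (n_samples, n_dims) in the shared
    embedding space; row i of emb_a is assumed to correspond to row i
    of emb_b. Returns a value in [0, 1]; 0 means every point's true match
    is its nearest cross-domain neighbor.
    """
    emb_a = np.asarray(emb_a, dtype=float)
    emb_b = np.asarray(emb_b, dtype=float)
    # dists[i, j] = Euclidean distance from point i in A to point j in B.
    dists = np.linalg.norm(emb_a[:, None, :] - emb_b[None, :, :], axis=2)
    true_match = np.diag(dists)
    # For each A point: share of B points strictly closer than its true match.
    frac_a = (dists < true_match[:, None]).mean(axis=1)
    # For each B point: share of A points strictly closer than its true match.
    frac_b = (dists < true_match[None, :]).mean(axis=0)
    return float((frac_a.mean() + frac_b.mean()) / 2)


# Toy one-dimensional embeddings (hypothetical values).
points = np.array([[0.0], [1.0], [2.0]])
perfect = foscttm(points, points)        # identical embeddings
reversed_ = foscttm(points, points[::-1])  # correspondences scrambled
print(perfect, reversed_)
```

Subtracting FOSCTTM from an accuracy-style score, as the caption's "CE - FOSCTTM" does, rewards embeddings that are both useful for classification and well matched point to point.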