Random Forest Autoencoders for Guided Representation Learning
Adrien Aumon, Shuang Ni, Myriam Lizotte, Guy Wolf, Kevin R. Moon, Jake S. Rhodes
TL;DR
This work addresses the lack of explicit out-of-sample (OOS) support in supervised manifold learning for visualization. It introduces Random Forest Autoencoders (RF-AE), an autoencoder framework that learns from RF-GAP proximities and uses a geometry-aware regularization derived from RF-PHATE to produce OOS embeddings that preserve task-relevant structure. The method uses class-wise prototype selection to reduce input dimensionality and combines a Jensen-Shannon divergence reconstruction loss with a geometric alignment term to fuse local and global supervision. Empirically, RF-AE achieves higher kNN accuracy and better structural preservation than existing methods across 20 datasets, while qualitative visualizations show clearer within-class patterns and preserved anatomical or feature-driven relationships. The approach extends to other kernel-based dimensionality reduction methods and offers a scalable, interpretable tool for guided representation learning in large-scale, semi-supervised settings.
Abstract
Extensive research has produced robust methods for unsupervised data visualization. Yet supervised visualization, where expert labels guide representations, remains underexplored, as most supervised approaches prioritize classification over visualization. Recently, RF-PHATE, a diffusion-based manifold learning method leveraging random forests and information geometry, marked significant progress in supervised visualization. However, its lack of an explicit mapping function limits scalability and its application to unseen data, posing challenges for large datasets and label-scarce scenarios. To overcome these limitations, we introduce Random Forest Autoencoders (RF-AE), a neural network-based framework for out-of-sample kernel extension that combines the flexibility of autoencoders with the supervised learning strengths of random forests and the geometry captured by RF-PHATE. RF-AE enables efficient out-of-sample supervised visualization and outperforms existing methods, including RF-PHATE's standard kernel extension, in both accuracy and interpretability. Additionally, RF-AE is robust to the choice of hyperparameters and generalizes to any kernel-based dimensionality reduction method.
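
To make the training objective mentioned in the TL;DR more concrete, the following is a minimal sketch of how a Jensen-Shannon divergence reconstruction term might be combined with a geometric alignment term for a single sample. It is an illustration under stated assumptions, not the authors' implementation: the function names, the convex weighting by `lam`, and the use of a squared Euclidean distance for alignment are all hypothetical choices made here for clarity.

```python
import math


def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    # Mixture distribution M = (P + Q) / 2.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # KL(a || b); zero-probability entries of `a` contribute nothing.
        return sum(
            ai * math.log((ai + eps) / (bi + eps))
            for ai, bi in zip(a, b)
            if ai > 0
        )

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def combined_loss(prox, recon, z, z_target, lam=0.5):
    """Hypothetical per-sample loss: weighted reconstruction plus alignment.

    prox     -- row-normalized proximity vector for one sample (assumed input)
    recon    -- the decoder's reconstruction of that vector
    z        -- the encoder's low-dimensional embedding
    z_target -- a precomputed target embedding (assumed to come from RF-PHATE)
    lam      -- assumed trade-off weight in [0, 1]
    """
    recon_term = js_divergence(prox, recon)
    # Assumption: alignment measured as squared Euclidean distance.
    align_term = sum((a - b) ** 2 for a, b in zip(z, z_target))
    return lam * recon_term + (1 - lam) * align_term
```

In this sketch, `lam` closer to 1 emphasizes reconstructing the proximity vector, while `lam` closer to 0 emphasizes matching the target embedding; how the paper actually balances the two terms is not specified in this section.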
