Random Forest Autoencoders for Guided Representation Learning

Adrien Aumon, Shuang Ni, Myriam Lizotte, Guy Wolf, Kevin R. Moon, Jake S. Rhodes

TL;DR

This work addresses the lack of explicit out-of-sample (OOS) support in supervised manifold learning for visualization. It introduces Random Forest Autoencoders (RF-AE), an autoencoder framework that learns from RF-GAP proximities and uses a geometry-aware regularization derived from RF-PHATE to produce OOS embeddings that preserve task-relevant structure. The method uses class-wise prototype selection to reduce input dimensionality, and it combines a Jensen-Shannon divergence reconstruction loss with a geometric alignment term to fuse local and global supervision. Empirically, RF-AE achieves higher kNN accuracy and better structural preservation than competing methods across 20 datasets, while qualitative visualizations show clearer within-class patterns and preserved anatomical or feature-driven relationships. The approach extends to other kernel-based dimensionality reduction methods and offers a scalable, interpretable tool for guided representation learning in large-scale, semi-supervised settings.
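
The combination of a Jensen-Shannon reconstruction term and a geometric alignment term can be sketched for a single sample as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the weighting `lam * L_recon + (1 - lam) * L_geo` is an assumption chosen so that `lam = 1` leaves the autoencoder unconstrained and `lam = 0` reduces to pure regression onto the RF-PHATE embedding (matching the description of $\lambda$ in Figure S3), and the squared Euclidean distance used for the geometric term is likewise an assumption.

```python
import math

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    # Clip and renormalize so the logarithms below are always defined.
    p = [max(x, eps) for x in p]
    q = [max(x, eps) for x in q]
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    # Mixture distribution m = (p + q) / 2.
    m = [0.5 * (a + b) for a, b in zip(p, q)]
    kl_pm = sum(a * math.log(a / c) for a, c in zip(p, m))
    kl_qm = sum(b * math.log(b / c) for b, c in zip(q, m))
    return 0.5 * (kl_pm + kl_qm)

def combined_loss(p_star, p_recon, z, z_target, lam=0.5):
    """Hypothetical per-sample objective: lam * L_recon + (1 - lam) * L_geo.

    p_star   : prototype transition probabilities fed to the encoder
    p_recon  : decoder output (a distribution over the same prototypes)
    z        : encoder output (low-dimensional embedding)
    z_target : precomputed RF-PHATE embedding of the same point
    """
    l_recon = js_divergence(p_star, p_recon)
    # Assumed geometric term: squared Euclidean distance to the target embedding.
    l_geo = sum((a - b) ** 2 for a, b in zip(z, z_target))
    return lam * l_recon + (1.0 - lam) * l_geo
```

The Jensen-Shannon divergence is symmetric and bounded above by $\ln 2$ (with natural logarithms), which keeps the reconstruction term on a fixed scale regardless of the number of prototypes.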

Abstract

Extensive research has produced robust methods for unsupervised data visualization. Yet supervised visualization, where expert labels guide representations, remains underexplored, as most supervised approaches prioritize classification over visualization. Recently, RF-PHATE, a diffusion-based manifold learning method leveraging random forests and information geometry, marked significant progress in supervised visualization. However, its lack of an explicit mapping function limits scalability and its application to unseen data, posing challenges for large datasets and label-scarce scenarios. To overcome these limitations, we introduce Random Forest Autoencoders (RF-AE), a neural network-based framework for out-of-sample kernel extension that combines the flexibility of autoencoders with the supervised learning strengths of random forests and the geometry captured by RF-PHATE. RF-AE enables efficient out-of-sample supervised visualization and outperforms existing methods, including RF-PHATE's standard kernel extension, in both accuracy and interpretability. Additionally, RF-AE is robust to the choice of hyperparameters and generalizes to any kernel-based dimensionality reduction method.

Paper Structure

This paper contains 31 sections, 1 theorem, 20 equations, 9 figures, 7 tables, 1 algorithm.

Key Result

Proposition A.1

For every fixed $i$ and any $j\neq i$, in the limit as the number of trees $|T|\to\infty$,

Figures (9)

  • Figure 1: RF-AE architecture with prototype selection and geometric regularization. First, the original feature vectors $\mathbf{x}_i$ are transformed into one-step transition probability vectors $\mathbf{p}_i$ derived from RF-GAP proximities (Section \ref{subsec:oosRFGAP}). They are further reduced into lower-dimensional vectors $\mathbf{p}^*_i$ that represent transition probabilities to $N^*\ll N$ selected prototypes (Section \ref{subsec:prototype_selection}). Meanwhile, manifold embeddings $\mathbf{z}_i^G$ are generated using RF-PHATE from the $\mathbf{p}_i$. Finally, $\mathbf{p}^*_i$ and $\mathbf{z}_i^G$ serve as input to the network within the enclosing box, training an encoder $f$ and a decoder $g$ by simultaneously minimizing the reconstruction loss $L_{recon}$ and the geometric loss $L_{geo}$ defined in Section \ref{subsec:rfae_arch}.
  • Figure 2: OOS visualization using four different dimensionality reduction methods. Training points are colored by labels, and test points are depicted with original images. Training points are omitted in OrganC MNIST for clarity. a. Sign MNIST (A--K) dataset (Table \ref{tab:rfae_data}): RF-AE captures supervised relationships by preserving class-specific variations, such as shadowing and hand orientation, while also highlighting inter-class similarities and maintaining clear class separability. RF-PHATE's default kernel extension compresses clusters excessively. Parametric $t$-SNE and parametric supervised UMAP are sensitive to irrelevant features. b. OrganC MNIST dataset: RF-AE clearly separates similar organ types while preserving their anatomical proximity, capturing both class identity and biological relevance. Other methods tend to merge these classes or distort their relationships, failing to reflect fine-grained anatomical distinctions.
  • Figure S1: Comparison between the standard feature-based MLP encoder and our proposed RF-GAP kernel-based MLP encoder for regressing onto precomputed RF-PHATE embeddings. (a) Ground-truth tree structure with branch labels (see Appendix \ref{sec:artificial_tree}). (b) Log-scaled median training MSE with $25^{\text{th}}$ and $75^{\text{th}}$ enclosing percentiles over 50 epochs across 10 repetitions. (c) Training RF-PHATE embeddings (top row), followed by training and test embeddings produced by the RF-GAP-based encoder (middle row) and the feature-based encoder (bottom row) after 50 epochs from a single run. The RF-PHATE embeddings closely match the ground-truth structure and provide a strong target for supervised regression. Our kernel-based encoder remains effective even under high noise levels (e.g., SNR = 0.001), converging more quickly and producing well-structured embeddings with better generalization. In contrast, the feature-based MLP exhibits increasing training loss and disorganized embeddings as noise increases, and often fails to recover meaningful structure even in low-noise settings (SNR = $\infty$, 1), demonstrating the superior robustness of our kernel-based encoders.
  • Figure S2: Illustration of the structural importance alignment (SIA) score defined in Section \ref{subsec:quantify_oos_embedding} for evaluating supervised out-of-sample (OOS) embedding fit. a. Random class samples from the high-dimensional Sign MNIST (A--K) dataset. b. 2D embeddings of training and test (OOS) points from RF-AE (left) and P-TSNE (right), based on a stratified 80/20 random split. Training points are color-coded by label; test points are shown as tinted image thumbnails. c. Pixel-level classification importances from the ensemble baseline classifier (Section \ref{subsec:quantify_oos_embedding}, Appendix \ref{sec:baseline_cls_perf}), normalized to $[0,1]$. d. Pixel-level local structure importances ($s=\textit{Trust}$) from OOS RF-AE (left) and P-TSNE (right), also normalized. e. Local SIA scores computed as the Kendall $\tau$ correlation between (c) and (d): RF-AE achieves higher alignment (0.89) than P-TSNE (0.62), suppressing background pixels and focusing on class-relevant regions.
  • Figure S3: RF-AE training and test embeddings on Sign MNIST (A--K) for various $(\lambda, N^*)$ configurations, where $\lambda$ decreases column-wise from 1 (unconstrained RF-AE) to 0 (RF-PHATE kernel-based MLP extension), and $N^*$ increases row-wise from 2% to 100% of the training set size.
  • ...and 4 more figures
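
The Figure S2 caption describes the SIA score as a Kendall $\tau$ correlation between two normalized per-feature importance vectors. The sketch below illustrates only that final correlation step, assuming both importance vectors have already been computed; the function names are hypothetical, the tau-b variant (with tie correction) is an assumption because the caption does not say which variant is used, and the example values in the usage note are made up for illustration.

```python
import math

def min_max(values):
    """Rescale a list of numbers to [0, 1]; constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def kendall_tau_b(x, y):
    """Kendall tau-b rank correlation, computed over all pairs in O(n^2)."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both vectors: contributes to neither count
            elif dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x)
                      * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom > 0 else float("nan")

def sia_score(class_importance, structure_importance):
    """Hypothetical SIA-style score: rank agreement of two importance vectors."""
    return kendall_tau_b(min_max(class_importance), min_max(structure_importance))
```

For example, `sia_score([0.1, 0.4, 0.2, 0.9], [1.0, 3.0, 2.0, 5.0])` gives perfect agreement because both vectors rank the four features identically. Since Kendall $\tau$ depends only on ranks, the min-max normalization does not change the score; it is kept here only to mirror the caption.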

Theorems & Definitions (4)

  • Proposition A.1
  • proof
  • Remark A.2
  • Remark A.3