Non-linear PCA via Evolution Strategies: a Novel Objective Function

Thomas Uriot, Elise Chung

TL;DR

This work introduces a non-linear PCA framework that preserves interpretability by parameterizing per-variable transformations with neural networks and optimizing them via Evolution Strategies to bypass the non-differentiable eigendecomposition. A key innovation is a granular partial objective that decomposes the global explained variance into per-variable contributions $c_{j,l}$, enabling stronger learning signals and better handling of mixed numerical, categorical, and ordinal data. The approach demonstrates improved explained variance over linear PCA and kernel PCA on synthetic and real OpenML datasets, while maintaining visualizable interpretability through standard tools like biplots. The method offers a scalable, interpretable NLPCA framework with potential broad impact on mixed-data dimensionality reduction and exploratory data analysis.
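To make the optimization loop concrete, here is a minimal NumPy sketch under stated assumptions: each variable gets its own one-hidden-layer MLP, the transformed columns are standardized before the eigendecomposition, and the parameters are updated with a standard antithetic Gaussian ES gradient estimate. It maximizes the total top-$k$ explained variance (the baseline objective, not the partial contribution objective), and it only ever evaluates `eigvalsh` forward, which is why no gradient through the eigendecomposition is needed. The names `transform`, `explained_variance`, `es_step`, `HIDDEN`, and all hyperparameter values are hypothetical and not taken from the paper's code.

```python
import numpy as np

HIDDEN = 8                 # hypothetical hidden width of each per-variable MLP
N_PARAMS = 3 * HIDDEN + 1  # w1, b1, w2 (HIDDEN each) plus an output bias b2


def transform(X, params):
    """Apply a separate one-hidden-layer MLP to each column of X with shape (n, p)."""
    out = np.empty_like(X, dtype=float)
    for l in range(X.shape[1]):
        w1, b1 = params[l, :HIDDEN], params[l, HIDDEN:2 * HIDDEN]
        w2, b2 = params[l, 2 * HIDDEN:3 * HIDDEN], params[l, 3 * HIDDEN]
        h = np.tanh(np.outer(X[:, l], w1) + b1)  # hidden activations, shape (n, HIDDEN)
        out[:, l] = h @ w2 + b2
    return out


def explained_variance(Z, k=1):
    """Share of total variance captured by the top-k eigenvalues of the correlation matrix."""
    Z = (Z - Z.mean(0)) / (Z.std(0) + 1e-8)  # standardize so no column can inflate itself
    eigvals = np.linalg.eigvalsh(np.cov(Z, rowvar=False))  # ascending order
    return eigvals[-k:].sum() / eigvals.sum()


def es_step(X, params, rng, k=1, pop=20, sigma=0.1, lr=0.05):
    """One antithetic ES update using forward evaluations of the objective only."""
    eps = rng.standard_normal((pop,) + params.shape)
    f_pos = np.array([explained_variance(transform(X, params + sigma * e), k) for e in eps])
    f_neg = np.array([explained_variance(transform(X, params - sigma * e), k) for e in eps])
    # Weight each perturbation by its fitness difference and average over the population.
    grad = np.tensordot(f_pos - f_neg, eps, axes=1) / (2 * sigma * pop)
    return params + lr * grad


rng = np.random.default_rng(0)
t = rng.uniform(-1, 1, 300)
X = np.column_stack([t, t ** 2, np.sin(3 * t)])  # non-linearly related variables
params = 0.5 * rng.standard_normal((X.shape[1], N_PARAMS))
for _ in range(30):
    params = es_step(X, params, rng)
print(explained_variance(transform(X, params)))
```

The standardization step is a deliberate choice in this sketch: without it, a transformation could simply scale up its own output and dominate the first eigenvalue without capturing any shared structure.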

Abstract

Principal Component Analysis (PCA) is a powerful and popular dimensionality reduction technique. However, due to its linear nature, it often fails to capture the complex underlying structure of real-world data. While Kernel PCA (kPCA) addresses non-linearity, it sacrifices interpretability and struggles with hyperparameter selection. In this paper, we propose a robust non-linear PCA framework that unifies the interpretability of PCA with the flexibility of neural networks. Our method parametrizes variable transformations via neural networks, optimized using Evolution Strategies (ES) to handle the non-differentiability of eigendecomposition. We introduce a novel, granular objective function that maximizes the individual variance contribution of each variable, providing a stronger learning signal than global variance maximization. This approach natively handles categorical and ordinal variables without the dimensional explosion associated with one-hot encoding. We demonstrate that our method significantly outperforms both linear PCA and kPCA in explained variance across synthetic and real-world datasets. At the same time, it preserves PCA's interpretability, enabling visualization and analysis of feature contributions using standard tools such as biplots. The code can be found on GitHub.
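One way to avoid the one-hot blow-up is to keep a single column per categorical variable and learn a scalar score for each category, in the spirit of optimal scaling. The sketch below assumes exactly that; it is not confirmed as the paper's parametrization. A nominal variable with $m$ levels then contributes one column instead of $m$, and an ordinal variable keeps its ordering by building its scores from strictly positive (softplus) increments. The helpers `quantify_nominal` and `quantify_ordinal` are hypothetical; their `scores`/`raw` vectors would be optimized by ES alongside the network weights.

```python
import numpy as np


def quantify_nominal(codes, scores):
    """Nominal variable: each integer category code maps to its own learnable scalar score."""
    return scores[codes]


def quantify_ordinal(codes, raw):
    """Ordinal variable: level scores increase monotonically via positive softplus steps."""
    steps = np.logaddexp(0.0, raw[1:])  # softplus(x) = log(1 + e^x) > 0, computed stably
    levels = np.concatenate([raw[:1], raw[0] + np.cumsum(steps)])
    return levels[codes]


codes = np.array([0, 2, 1, 2])
print(quantify_nominal(codes, np.array([0.3, -1.2, 0.8])))  # values 0.3, 0.8, -1.2, 0.8
print(quantify_ordinal(codes, np.array([0.0, 0.0, 1.0])))   # level 2 > level 1 > level 0
```

The resulting one-dimensional columns can be stacked with the transformed numerical variables before the eigendecomposition, so the number of PCA input columns always equals the number of original variables.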

Paper Structure

This paper contains 41 sections, 2 theorems, 12 equations, 10 figures, 5 tables, and 1 algorithm.

Key Result

proposition 1

Let $c_{j,l}$ be the variance contribution of the $l^{\textrm{th}}$ variable $X^{(l)}$ towards the $j^{\textrm{th}}$ eigenvalue $\lambda_j$. Then, we have that
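Assume, for illustration, that $c_{j,l}$ is the $l^{\textrm{th}}$ term of the quadratic form $\lambda_j = v_j^\top \Sigma v_j$, i.e. $c_{j,l} = v_{j,l}\,(\Sigma v_j)_l$, where $v_j$ is the unit eigenvector of the covariance matrix $\Sigma$ for $\lambda_j$. This definition is an assumption, not necessarily the paper's. Under it, summing over variables recovers the eigenvalue, and because $\Sigma v_j = \lambda_j v_j$ each term is non-negative:

$$\sum_{l} c_{j,l} = v_j^\top \Sigma v_j = \lambda_j, \qquad c_{j,l} = \lambda_j\, v_{j,l}^2 \ge 0.$$

The NumPy sketch below checks both identities numerically; `contributions` is a hypothetical helper name.

```python
import numpy as np


def contributions(X):
    """Eigenvalues (descending), eigenvectors V, and C with C[j, l] = v_{j,l} * (S v_j)_l."""
    S = np.cov(X, rowvar=False)              # sample covariance; np.cov centers X itself
    eigvals, V = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]
    eigvals, V = eigvals[order], V[:, order]  # column j of V is v_j
    C = (V * (S @ V)).T                       # row j holds the per-variable terms of v_j^T S v_j
    return eigvals, V, C


rng = np.random.default_rng(0)
X = rng.standard_normal((500, 4)) @ rng.standard_normal((4, 4))  # correlated data
eigvals, V, C = contributions(X)
print(np.allclose(C.sum(axis=1), eigvals))          # expected: True (terms sum to lambda_j)
print(np.allclose(C, eigvals[:, None] * V.T ** 2))  # expected: True (c_{j,l} = lambda_j v_{j,l}^2)
```

Read row-wise over the top $k$ components, a per-variable score such as $\sum_{j \le k} c_{j,l}$ is one hypothetical way to give each transformation its own learning signal; the paper's exact partial objective is not reproduced here.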

Figures (10)

  • Figure 1: Proportion of explained variance across eight datasets on the validation set (0.75/0.25 split). Lines show the median over 15 independent runs, and shaded regions show the 20th and 80th percentiles. The orange curves show the explained variance achieved with the novel partial contribution objective, in which individual variable contributions are optimized separately. The blue curves show the variance achieved with the total explained variance objective. Dashed lines ($k=1$) show the proportion of variance accounted for by the first eigenvalue alone, while solid lines ($k=2$) show the cumulative variance of the first two eigenvalues. The value of $k$ also enters both objectives: the optimization maximizes the variance up to the $k^{th}$ eigenvalue without regard to subsequent components. Note that for the alternate_stripes, circles, and spheres datasets, only the $k=1$ case is considered. Across all benchmarks, the partial contribution method consistently outperforms the overall variance objective, achieving higher explained variance and faster convergence.
  • Figure 2: Diagram of the transformations applied to the original variables.
  • Figure 3: Training and evaluation protocol.
  • Figure 4: Top: nested circles. Bottom: nested spheres.
  • Figure 5: Synthetic datasets.
  • ...and 5 more figures
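The Figure 1 protocol (0.75/0.25 split, 15 runs, median with 20th/80th percentile bands) can be sketched as follows. This assumes the PCA axes are fitted on the training split and the held-out score is the share of validation variance captured by the top-$k$ training eigenvectors; the paper may score the validation set differently. `Z` stands for data that has already been passed through the learned transformations, which in a real run would be fitted on the training split only. The helpers `heldout_explained_variance` and `summarize_runs` are hypothetical.

```python
import numpy as np


def heldout_explained_variance(Z, k, train_frac=0.75, seed=0):
    """Fit PCA axes on a random training split, then score top-k variance on the held-out split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(Z))
    n_train = int(train_frac * len(Z))
    train, val = Z[idx[:n_train]], Z[idx[n_train:]]
    mu, sd = train.mean(0), train.std(0)
    train, val = (train - mu) / sd, (val - mu) / sd  # standardize with training statistics
    eigvals, V = np.linalg.eigh(np.cov(train, rowvar=False))
    Vk = V[:, np.argsort(eigvals)[::-1][:k]]         # top-k training eigenvectors
    S_val = np.cov(val, rowvar=False)
    return np.trace(Vk.T @ S_val @ Vk) / np.trace(S_val)


def summarize_runs(scores):
    """Median and 20th/80th percentiles across independent runs (axis 0)."""
    scores = np.asarray(scores)
    return np.median(scores, 0), np.percentile(scores, 20, 0), np.percentile(scores, 80, 0)


rng = np.random.default_rng(0)
Z = rng.standard_normal((400, 5)) @ rng.standard_normal((5, 5))  # stand-in for transformed data
runs = np.array([[heldout_explained_variance(Z, k, seed=s) for k in (1, 2)] for s in range(15)])
median, p20, p80 = summarize_runs(runs)  # one value per k
print(median, p20, p80)
```

Because the top-$k$ training eigenvectors are orthonormal and the validation covariance is positive semi-definite, the held-out score always lies in $[0, 1]$ and can only grow with $k$.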

Theorems & Definitions (2)

  • proposition 1
  • proposition 2