
Genetic Programming for Explainable Manifold Learning

Ben Cravens, Andrew Lensen, Paula Maddigan, Bing Xue

TL;DR

This work tackles the lack of explicit, interpretable mappings in manifold learning by introducing Genetic Programming for Explainable Manifold Learning (GP-EMaL), a method that penalises GP tree complexity to improve explainability while largely preserving embedding quality. It replaces the embedding-dimension objective of the earlier GP-MaL-MO method with a parameterisable complexity metric that combines symmetry balancing, a scaling term, and a tree-cost term, guiding evolution toward smaller, more interpretable trees. In experiments on seven datasets, GP-EMaL achieves performance comparable to GP-MaL-MO in most cases while markedly reducing tree size and operator complexity. The approach, implemented in open-source Python and accessible via a Streamlit app, advances practical explainable nonlinear dimensionality reduction (NLDR) and offers a foundation for future multi-objective extensions that jointly consider neighbourhood preservation, embedding complexity, and dimensionality.

Abstract

Manifold learning techniques play a pivotal role in machine learning by revealing lower-dimensional embeddings within high-dimensional data, enhancing both the efficiency and interpretability of data analysis. However, a notable limitation of current manifold learning methods is their lack of explicit functional mappings, which are crucial for explainability in many real-world applications. Genetic programming (GP), known for its interpretable tree-based functional models, has emerged as a promising approach to address this challenge. Previous research leveraged multi-objective GP to balance manifold quality against embedding dimensionality, producing functional mappings across a range of embedding sizes. Yet these mapping trees often became complex, hindering explainability. In response, we introduce Genetic Programming for Explainable Manifold Learning (GP-EMaL), a novel approach that directly penalises tree complexity. The method maintains high manifold quality while significantly enhancing explainability, and it allows customisation of complexity measures, such as symmetry balancing, scaling, and node complexity, to cater to diverse application needs. Our experimental analysis demonstrates that GP-EMaL matches the performance of the existing approach in most cases while using simpler, smaller, and more interpretable tree structures. This advancement marks a significant step towards interpretable manifold learning.
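
To make the idea of penalising tree complexity concrete, the following is a minimal Python sketch of a recursive tree-cost measure with an imbalance penalty. It is not the paper's metric: the `Node` class, the `OP_COST` values, the unit leaf cost, the `asym_weight` parameter, and the assumption that unbalanced subtrees are the ones penalised are all hypothetical choices made for illustration, and the scaling term mentioned above is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical per-operator costs (assumption; not taken from the paper).
OP_COST = {"add": 1.0, "sub": 1.0, "mul": 2.0}


@dataclass
class Node:
    """A GP tree node: an operator with children, or a leaf (feature/constant)."""
    label: str
    children: list[Node] = field(default_factory=list)


def size(node: Node) -> int:
    """Number of nodes in the subtree rooted at `node`."""
    return 1 + sum(size(child) for child in node.children)


def complexity(node: Node, asym_weight: float = 0.5) -> float:
    """Illustrative cost: operator cost + children's costs + imbalance penalty."""
    if not node.children:
        return 1.0  # flat unit cost for leaves (assumption)
    cost = OP_COST[node.label]
    cost += sum(complexity(child, asym_weight) for child in node.children)
    child_sizes = [size(child) for child in node.children]
    # Assumed form of a symmetry term: penalise the size gap between subtrees.
    cost += asym_weight * (max(child_sizes) - min(child_sizes))
    return cost


# Two trees with the same operators and leaves but different shapes.
balanced = Node("add", [
    Node("mul", [Node("x0"), Node("x1")]),
    Node("mul", [Node("x2"), Node("x3")]),
])
skewed = Node("add", [
    Node("x0"),
    Node("mul", [Node("x1"), Node("mul", [Node("x2"), Node("x3")])]),
])

print(complexity(balanced))  # 9.0
print(complexity(skewed))    # 12.0
```

Both example trees have seven nodes and identical operators, so the difference in cost comes entirely from the assumed imbalance penalty. In a multi-objective setting, a score of this kind would be minimised alongside a manifold-quality objective.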

Paper Structure

This paper contains 27 sections, 5 equations, 19 figures, and 5 tables.

Figures (19)

  • Figure 1: GP-EMaL Architecture
  • Figure 2: Scaling term for S([0,1]) with $\mu=0.5$
  • Figure 3: Streamlit Application Interface
  • Figure 4: Front and Classification Performance on Wine
  • Figure 5: Front and Classification Performance on Dermatology
  • ...and 14 more figures