Demystifying Functional Random Forests: Novel Explainability Tools for Model Transparency in High-Dimensional Spaces

Fabrizio Maturo, Annamaria Porreca

TL;DR

The paper tackles the interpretability gap of Functional Random Forests (FRF) in high-dimensional FDA contexts by introducing an explainability toolkit built around Functional Partial Dependence Plots (FPDPs), Functional Principal Component Probability Heatmaps (FPCPH), and both model-specific and model-agnostic FPC importance metrics. It formalizes the FRF framework (ensemble of $M$ Functional Classification Trees with randomization over $m$ of $K$ FPCs per split) and links predictions to time-domain reconstructions of functional data via Functional Principal Components. The core contributions are the FPDPs, FPCPH, the FPC Internal-External Importance and Explained Variance Bubble Plot, and their integration for comprehensive interpretation of FRF decisions, demonstrated on ECG200 data. The work advances practical transparency of FRF in biomedical and other high-dimensional functional settings, enabling trustworthy deployment by revealing which FPCs drive predictions and how their scores influence curve shapes over time.
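The link between FPC scores and time-domain curve shapes mentioned above can be illustrated with a small sketch. It is not the authors' implementation: it assumes curves sampled on a common grid and approximates functional PCA with an SVD of the centered data matrix (no basis expansion, no quadrature weights). The names `fpca` and `reconstruct` and the synthetic curves are hypothetical.

```python
import numpy as np

def fpca(curves, n_components):
    """Discretized FPCA via SVD (an assumption; the paper may use basis expansions).

    curves: array of shape (n_curves, n_timepoints) on a common grid.
    Returns the mean curve, the first components, their scores,
    and the share of variance each retained component explains.
    """
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]              # (K, T), orthonormal rows
    scores = centered @ components.T            # (n, K) FPC scores
    explained = singular_values**2 / np.sum(singular_values**2)
    return mean_curve, components, scores, explained[:n_components]

def reconstruct(mean_curve, components, scores):
    """Map FPC scores back to the time domain: mean + sum_k score_k * FPC_k(t)."""
    return mean_curve + scores @ components

# Synthetic stand-in data (hypothetical, not ECG200).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 96)
amplitude = rng.normal(size=(50, 1))
shift = rng.normal(size=(50, 1))
curves = (amplitude * np.sin(2 * np.pi * t)
          + 0.5 * shift * np.cos(2 * np.pi * t)
          + 0.01 * rng.normal(size=(50, 96)))

mean_curve, components, scores, explained = fpca(curves, n_components=4)
approx = reconstruct(mean_curve, components, scores)

# Curve shapes implied by a low and a high score on the first FPC only.
spread = 2.0 * scores[:, 0].std()
low_curve = reconstruct(mean_curve, components[:1], np.array([[-spread]]))
high_curve = reconstruct(mean_curve, components[:1], np.array([[spread]]))
```

Plotting `low_curve` and `high_curve` against `t` shows how moving a single score reshapes the curve over time, which is the kind of reading the summary attributes to the FPC-based explanations.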

Abstract

The advent of big data has raised significant challenges in analysing high-dimensional datasets across various domains such as medicine, ecology, and economics. Functional Data Analysis (FDA) has proven to be a robust framework for addressing these challenges, enabling the transformation of high-dimensional data into functional forms that capture intricate temporal and spatial patterns. However, despite advancements in functional classification methods and very high performance demonstrated by combining FDA and ensemble methods, a critical gap persists in the literature concerning the transparency and interpretability of black-box models, e.g. Functional Random Forests (FRF). In response to this need, this paper introduces a novel suite of explainability tools to illuminate the inner mechanisms of FRF. We propose using Functional Partial Dependence Plots (FPDPs), Functional Principal Component (FPC) Probability Heatmaps, various model-specific and model-agnostic FPCs' importance metrics, and the FPC Internal-External Importance and Explained Variance Bubble Plot. These tools collectively enhance the transparency of FRF models by providing a detailed analysis of how individual FPCs contribute to model predictions. By applying these methods to an ECG dataset, we demonstrate the effectiveness of these tools in revealing critical patterns and improving the explainability of FRF.

Paper Structure

This paper contains 15 sections, 24 equations, and 12 figures.

Figures (12)

  • Figure 1: Variation of the first four Functional Principal Components (FPCs) with changes in their scores.
  • Figure 2: Variation of the curves reconstructed from the first Functional Principal Component (FPC), grouped by ranges of its score.
  • Figure 3: Original ECG signals from the training and test sets. The left panel shows the signals from the training set, and the right panel shows the signals from the test set. The blue curves correspond to healthy patients (NH), while the green curves represent those diagnosed with heart disease (MI).
  • Figure 4: Functional Principal Components Decomposition of the dataset. The plot shows the first 15 Functional Principal Components (FPCs), with the associated explained variance.
  • Figure 5: Functional Partial Dependence Plots (FPDPs) for the first 15 Functional Principal Components (FPCs). Each subplot shows how the predicted probability changes as the score of a single FPC is varied while the other scores remain constant. The plots are ordered by the amount of variance explained by each FPC.
  • ...and 7 more figures
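
The computation behind the Figure 5 caption can be sketched as follows. This is a minimal, model-agnostic sketch rather than the paper's code: it uses the standard partial dependence recipe, in which one FPC score is set to each grid value for every observation, the other scores are left at their observed values, and the predicted probabilities are averaged. The function `partial_dependence` and the logistic stand-in `toy_predict_proba` are hypothetical; a fitted FRF's probability function would take the stand-in's place.

```python
import numpy as np

def partial_dependence(predict_proba, scores, k, grid):
    """Average predicted probability as the k-th FPC score sweeps over grid.

    predict_proba: callable mapping an (n, K) score matrix to n probabilities.
    scores: observed FPC scores, shape (n, K).
    """
    pd_values = np.empty(len(grid))
    for i, value in enumerate(grid):
        modified = scores.copy()
        modified[:, k] = value          # overwrite only the k-th score
        pd_values[i] = predict_proba(modified).mean()
    return pd_values

def toy_predict_proba(scores):
    """Hypothetical stand-in classifier: depends on FPC 1 and FPC 2 only."""
    logits = 1.5 * scores[:, 0] - 0.5 * scores[:, 1]
    return 1.0 / (1.0 + np.exp(-logits))

rng = np.random.default_rng(1)
scores = rng.normal(size=(200, 3))      # synthetic scores, not ECG200
grid = np.linspace(-2.0, 2.0, 9)

pd_fpc1 = partial_dependence(toy_predict_proba, scores, 0, grid)
pd_fpc3 = partial_dependence(toy_predict_proba, scores, 2, grid)
```

Here `pd_fpc1` rises with the score because the stand-in model weights the first FPC positively, while `pd_fpc3` is flat because the third FPC is ignored; a flat curve of this kind marks an FPC with little influence on the prediction.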