
Functional Random Forest with Adaptive Cost-Sensitive Splitting for Imbalanced Functional Data Classification

Fahad Mostafa, Hafiz Khan

TL;DR

This work tackles the challenge of class-imbalanced functional data classification by introducing Functional Random Forest with Adaptive Cost-Sensitive Splitting (FRF-ACS). The method represents curves via FPCA/basis expansions, uses a locally adaptive impurity measure to emphasize minority classes, and employs a hybrid resampling strategy (Functional SMOTE plus cost-sensitive bootstrapping) with curve-aware leaf similarity. Theoretical support is provided through FPCA truncation error identities and a link between the adaptive impurity and weighted misclassification risk, complemented by a consistency argument. Empirically, FRF-ACS yields substantial gains in minority-class detection (F1, AUPRC, MCC) and balanced accuracy across synthetic and four real functional datasets, demonstrating robustness to noise and imbalance while maintaining interpretability.
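The locally adaptive impurity can be illustrated with a small sketch. For illustration only, it assumes that a node's class weights are $w_k \propto (n_k/n)^{-\gamma}$, computed from the parent node's class counts and reused for both children. The names `node_class_weights`, `weighted_gini` and `best_split`, and the exponent `gamma`, are hypothetical and need not match the weighting actually used in FRF-ACS. The feature `x` stands for one scalar functional feature, such as a single FPCA score.

```python
import numpy as np


def node_class_weights(y_node, n_classes, gamma=1.0):
    """Hypothetical node-local class weights w_k proportional to (n_k / n)^(-gamma).

    gamma = 0 gives uniform weights (ordinary Gini); gamma = 1 gives inverse
    node frequency, which equalizes the weighted class mass at this node.
    """
    counts = np.bincount(y_node, minlength=n_classes).astype(float)
    freq = counts / counts.sum()
    w = np.zeros(n_classes)
    present = freq > 0
    w[present] = freq[present] ** (-gamma)
    return w / w.sum()  # overall scale is irrelevant; normalize for readability


def weighted_gini(y, w):
    """Gini impurity 1 - sum_k p_k^2 computed from class-weighted proportions p_k."""
    mass = w * np.bincount(y, minlength=len(w))
    total = mass.sum()
    if total == 0:
        return 0.0
    p = mass / total
    return 1.0 - np.sum(p ** 2)


def best_split(x, y, n_classes, gamma=1.0):
    """Scan thresholds on one scalar feature and return (threshold, gain) that
    maximizes the decrease in weighted Gini impurity. The weights are computed
    once from the parent node and reused for both children."""
    w = node_class_weights(y, n_classes, gamma)
    parent = weighted_gini(y, w)
    parent_mass = w[y].sum()
    order = np.argsort(x, kind="mergesort")
    xs, ys = x[order], y[order]
    best_threshold, best_gain = None, 0.0
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # cannot split between tied feature values
        left, right = ys[:i], ys[i:]
        child = (w[left].sum() * weighted_gini(left, w)
                 + w[right].sum() * weighted_gini(right, w)) / parent_mass
        gain = parent - child
        if gain > best_gain:
            best_threshold, best_gain = 0.5 * (xs[i - 1] + xs[i]), gain
    return best_threshold, best_gain
```

Setting `gamma=0` recovers the ordinary Gini criterion, which makes a convenient sanity check. Reusing the parent's weights in the children is a deliberate choice in this sketch. Recomputing inverse-frequency weights inside each child would equalize the weighted class mass in every mixed child, so its impurity would no longer reflect how mixed it is.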

Abstract

Classification of functional data, where observations are curves or trajectories, poses unique challenges, particularly under severe class imbalance. Traditional Random Forest algorithms, while robust for tabular data, often fail to capture the intrinsic structure of functional observations and struggle to detect the minority class. This paper introduces Functional Random Forest with Adaptive Cost-Sensitive Splitting (FRF-ACS), a novel ensemble framework designed for imbalanced functional data classification. The proposed method leverages basis expansions and Functional Principal Component Analysis (FPCA) to represent curves efficiently, enabling trees to operate on low-dimensional functional features. To address imbalance, we incorporate a dynamic cost-sensitive splitting criterion that adjusts class weights locally at each node, combined with a hybrid sampling strategy that integrates functional SMOTE and weighted bootstrapping. Additionally, curve-specific similarity metrics replace traditional Euclidean measures to preserve functional characteristics during leaf assignment. Extensive experiments on synthetic and real-world datasets, including biomedical signals and sensor trajectories, demonstrate that FRF-ACS significantly improves minority-class recall and overall predictive performance compared with existing functional classifiers and imbalance-handling techniques. This work provides a scalable, interpretable solution for high-dimensional functional data analysis in domains where minority-class detection is critical.
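The hybrid resampling step and a curve-aware distance can be sketched as follows, assuming all curves are sampled on a common grid `t`. Functional SMOTE is implemented here in its simplest form: it interpolates between a minority curve and one of its `k` nearest minority neighbors under the $L^2$ distance. The cost-sensitive bootstrap draws each observation with probability proportional to a per-class cost. The function names, the per-class costs, and the trapezoidal $L^2$ approximation are illustrative assumptions, not the paper's specification.

```python
import numpy as np


def l2_curve_distance(f, g, t):
    """Approximate L2 distance between two curves sampled on the common grid t
    (trapezoidal rule for the integral of (f - g)^2)."""
    d2 = (f - g) ** 2
    return float(np.sqrt(np.sum(0.5 * (d2[1:] + d2[:-1]) * np.diff(t))))


def functional_smote(X_min, t, n_new, k=5, rng=None):
    """Hypothetical functional SMOTE: each synthetic curve is X_i + u * (X_j - X_i),
    where X_j is one of the k nearest minority neighbors of X_i in L2 distance
    and u ~ Uniform[0, 1). Requires at least two minority curves."""
    rng = np.random.default_rng(rng)
    n = X_min.shape[0]
    k = min(k, n - 1)
    D = np.array([[l2_curve_distance(X_min[i], X_min[j], t) for j in range(n)]
                  for i in range(n)])
    np.fill_diagonal(D, np.inf)  # a curve is not its own neighbor
    neighbors = np.argsort(D, axis=1)[:, :k]
    synth = np.empty((n_new, X_min.shape[1]))
    for s in range(n_new):
        i = rng.integers(n)
        j = neighbors[i, rng.integers(k)]
        u = rng.random()
        synth[s] = X_min[i] + u * (X_min[j] - X_min[i])
    return synth


def cost_sensitive_bootstrap(y, class_cost, rng=None):
    """Draw len(y) indices with replacement, each with probability proportional
    to the (hypothetical) cost assigned to its class."""
    rng = np.random.default_rng(rng)
    p = np.asarray(class_cost, dtype=float)[y]
    return rng.choice(len(y), size=len(y), replace=True, p=p / p.sum())
```

Each synthetic curve is a convex combination of two observed minority curves, so it stays pointwise inside their envelope. The same `l2_curve_distance` could also serve as a curve-aware similarity for leaf assignment.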

Paper Structure

This paper contains 9 sections, 3 theorems, 39 equations, 4 figures, 4 tables, and 1 algorithm.

Key Result

Lemma 1

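Let $X\in L^2([a,b])$ be a random function with $\mathbb{E}\|X\|_{L^2}^2<\infty$, mean $\mu$, and covariance operator with eigenvalues $\lambda_1\ge\lambda_2\ge\cdots\ge0$ and orthonormal eigenfunctions $(\phi_m)_{m\ge1}$. For the $M$-term FPCA approximation $X_M=\mu+\sum_{m=1}^{M}\xi_m\phi_m$, with scores $\xi_m=\langle X-\mu,\phi_m\rangle$, we have the exact mean-squared truncation identity
$$\mathbb{E}\,\|X-X_M\|_{L^2}^2=\sum_{m>M}\lambda_m,$$
where $\|\cdot\|_{L^2}$ denotes the $L^2([a,b])$ norm.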
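Lemma 1 can be checked numerically because its sample analogue holds exactly. Replace the population covariance by the empirical covariance (with $1/N$ normalization) and inner products by a Riemann sum on a uniform grid. The average squared $L^2$ reconstruction error of the $M$-term approximation then equals the sum of the discarded empirical eigenvalues. The simulated curves and the helper `fpca_truncate` below are hypothetical and serve only to illustrate the identity.

```python
import numpy as np


def fpca_truncate(X, t, M):
    """Discretized FPCA of curves X (n_curves x n_grid) on a uniform grid t.

    Returns the eigenvalues (descending) of the empirical covariance operator
    and the M-term reconstruction X_M = mu + sum_{m<=M} xi_m phi_m.
    Inner products use the Riemann sum <f, g> ~ h * sum(f * g), h = grid spacing.
    """
    h = t[1] - t[0]
    mu = X.mean(axis=0)
    Y = X - mu
    C = h * (Y.T @ Y) / X.shape[0]   # discretized covariance operator (1/N normalization)
    lam, V = np.linalg.eigh(C)       # eigh returns ascending eigenvalues
    lam, V = lam[::-1], V[:, ::-1]   # reorder so lambda_1 >= lambda_2 >= ...
    phi = V / np.sqrt(h)             # eigenfunctions with unit L2 norm
    scores = h * (Y @ phi)           # xi_im = <X_i - mu, phi_m>
    X_M = mu + scores[:, :M] @ phi[:, :M].T
    return lam, X_M


# Simulated curves: two random smooth modes plus small pointwise noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)
n = 200
X = (rng.standard_normal((n, 1)) * np.sin(2 * np.pi * t)
     + 0.5 * rng.standard_normal((n, 1)) * np.cos(2 * np.pi * t)
     + 0.05 * rng.standard_normal((n, t.size)))
h = t[1] - t[0]
for M in (1, 2, 5):
    lam, X_M = fpca_truncate(X, t, M)
    mse = h * np.mean(np.sum((X - X_M) ** 2, axis=1))  # empirical E||X - X_M||^2
    print(M, mse, lam[M:].sum())  # the two values should agree up to rounding
```

The agreement depends on using the same quadrature rule in the eigenproblem and in the error. A different rule, such as the trapezoidal one, would require weighting the covariance matrix accordingly.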

Figures (4)

  • Figure 1: Illustration of Balanced and Imbalanced Functional Data. The left panel displays balanced synthetic functional trajectories from two equally represented classes with distinct temporal patterns. The right panel shows a severely imbalanced functional dataset in which the minority-class trajectories (red) are overshadowed by a dense majority class (blue). This imbalance distorts the functional feature space, hindering classifier performance and motivating specialized approaches such as FRF-ACS for robust minority-class detection in functional data analysis.
  • Figure 2: Heatmaps of performance metrics (F1, Balanced Accuracy, AUPRC, MCC) across noise levels (rows) and imbalance ratios (columns). Left column of each pair: baseline; right: SMOTE-enhanced. Values are synthetic surrogates for illustrative purposes.
  • Figure 3: FPCA-dimension sensitivity plot: F1 score versus number of FPCA components retained, comparing baseline and SMOTE-enhanced methods for simulated data.
  • Figure 4: Multi-panel plots for performance metrics (F1, Balanced Accuracy, AUPRC, MCC) vs. imbalance ratio (majority:minority). Each panel compares baseline (orange) and SMOTE-enhanced (blue) methods at fixed noise = 0.05.
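The figures report F1, balanced accuracy, AUPRC and MCC. A minimal sketch of these metrics for a binary problem follows. It assumes the minority class is coded as 1 and treated as the positive class. AUPRC is estimated here as average precision, which is one common estimator; this summary does not state which estimator the paper uses.

```python
import numpy as np


def binary_metrics(y_true, y_pred):
    """F1, balanced accuracy and MCC with the minority class coded as 1 (positive)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = float(np.sum((y_true == 1) & (y_pred == 1)))
    fp = float(np.sum((y_true == 0) & (y_pred == 1)))
    fn = float(np.sum((y_true == 1) & (y_pred == 0)))
    tn = float(np.sum((y_true == 0) & (y_pred == 0)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # minority-class recall
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    balanced_accuracy = 0.5 * (recall + specificity)
    denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"f1": f1, "balanced_accuracy": balanced_accuracy, "mcc": mcc}


def average_precision(y_true, scores):
    """AUPRC estimated as average precision, sum_n (R_n - R_{n-1}) * P_n, over
    distinct score thresholds taken in decreasing order.
    Assumes at least one positive label."""
    y_true = np.asarray(y_true, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="mergesort")
    y, s = y_true[order], scores[order]
    tp = np.cumsum(y)
    fp = np.cumsum(1.0 - y)
    last = np.r_[np.nonzero(np.diff(s))[0], len(s) - 1]  # last index of each tied block
    tp, fp = tp[last], fp[last]
    precision = tp / (tp + fp)
    recall = tp / tp[-1]
    recall_prev = np.r_[0.0, recall[:-1]]
    return float(np.sum((recall - recall_prev) * precision))
```

Balanced accuracy and MCC use all four cells of the confusion matrix. That is why they are often reported alongside F1 under heavy imbalance.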

Theorems & Definitions (6)

  • Lemma 1: FPCA truncation error
  • proof
  • Proposition 1: Weighted Gini as a surrogate for weighted 0-1 risk
  • proof
  • Proposition 2: Consistency of FRF-ACS under standard conditions
  • proof: Sketch of proof
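Proposition 1 is listed here without its statement. One standard way to relate a weighted Gini impurity to weighted 0-1 risk at a single node is the two-sided bound below, which may differ from the paper's exact formulation. The class weights $w_k>0$, the node counts $n_k$, and the symbols $p_k$, $R_w$, $G_w$ are notation introduced for this illustration.

```latex
\begin{aligned}
p_k &= \frac{w_k n_k}{\sum_{j} w_j n_j}, \qquad p_{\max} = \max_k p_k,\\
R_w &= 1 - p_{\max} \quad\text{(weighted 0-1 risk of predicting the weighted-majority class)},\\
G_w &= 1 - \sum_k p_k^2 \quad\text{(weighted Gini impurity)},\\
\sum_k p_k^2 &\le p_{\max}\sum_k p_k = p_{\max}
  \;\Longrightarrow\; G_w \ge 1 - p_{\max} = R_w,\\
\sum_k p_k^2 &\ge p_{\max}^2
  \;\Longrightarrow\; G_w \le 1 - p_{\max}^2 = (1-p_{\max})(1+p_{\max}) \le 2R_w,\\
&\therefore\; R_w \le G_w \le 2R_w .
\end{aligned}
```

Under this formulation, a node with zero weighted Gini also has zero weighted 0-1 risk, and at every node the two criteria agree to within a factor of 2.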