Improving Credit Card Fraud Detection with an Optimized Explainable Boosting Machine

Reza E. Fazel, Arash Bakhtiary, Siavash A. Bigdeli

TL;DR

This work tackles credit card fraud detection under extreme class imbalance by deploying an optimized Explainable Boosting Machine (EBM) framework that avoids resampling. Leveraging the GA2M structure, the authors combine five data transformers with Taguchi design of experiments to optimize both the preprocessing order and model hyperparameters, achieving ROC-AUC up to 0.983 on a Kaggle fraud dataset. They demonstrate that training on a carefully selected subset of top features yields competitive performance while preserving interpretability through global and local explanations of feature effects and interactions. The study highlights the practical potential of combining interpretable models with systematic optimization to enhance trustworthy fraud analytics in financial systems, and outlines future work on online learning and broader design-of-experiments configurations.

Abstract

Addressing class imbalance is a central challenge in credit card fraud detection, as it directly impacts predictive reliability in real-world financial systems. To overcome this, the study proposes an enhanced workflow based on the Explainable Boosting Machine (EBM), a transparent, state-of-the-art implementation of the GA2M algorithm, optimized through systematic hyperparameter tuning, feature selection, and preprocessing refinement. Rather than relying on conventional sampling techniques that may introduce bias or cause information loss, the optimized EBM achieves an effective balance between accuracy and interpretability, enabling precise detection of fraudulent transactions while providing actionable insights into feature importance and interaction effects. Furthermore, the Taguchi method is employed to optimize both the sequence of data scalers and the model hyperparameters, ensuring robust, reproducible, and systematically validated performance improvements. Experimental evaluation on benchmark credit card data yields an ROC-AUC of 0.983, surpassing prior EBM baselines (0.975) and outperforming Logistic Regression, Random Forest, XGBoost, and Decision Tree models. These results highlight the potential of interpretable machine learning and data-driven optimization for advancing trustworthy fraud analytics in financial systems.
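To make the Taguchi step more concrete, the following is a minimal sketch of how an orthogonal-array experiment can rank factor levels using the larger-the-better signal-to-noise ratio. It is not the authors' configuration: the factor names and levels in `FACTORS` are assumptions chosen for illustration, and `toy_score` is a synthetic stand-in for a cross-validated ROC-AUC, not a result from the paper. Only the L9(3^4) array and the signal-to-noise formula are standard.

```python
import math
from itertools import combinations

# Standard L9(3^4) orthogonal array: 9 runs, 4 factors, 3 levels each (0-indexed).
L9 = [
    (0, 0, 0, 0), (0, 1, 1, 1), (0, 2, 2, 2),
    (1, 0, 1, 2), (1, 1, 2, 0), (1, 2, 0, 1),
    (2, 0, 2, 1), (2, 1, 0, 2), (2, 2, 1, 0),
]

# ASSUMPTION: hypothetical factors and levels, not taken from the paper.
FACTORS = {
    "first_scaler": ["standard", "robust", "quantile"],
    "learning_rate": [0.005, 0.01, 0.05],
    "max_bins": [128, 256, 512],
    "interactions": [5, 10, 20],
}

# ASSUMPTION: synthetic per-level multipliers so the example is deterministic.
TOY_MULTIPLIERS = [
    (0.90, 1.00, 0.95),
    (0.95, 1.00, 0.90),
    (0.90, 0.95, 1.00),
    (1.00, 0.95, 0.90),
]


def is_orthogonal(array, n_levels=3):
    """Check that every pair of columns contains every level pair."""
    n_cols = len(array[0])
    for a, b in combinations(range(n_cols), 2):
        if len({(row[a], row[b]) for row in array}) != n_levels ** 2:
            return False
    return True


def sn_larger_is_better(values):
    """Taguchi S/N ratio in dB for a response that should be maximized."""
    return -10.0 * math.log10(sum(1.0 / (v * v) for v in values) / len(values))


def toy_score(levels):
    """Placeholder response; a real run would train and validate a model here."""
    score = 1.0
    for factor_index, level in enumerate(levels):
        score *= TOY_MULTIPLIERS[factor_index][level]
    return score


def main_effects(array, responses, n_levels=3):
    """Mean S/N ratio per factor and level across the runs that use that level."""
    effects = []
    for f in range(len(array[0])):
        per_level = []
        for level in range(n_levels):
            sn = [sn_larger_is_better([responses[i]])
                  for i, row in enumerate(array) if row[f] == level]
            per_level.append(sum(sn) / len(sn))
        effects.append(per_level)
    return effects


responses = [toy_score(row) for row in L9]
effects = main_effects(L9, responses)

# Pick, for each factor, the level with the highest mean S/N ratio.
best = {
    name: levels[max(range(len(levels)), key=lambda l: effects[i][l])]
    for i, (name, levels) in enumerate(FACTORS.items())
}
print(best)
```

The appeal of this design is cost: nine runs stand in for the 81 combinations of a full 3^4 grid, at the price of assuming that factor effects are roughly additive. In practice `toy_score` would be replaced by a function that fits the model under the given configuration and returns a validation metric.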

Paper Structure

This paper contains 18 sections, 3 equations, 5 figures, and 5 tables.

Figures (5)

  • Figure 1: Chatterjee’s Correlation Coefficient Heatmap
  • Figure 2: Feature Importance and Pairwise Interactions
  • Figure 3: Feature Importance and Pairwise Interaction of All Features from EBM’s Global Explanation
  • Figure 4: Feature Contributions to a Class 0 Prediction
  • Figure 5: Feature Contributions to a Class 1 Prediction
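As background for the heatmap in Figure 1, here is a minimal sketch of Chatterjee's rank correlation coefficient in its simplest form, which assumes no tied values. It illustrates the statistic only and is not the authors' code; the function name `chatterjee_xi` and the sample data are illustrative. With the pairs sorted by x and r_i denoting the rank of the i-th y value in that order, the coefficient is 1 - 3 * sum(|r_{i+1} - r_i|) / (n^2 - 1).

```python
def chatterjee_xi(x, y):
    """Chatterjee's xi for paired samples, assuming no ties in x or y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    if len(set(x)) != n or len(set(y)) != n:
        raise ValueError("this sketch does not handle tied values")

    # Rank of each y value (1 = smallest).
    rank_of = {value: k + 1 for k, value in enumerate(sorted(y))}

    # Read the y ranks in order of increasing x.
    order = sorted(range(n), key=lambda i: x[i])
    ranks = [rank_of[y[i]] for i in order]

    # Sum of absolute jumps between consecutive ranks.
    total_jump = sum(abs(ranks[i + 1] - ranks[i]) for i in range(n - 1))
    return 1.0 - 3.0 * total_jump / (n * n - 1)


# Illustrative data: y is a function of x, but x is not a function of y.
xs = [-3, -2, -1, 0, 1.5, 2.5, 3.5]
ys = [v * v for v in xs]

print(chatterjee_xi(xs, ys))  # how well x predicts y
print(chatterjee_xi(ys, xs))  # how well y predicts x
```

Unlike Pearson or Spearman correlation, the coefficient is not symmetric in its arguments, so a heatmap built from it need not be symmetric about the diagonal. It also captures non-monotonic dependence such as the quadratic relationship used here.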