Financial Fraud Detection with Entropy Computing

Babak Emami, Wesley Dyk, David Haycraft, Carrie Spear, Lac Nguyen, Nicholas Chancellor

TL;DR

This work proposes CVQBoost, a boosting algorithm that runs on the Dirac-3 Entropy Quantum Computing platform to solve a continuous-weight boosting problem for fraud detection. By formulating boosting as a continuous quadratic optimization with weights $w_i$ and mapping it to a Dirac-3 Hamiltonian via $J_{ij}$ and $C_i$, CVQBoost achieves competitive AUC against XGBoost while delivering substantially faster training on large-scale data. The results demonstrate favorable scalability across dataset size and feature dimensionality, with CVQBoost outperforming conventional approaches on synthetic data up to 70 million samples and showing robust performance under various class-balancing schemes. The study highlights the practical potential of EQC-based acceleration for high-dimensional, imbalanced classification tasks such as fraud detection, and suggests promising avenues for explainability through the analysis of weak classifiers.

Abstract

We introduce CVQBoost, a novel classification algorithm that leverages early hardware implementing Quantum Computing Inc's Entropy Quantum Computing (EQC) paradigm, Dirac-3 [Nguyen et al. arXiv:2407.04512]. We apply CVQBoost to a fraud detection test case and benchmark its performance against XGBoost, a widely used ML method. Running on Dirac-3, CVQBoost demonstrates a significant runtime advantage over XGBoost, which we evaluate on high-performance hardware comprising up to 48 CPUs and four NVIDIA L4 GPUs using the RAPIDS AI framework. Our results show that CVQBoost maintains competitive accuracy (measured by AUC) while significantly reducing training time, particularly as dataset size and feature complexity increase. To assess scalability, we extend our study to large synthetic datasets ranging from 1M to 70M samples, demonstrating that CVQBoost on Dirac-3 is well-suited for large-scale classification tasks. These findings position CVQBoost as a promising alternative to gradient boosting methods, offering superior scalability and efficiency for high-dimensional ML applications such as fraud detection.
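
To make the quadratic formulation mentioned in the TL;DR more concrete, the following is a minimal sketch of how a continuous-weight boosting objective can be expanded into a quadratic term $J_{ij}$ and a linear term $C_i$. It assumes a squared-error loss with an L2 penalty, as in QBoost-style formulations; the exact objective, scaling, and constraints used by CVQBoost are not given in this summary, so the function names, the `lam` parameter, and the classical solver below are illustrative assumptions only. The solver is an unconstrained classical stand-in and does not represent how Dirac-3 operates.

```python
import numpy as np

def build_quadratic_terms(H, y, lam=0.1):
    """Expand mean((H @ w - y)**2) + lam * ||w||^2 into w^T J w + C^T w + const.

    H   : (n_samples, n_weak) outputs of the weak classifiers (assumed layout)
    y   : (n_samples,) labels, e.g. in {-1, +1}
    lam : hypothetical regularization strength
    """
    n_samples, n_weak = H.shape
    J = H.T @ H / n_samples + lam * np.eye(n_weak)  # quadratic couplings J_ij
    C = -2.0 * H.T @ y / n_samples                  # linear coefficients C_i
    return J, C

def energy(w, J, C):
    """Quadratic objective w^T J w + C^T w (constant term dropped)."""
    return w @ J @ w + C @ w

def unconstrained_minimizer(J, C):
    """Classical stand-in: solve the stationarity condition 2 J w + C = 0.

    This ignores any constraints on w (such as non-negativity or a fixed sum)
    that a hardware solver may impose.
    """
    return np.linalg.solve(2.0 * J, -C)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H = rng.choice([-1.0, 1.0], size=(200, 5))  # synthetic weak-classifier votes
    y = rng.choice([-1.0, 1.0], size=200)       # synthetic labels
    J, C = build_quadratic_terms(H, y)
    w = unconstrained_minimizer(J, C)
    print("weights:", w)
    print("energy:", energy(w, J, C))
```

The dropped constant is `mean(y**2)`, so minimizing the energy over $w_i$ is equivalent to minimizing the regularized squared error under these assumptions.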

Paper Structure

This paper contains 18 sections, 5 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: AUC score of CVQBoost vs. minority-to-majority class ratio in training datasets using different balancing strategies: (a) ADASYN, (b) SMOTE, (c) SMOTE-SVM, and (d) majority class downsampling. The bars (obscured by the symbol at most points) are intervals in which $95\%$ of instances should lie. These data can be found in tabular form in table \ref{tab:accuracy_strategy_ratio} of the appendix.
  • Figure 2: (a) Training runtime of CVQBoost and XGBoost vs. count of training data samples. The bars are intervals in which $95\%$ of instances should lie. (b) Fraction of CVQBoost runtime spent running on Dirac-3. (c) Training runtime of CVQBoost and XGBoost vs. number of features. The bars are intervals in which $95\%$ of instances should lie. (d) Fraction of CVQBoost runtime spent running on Dirac-3. These data can be found in tabular form in tables \ref{tab:runtime_data_count} (training counts) and \ref{tab:runtimes_num_feas} (feature counts) of the appendix.
  • Figure 3: Training runtime of XGBoost vs. number of cores on a high-performance machine with 16 cores. Intervals within which 95% of the data should lie are shown; training data count: $150,000$, number of features: $38$.
  • Figure 4: Comparison of training runtimes for CVQBoost and XGBoost on (a) eight cores run in parallel and (b) a GPU using the NVIDIA RAPIDS AI package. Bars represent intervals where $95\%$ of data should lie. These data can be found in tabular form in tables (CPU) and \ref{tab:runtimes_gpu_rapids} (GPU) of the appendix.
  • Figure 5: Training runtime of CVQBoost vs. XGBoost on $48$ CPUs and four NVIDIA L4 GPUs. The bars are intervals in which $95\%$ of instances should lie. CVQBoost exhibits superior scalability for large datasets.
  • ...and 2 more figures
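
As an illustration of the majority class downsampling strategy named in the Figure 1 caption, the sketch below keeps every minority-class sample and randomly drops majority-class samples until a target minority-to-majority ratio is reached. The function name, the label convention (1 for the minority class, 0 for the majority class), and the `ratio` and `seed` parameters are assumptions made for this example; the paper's actual preprocessing code is not shown in this summary.

```python
import numpy as np

def downsample_majority(X, y, ratio, seed=0):
    """Randomly drop majority rows so that n_minority / n_majority is about `ratio`.

    X     : (n_samples, n_features) feature matrix
    y     : (n_samples,) labels, assumed 1 = minority (fraud), 0 = majority
    ratio : desired minority-to-majority ratio, e.g. 0.5
    """
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    # Number of majority rows to keep, capped at how many are available.
    n_keep = min(len(majority), int(round(len(minority) / ratio)))
    kept = rng.choice(majority, size=n_keep, replace=False)
    idx = np.sort(np.concatenate([minority, kept]))
    return X[idx], y[idx]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 3))
    y = np.zeros(1000, dtype=int)
    y[:50] = 1  # 50 minority samples out of 1000
    X_bal, y_bal = downsample_majority(X, y, ratio=0.5)
    print(X_bal.shape, int(y_bal.sum()))
```

Oversampling strategies such as ADASYN or SMOTE instead synthesize new minority samples, so they increase rather than reduce the training set size.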