Explainable AI in Big Data Fraud Detection
Ayush Jain, Rahul Kulkarni, Siyi Lin
TL;DR
The paper investigates how explainable AI can be integrated into Big Data–driven fraud detection and risk management, and outlines the tension between predictive performance and transparency. It reviews Big Data characteristics, analytics tools, and XAI methods, identifies gaps in scalability and real-time explainability, and proposes the REXAI-FD framework, which combines semantic feature engineering, adaptive modeling, and context-aware explanations delivered through a human-in-the-loop pipeline on a cloud-native stack. The contribution includes a detailed architectural design, implementation considerations, a discussion of theoretical and practical implications, and open research questions in scalable, privacy-preserving, and standardized explainable fraud analytics. Overall, the work argues that progress in high-volume, streaming fraud environments depends on distributed, user-centered explanations that meet regulatory demands without sacrificing predictive performance.
Abstract
Big Data has become central to modern applications in finance, insurance, and cybersecurity, enabling machine learning systems to perform large-scale risk assessments and fraud detection. However, the increasing dependence on automated analytics introduces important concerns about transparency, regulatory compliance, and trust. This paper examines how explainable artificial intelligence (XAI) can be integrated into Big Data analytics pipelines for fraud detection and risk management. We review key Big Data characteristics and survey major analytical tools, including distributed storage systems, streaming platforms, and advanced fraud detection models such as anomaly detectors, graph-based approaches, and ensemble classifiers. We also present a structured review of widely used XAI methods, including LIME, SHAP, counterfactual explanations, and attention mechanisms, and analyze their strengths and limitations when deployed at scale. Based on these findings, we identify key research gaps related to scalability, real-time processing, and explainability for graph and temporal models. To address these challenges, we outline a conceptual framework that integrates scalable Big Data infrastructure with context-aware explanation mechanisms and human feedback. The paper concludes with open research directions in scalable XAI, privacy-aware explanations, and standardized evaluation methods for explainable fraud detection systems.
