Semantic Superiority vs. Forensic Efficiency: A Comparative Analysis of Deep Learning and Psycholinguistics for Business Email Compromise Detection
Yaw Osei Adjei, Frederick Ayivor, Davis Opoku
TL;DR
The paper addresses BEC detection by comparing a semantic deep-learning stream (DistilBERT) with a forensic psycholinguistic stream (CatBoost) under cost-sensitive constraints. It introduces a hybrid dataset that blends legitimate Enron-like emails with AI-synthesized BEC samples, applies homoglyph normalization, and uses a financial loss function to optimize decision thresholds. DistilBERT achieves near-perfect AUC and high F1 on GPU-accelerated hardware, while CatBoost delivers competitive performance with substantially lower latency and resource use, enabling edge deployment. The study highlights the ROI and infrastructure trade-offs, offers a practical framework for choosing between high-accuracy, GPU-intensive models and cost-efficient, CPU-friendly alternatives, and identifies directions for longitudinal validation and psycholinguistic feature verification.
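As a rough illustration of what a homoglyph normalization step can look like, the sketch below folds Unicode compatibility forms with NFKC and then maps a handful of look-alike Cyrillic and Greek letters to ASCII. The mapping table and the function name normalize_homoglyphs are hypothetical and chosen only for this example; the paper's actual normalization procedure is not specified here and may differ.

```python
import unicodedata

# Illustrative (assumed) subset of look-alike characters; a production system
# would more likely draw on a fuller confusables list.
CONFUSABLES = {
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
    "\u0440": "p",  # Cyrillic small er
    "\u0441": "c",  # Cyrillic small es
    "\u0456": "i",  # Cyrillic small Byelorussian-Ukrainian i
    "\u03bf": "o",  # Greek small omicron
}
_TABLE = str.maketrans(CONFUSABLES)


def normalize_homoglyphs(text: str) -> str:
    """Fold compatibility forms (e.g. fullwidth letters), then map look-alikes to ASCII."""
    folded = unicodedata.normalize("NFKC", text)
    return folded.translate(_TABLE)


if __name__ == "__main__":
    spoofed = "p\u0430yp\u0430l.com"  # Cyrillic 'a' in place of Latin 'a'
    print(normalize_homoglyphs(spoofed))
```

Normalizing before feature extraction means that a spoofed sender domain and its legitimate counterpart map to the same string, so downstream models are not sidestepped by character substitution alone.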
Abstract
Business Email Compromise (BEC) is a sophisticated social engineering threat that exploits organizational hierarchies and causes significant financial damage. According to the 2024 FBI Internet Crime Report, BEC accounts for over $2.9 billion in annual losses, which creates a large economic asymmetry: the financial cost of a False Negative (fraud loss) is approximately 5,480 times the operational cost of a False Positive (manual review). This paper contrasts two detection paradigms: a Forensic Psycholinguistic Stream (CatBoost), which analyzes linguistic cues such as urgency and authority with high interpretability, and a Semantic Stream (DistilBERT), which uses deep learning for contextual understanding. We evaluated both streams on a hybrid dataset (N = 7,990) containing legitimate human-written emails and AI-synthesized adversarial fraud. Benchmarked on Tesla T4 infrastructure, DistilBERT achieved near-perfect detection on synthetic threats (AUC > 0.99, F1 = 0.998) with acceptable real-time latency (7.4 ms). CatBoost achieved competitive detection (AUC = 0.991, F1 = 0.949) at 8.4x lower latency (0.8 ms) with negligible resource consumption. We conclude that while DistilBERT offers maximum accuracy for GPU-equipped organizations, CatBoost provides a viable, cost-effective alternative for edge deployments. Both approaches demonstrate a theoretical ROI exceeding 99.9% when optimized via cost-sensitive learning.
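To make the cost asymmetry concrete, the following minimal sketch selects a decision threshold by minimizing a simple financial loss, COST_FN * FN + COST_FP * FP, using the 5,480:1 ratio quoted above. The function names, the toy labels, and the toy scores are assumptions made for illustration; the paper's exact loss function and optimization procedure may differ.

```python
# Relative costs taken from the 5,480:1 ratio in the abstract; the unit is
# "one manual review" (an assumed normalization for this sketch).
COST_FN = 5480.0
COST_FP = 1.0


def expected_cost(y_true, scores, threshold):
    """Total cost when emails with score >= threshold are flagged as fraud."""
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    return COST_FN * fn + COST_FP * fp


def best_threshold(y_true, scores):
    """Scan every observed score (plus 'flag nothing') and keep the cheapest cut-off."""
    candidates = sorted(set(scores)) + [float("inf")]
    return min(candidates, key=lambda t: expected_cost(y_true, scores, t))


if __name__ == "__main__":
    # Toy data (hypothetical): 1 = fraud, 0 = legitimate.
    y_true = [0, 0, 0, 1, 1]
    scores = [0.05, 0.20, 0.60, 0.40, 0.90]
    t = best_threshold(y_true, scores)
    print(t, expected_cost(y_true, scores, t))
```

On this toy data the cheapest cut-off accepts one false positive in order to avoid any missed fraud, which is the behavior the asymmetry is meant to induce. For well-calibrated probabilities, and assuming correct decisions cost nothing, the same ratio implies a theoretical cut-off of COST_FP / (COST_FP + COST_FN) = 1/5481, or roughly 0.00018.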
