IntegrityAI at GenAI Detection Task 2: Detecting Machine-Generated Academic Essays in English and Arabic Using ELECTRA and Stylometry
Mohammad AL-Smadi
TL;DR
This work addresses the challenge of detecting machine-generated academic essays in English and Arabic by fine-tuning ELECTRA-based detectors (ELECTRA for English and AraELECTRA for Arabic) with stylometric features. The author evaluates on a bilingual dataset comprising AI- and human-authored essays, using a three-phase GenAI Content Detection Task 2 setup, and compares against a unigram TF-IDF/SVM baseline. The proposed IntegrityAI models achieve exceptionally high F1-scores in both languages (up to 100% in evaluation and up to 98.5% in testing), with stylometric features providing a notable boost and ELECTRA-Large offering an additional performance gain for English at higher compute cost. The results demonstrate strong generalization and suggest practical deployment potential, while highlighting tradeoffs between accuracy and resources and outlining directions for real-time detection, broader domains, and expanded language coverage.
Abstract
Recent research has investigated the problem of detecting machine-generated essays for academic purposes. To address this challenge, this research utilizes pre-trained, transformer-based models fine-tuned on Arabic and English academic essays with stylometric features. Custom models based on ELECTRA for English and AraELECTRA for Arabic were trained and evaluated using a benchmark dataset. The proposed models achieved excellent results, with an F1-score of 99.7% in the English subtask, ranking 2nd among 26 teams, and 98.4% in the Arabic subtask, finishing 1st out of 23 teams.
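To make the notion of stylometric features concrete, the following is a minimal sketch of the kind of surface statistics such a detector might compute for an essay. The function name `stylometric_features` and the four features chosen here are illustrative assumptions, not the feature set used by the IntegrityAI models, and how the features are combined with the ELECTRA representation is not specified in this summary.

```python
import re
from statistics import mean


def stylometric_features(text: str) -> dict[str, float]:
    """Compute a few simple stylometric statistics for one essay.

    Hypothetical feature set, chosen for illustration only.
    """
    # Split on Latin sentence terminators and the Arabic question mark.
    sentences = [s for s in re.split(r"[.!?؟]+", text) if s.strip()]
    # \w is Unicode-aware for str patterns, so Arabic words are matched too.
    words = re.findall(r"\w+", text)
    # Any character that is neither a word character nor whitespace.
    punctuation = re.findall(r"[^\w\s]", text)

    if not words:
        return {
            "avg_word_len": 0.0,
            "avg_sentence_len": 0.0,
            "type_token_ratio": 0.0,
            "punct_per_word": 0.0,
        }

    lowered = [w.lower() for w in words]
    return {
        # Mean number of characters per word.
        "avg_word_len": mean(len(w) for w in words),
        # Mean number of words per sentence.
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        # Lexical diversity: distinct words divided by total words.
        "type_token_ratio": len(set(lowered)) / len(words),
        # Punctuation density relative to word count.
        "punct_per_word": len(punctuation) / len(words),
    }


features = stylometric_features("The cat sat. The cat ran!")
```

A vector like this could, for example, be appended to a classifier's input alongside the transformer's text representation; that pairing is an assumption made here for illustration.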
