Multiple Instance Learning for Cheating Detection and Localization in Online Examinations
Yemeng Liu, Jing Ren, Jianshuo Xu, Xiaomei Bai, Roopdeep Kaur, Feng Xia
TL;DR
This work tackles cheating detection in online examinations by framing it as a weakly supervised video anomaly problem. It introduces CHEESE, a framework that couples a MIL-based label generator with a multi-modal feature encoder and a spatio-temporal graph module to detect and localize cheating events using cues from eye gaze, head pose, facial actions, body pose, and background. The key contributions are (i) a continuous sub-bag MIL labeling strategy, (ii) a self-guided attention-enhanced encoder, (iii) a dual-graph spatio-temporal module incorporating temporal consistency and feature similarity, and (iv) comprehensive experiments across UCF-Crime, ShanghaiTech, and OEP showing strong performance and real-time feasibility. The approach demonstrates actionable detection and localization capabilities with practical relevance for online proctoring, and points to future work in expanding multi-modal data and mitigating pseudo-label noise to further improve robustness and accuracy.
Abstract
The spread of the coronavirus disease 2019 (COVID-19) epidemic has moved many courses and exams online. Cheating detection models in exam invigilation systems play a pivotal role in guaranteeing the fairness of remote examinations. However, cheating behavior is rare, and most existing work does not jointly consider features such as head posture, gaze angle, body posture, and background information when detecting it. In this paper, we present CHEESE, a CHEating detection framework via multiplE inStancE learning. The framework consists of a label generator that implements weak supervision and a feature encoder that learns discriminative features. It combines body posture and background features extracted by 3D convolution with eye gaze, head posture, and facial features captured by OpenFace 2.0. These features are concatenated and fed into a spatio-temporal graph module, which analyzes spatio-temporal changes in video clips to detect cheating behaviors. Experiments on three datasets, UCF-Crime, ShanghaiTech, and Online Exam Proctoring (OEP), demonstrate the effectiveness of our method compared with state-of-the-art approaches; it achieves a frame-level AUC of 87.58% on the OEP dataset.
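To make the weak-supervision idea concrete, the following is a minimal sketch of a generic multiple instance learning ranking objective over contiguous sub-bags of clips. It is not the CHEESE implementation: the function names, the sub-bag size, the mean aggregation within a sub-bag, the hinge loss, and the example scores are all assumptions chosen for illustration. Only video-level labels are used; the highest-scoring sub-bag then serves as a rough temporal localization.

```python
def contiguous_sub_bags(scores, size):
    """Split per-clip scores into consecutive, non-overlapping sub-bags."""
    return [scores[i:i + size] for i in range(0, len(scores), size)]


def sub_bag_means(scores, size):
    """Mean score of each contiguous sub-bag (assumed aggregation)."""
    return [sum(sb) / len(sb) for sb in contiguous_sub_bags(scores, size)]


def bag_score(scores, size):
    """Score a whole video (bag) by its highest-scoring sub-bag."""
    return max(sub_bag_means(scores, size))


def localize(scores, size):
    """Index of the sub-bag most likely to contain the anomalous event."""
    means = sub_bag_means(scores, size)
    return means.index(max(means))


def mil_ranking_loss(pos_scores, neg_scores, size=2, margin=1.0):
    """Hinge loss that pushes the positive bag's score above the negative bag's."""
    gap = bag_score(pos_scores, size) - bag_score(neg_scores, size)
    return max(0.0, margin - gap)


# Hypothetical per-clip anomaly scores in [0, 1]; only video-level labels are known.
cheating_video = [0.1, 0.2, 0.9, 0.8, 0.1, 0.1]  # labeled "contains cheating"
normal_video = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1]    # labeled "normal"

loss = mil_ranking_loss(cheating_video, normal_video)
event_sub_bag = localize(cheating_video, size=2)
```

In this toy example the second sub-bag of the cheating video (clips 2 and 3) receives the highest mean score, so it is the one flagged for localization, and the loss stays positive until the gap between the two bag scores reaches the margin.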
