Can MLLMs Detect Phishing? A Comprehensive Security Benchmark Suite Focusing on Dynamic Threats and Multimodal Evaluation in Academic Environments
Jingzhuo Zhou
TL;DR
This work introduces AdapT-Bench, a comprehensive benchmark for evaluating Multimodal Large Language Models (MLLMs) under realistic phishing threats in academic settings. It assembles synthetic scholar profiles and three datasets (Kimi 1000, Strategies Batch 1, Filtered Clean) to simulate dynamic, multimodal, and cross-lingual attacks across varied emotional states. The study demonstrates a paradox in which heavily filtered, well-crafted attacks can be easier for AI defenses to detect than unstructured ones, and it reveals strong cross-lingual and jailbreak vulnerabilities in leading models. By providing a standardized evaluation framework and rich, context-aware scenarios, AdapT-Bench supports more robust defense strategies and safer deployment of MLLMs in academic environments.
Abstract
The rapid proliferation of Multimodal Large Language Models (MLLMs) has introduced unprecedented security challenges, particularly in phishing detection within academic environments. Academic institutions and researchers are high-value targets, facing dynamic, multilingual, and context-dependent threats that leverage research backgrounds, academic collaborations, and personal information to craft highly tailored attacks. Existing security benchmarks largely rely on datasets that do not incorporate specific academic background information, making them inadequate for capturing the evolving attack patterns and human-centric vulnerability factors specific to academia. To address this gap, we present AdapT-Bench, a unified methodological framework and benchmark suite for systematically evaluating MLLM defense capabilities against dynamic phishing attacks in academic settings.
