SDEval: Safety Dynamic Evaluation for Multimodal Large Language Models
Hanqing Wang, Yuan Tian, Mingyu Liu, Zhenhao Zhang, Xiangyang Zhu
TL;DR
SDEval introduces the first safety-focused dynamic evaluation framework for multimodal LLMs (MLLMs). It addresses data leakage and the limitations of static benchmarks by generating diverse, harder samples through text, image, and text–image dynamics. It employs a semantic validator and a harm scorer to keep generated samples consistent with the originals, and it exposes safety vulnerabilities across leading MLLMs and benchmarks (MLLMGuard, VLSBench, MMVet, MMBench). Empirical results show that dynamic perturbations substantially degrade safety ratings and reveal limitations that static benchmarks miss, while also enabling analysis of safety–capability trade-offs. The work demonstrates that dynamic, cross-modal evaluation can co-evolve with model capabilities, informing safer deployment and future alignment work.
Abstract
In the rapidly evolving landscape of Multimodal Large Language Models (MLLMs), the safety of their outputs has attracted significant attention. Although numerous safety datasets have been proposed, they may become outdated as MLLMs advance and are susceptible to data contamination. To address these problems, we propose SDEval, the first safety dynamic evaluation framework that controllably adjusts the distribution and complexity of safety benchmarks. Specifically, SDEval adopts three main dynamic strategies, namely text, image, and text-image dynamics, to generate new samples from original benchmarks. We first explore the individual effects of text and image dynamics on model safety. We then find that injecting text dynamics into images further affects safety and, conversely, that injecting image dynamics into text also introduces safety risks. SDEval is general enough to be applied to various existing safety benchmarks and even to capability benchmarks. Experiments on the safety benchmarks MLLMGuard and VLSBench and the capability benchmarks MMBench and MMVet show that SDEval significantly influences safety evaluation, mitigates data contamination, and exposes safety limitations of MLLMs. Code is available at https://github.com/hq-King/SDEval
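The abstract names three families of dynamic strategies but does not give their implementation. The following is a minimal, hypothetical sketch of how such strategies might be composed to derive a new sample from a benchmark item. It is not SDEval's code: `Sample`, `SYNONYMS`, `text_dynamic`, `image_dynamic`, `text_into_image`, and `apply_dynamics` are illustrative names. It also assumes a toy image representation (a tuple of pixel rows) and models text-into-image injection with an `image_text` field instead of actually rendering text onto pixels. Reading "controllably adjust complexity" as "apply more strategies in sequence" is likewise an assumption.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple

# Toy grayscale "image": a tuple of rows of pixel intensities (assumption).
Pixels = Tuple[Tuple[int, ...], ...]


@dataclass(frozen=True)
class Sample:
    text: str               # the user query
    pixels: Pixels          # the image content
    image_text: str = ""    # hypothetical stand-in for text rendered into the image


# Hypothetical synonym table; a real text dynamic might use a paraphrasing model.
SYNONYMS = {"make": "build", "dangerous": "hazardous", "show": "describe"}


def text_dynamic(s: Sample) -> Sample:
    """Word-level text perturbation: swap every word found in SYNONYMS."""
    words = [SYNONYMS.get(w, w) for w in s.text.split()]
    return replace(s, text=" ".join(words))


def image_dynamic(s: Sample) -> Sample:
    """Basic image augmentation: mirror each pixel row (horizontal flip)."""
    return replace(s, pixels=tuple(row[::-1] for row in s.pixels))


def text_into_image(s: Sample) -> Sample:
    """Cross-modal injection: move the query into the image, keep a neutral prompt.

    A real version would draw the text onto the pixels with an imaging library;
    here it is only recorded in `image_text`.
    """
    return replace(s, image_text=s.text,
                   text="Follow the instruction shown in the image.")


Strategy = Callable[[Sample], Sample]


def apply_dynamics(s: Sample, strategies: List[Strategy]) -> Sample:
    """Apply strategies in order; a longer list yields a more heavily perturbed variant."""
    for strategy in strategies:
        s = strategy(s)
    return s


if __name__ == "__main__":
    original = Sample(text="show how to make a dangerous item",
                      pixels=((0, 1, 2), (3, 4, 5)))
    variant = apply_dynamics(original, [text_dynamic, image_dynamic, text_into_image])
    print(variant)
```

Because each strategy has the same `Sample -> Sample` signature, strategies can be mixed or reordered freely. Frozen dataclasses keep the original benchmark item unchanged, so the original and perturbed versions can be evaluated side by side.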
