SDEval: Safety Dynamic Evaluation for Multimodal Large Language Models

Hanqing Wang, Yuan Tian, Mingyu Liu, Zhenhao Zhang, Xiangyang Zhu

TL;DR

SDEval introduces the first safety-focused dynamic evaluation framework for Multimodal LLMs (MLLMs). It addresses data leakage and the limitations of static benchmarks by generating diverse, harder samples through text, image, and text–image dynamics. A semantic validator and a harm scorer keep the generated samples consistent with the originals, and the resulting data exposes safety vulnerabilities across leading MLLMs and benchmarks (MLLMGuard, VLSBench, MMVet, MMBench). Empirical results show that dynamic perturbations substantially lower safety rates and reveal limitations that static benchmarks miss, while also enabling analysis of safety–capability trade-offs. The work demonstrates that dynamic, cross-modal evaluation can co-evolve with model capabilities, informing safer deployment and future alignment work.
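The generate-then-validate step described in the TL;DR can be sketched as a filter over generated variants. This is a minimal sketch under stated assumptions: `perturb`, `is_consistent`, and `harm_score` are hypothetical stand-ins for a dynamic strategy, the semantic validator, and the harm scorer (in SDEval these would presumably be model-based), and the rule that a kept variant must reach a minimum harm score is our assumption, not the paper's specification.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Sample:
    """A benchmark item; `image` is a placeholder identifier (hypothetical)."""
    text: str
    image: str


def generate_dynamic_set(
    samples: List[Sample],
    perturb: Callable[[Sample], Sample],              # a text, image, or text-image dynamic (hypothetical)
    is_consistent: Callable[[Sample, Sample], bool],  # semantic validator (hypothetical)
    harm_score: Callable[[Sample], float],            # harm scorer (hypothetical)
    min_harm: float,                                  # assumed acceptance threshold
) -> List[Sample]:
    """Perturb each sample and keep only variants that pass both checks."""
    kept: List[Sample] = []
    for original in samples:
        variant = perturb(original)
        # Reject variants whose meaning drifted away from the original item.
        if not is_consistent(original, variant):
            continue
        # Keep only variants that still carry enough harmful intent to probe safety.
        if harm_score(variant) >= min_harm:
            kept.append(variant)
    return kept


if __name__ == "__main__":
    data = [Sample("how to pick a lock", "img_0"), Sample("describe this cat", "img_1")]
    # Toy stand-ins for illustration only; real components would be models.
    toy_perturb = lambda s: Sample(s.text.replace("pick", "open"), s.image)
    toy_validator = lambda a, b: a.image == b.image
    toy_scorer = lambda s: 1.0 if "lock" in s.text else 0.0
    print(generate_dynamic_set(data, toy_perturb, toy_validator, toy_scorer, min_harm=0.5))
```

Separating the validator and the scorer into independent callables makes it easy to swap either component without touching the generation loop.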

Abstract

In the rapidly evolving landscape of Multimodal Large Language Models (MLLMs), the safety of their outputs has attracted significant attention. Although numerous safety datasets have been proposed, they may become outdated as MLLMs advance and are susceptible to data contamination. To address these problems, we propose SDEval, the first safety dynamic evaluation framework that controllably adjusts the distribution and complexity of safety benchmarks. Specifically, SDEval adopts three dynamic strategies (text, image, and text-image dynamics) to generate new samples from original benchmarks. We first explore the individual effects of text and image dynamics on model safety. We then find that injecting text dynamics into images further affects safety and, conversely, that injecting image dynamics into text also leads to safety risks. SDEval is general enough to be applied to various existing safety benchmarks and even to capability benchmarks. Experiments on two safety benchmarks (MLLMGuard and VLSBench) and two capability benchmarks (MMBench and MMVet) show that SDEval significantly influences safety evaluation, mitigates data contamination, and exposes safety limitations of MLLMs. Code is available at https://github.com/hq-King/SDEval
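As a concrete illustration of a text dynamic, the sketch below performs seeded word replacement, one of the principles named in the framework figure (word replacement and paraphrasing). The `SYNONYMS` lexicon and the `replace_prob` parameter are hypothetical; SDEval's actual text dynamics may rely on an LLM rather than a fixed lexicon.

```python
import random
from typing import Dict, List

# Hypothetical synonym lexicon; a real system might query an LLM or a thesaurus instead.
SYNONYMS: Dict[str, List[str]] = {
    "make": ["build", "create", "produce"],
    "dangerous": ["hazardous", "unsafe"],
    "quickly": ["fast", "rapidly"],
}


def word_replacement(text: str, replace_prob: float = 0.5, seed: int = 0) -> str:
    """Replace each word that has a known synonym with probability `replace_prob`.

    Tokenization is plain whitespace splitting, so words with attached
    punctuation are left untouched. A fixed seed makes the variant
    reproducible; a new seed yields a fresh variant of the same prompt.
    """
    rng = random.Random(seed)
    out: List[str] = []
    for word in text.split():
        key = word.lower()
        if key in SYNONYMS and rng.random() < replace_prob:
            out.append(rng.choice(SYNONYMS[key]))
        else:
            out.append(word)
    return " ".join(out)


print(word_replacement("how to make something dangerous quickly", replace_prob=1.0, seed=42))
```

Raising `replace_prob` is one simple way to make a variant drift further from the benchmark's original wording, which is the kind of adjustable complexity the abstract describes.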

Paper Structure

This paper contains 48 sections, 10 figures, 38 tables.

Figures (10)

  • Figure 1: Dynamic Evaluation vs Static Evaluation. Dynamic evaluation can generate diverse variants from static benchmarks with flexibly adjustable complexity.
  • Figure 2: Comparison of Dynamic and Vanilla Results. After using SDEval, the safety rate is significantly reduced.
  • Figure 3: Overall framework of SDEval. The dynamic generation process consists of three parts: (a) Text dynamics, which apply principles such as word replacement and paraphrasing; (b) Image dynamics, which involve image transformations as well as image generation and manipulation; (c) Text-Image dynamics, which mainly use two strategies, Text-to-Image and Image-to-Text, to generate new image-text pairs. Finally, MLLM safety is evaluated on the generated data.
  • Figure 4: Examples of Dynamic Generation Datasets of MLLMGuard. The newly generated dynamic data maintains semantic consistency with the original data after verification.
  • Figure 5: Scatter plot of the balance between MLLM capability and safety under the AI 45° Law, showing the ranking of all models and their dynamic change.
  • ...and 5 more figures
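The capability-safety balance in Figure 5 can be quantified in several ways; the sketch below uses one simple choice, the signed perpendicular distance from a model's (capability, safety) point to the 45° line y = x, plus the point's polar angle. This metric is our illustrative assumption, not necessarily how the paper places or ranks models, and it assumes both scores share the same scale (e.g., 0-100). The model names and scores in the usage line are placeholders, not results from the paper.

```python
import math
from typing import Dict, List, Tuple


def deviation_from_45(capability: float, safety: float) -> float:
    """Signed perpendicular distance from (capability, safety) to the line y = x.

    Positive: safety outpaces capability. Negative: capability outpaces safety.
    """
    return (safety - capability) / math.sqrt(2)


def angle_deg(capability: float, safety: float) -> float:
    """Polar angle of the point in degrees; exactly 45 means perfect balance."""
    return math.degrees(math.atan2(safety, capability))


def rank_by_balance(scores: Dict[str, Tuple[float, float]]) -> List[str]:
    """Order models from most to least balanced (smallest absolute deviation first)."""
    return sorted(scores, key=lambda name: abs(deviation_from_45(*scores[name])))


# Placeholder (capability, safety) pairs for illustration only.
print(rank_by_balance({"model_a": (70.0, 50.0), "model_b": (60.0, 62.0)}))  # ['model_b', 'model_a']
```

Under this metric, a drop in safety rate after dynamic evaluation moves a model's point downward and makes its deviation more negative, which is one way to read the "dynamic change" shown in the plot.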