Self-adaptive Dataset Construction for Real-World Multimodal Safety Scenarios

Jingen Qu, Lijun Li, Bo Zhang, Yichen Yan, Jing Shao

TL;DR

A novel image-oriented self-adaptive dataset construction method for RMS, which starts with images and ends by constructing paired text and guidance responses, offering a new perspective for the construction of real-world multimodal safety datasets.

Abstract

Multimodal large language models (MLLMs) are rapidly evolving, presenting increasingly complex safety challenges. However, current dataset construction methods, which are risk-oriented, fail to cover the growing complexity of real-world multimodal safety scenarios (RMS). Moreover, because there is no unified evaluation metric, their overall effectiveness remains unproven. This paper introduces a novel image-oriented self-adaptive dataset construction method for RMS, which starts with images and ends by constructing paired text and guidance responses. Using the image-oriented method, we automatically generate an RMS dataset comprising 35k image-text pairs with guidance responses. Additionally, we introduce a standardized safety dataset evaluation metric: fine-tuning a safety judge model and evaluating its capabilities on other safety datasets. Extensive experiments on various tasks demonstrate the effectiveness of the proposed image-oriented pipeline. The results confirm the scalability and effectiveness of the image-oriented approach, offering a new perspective for the construction of real-world multimodal safety datasets. The dataset is available at https://huggingface.co/datasets/NewCityLetter/RMS2/tree/main.

Paper Structure

This paper contains 32 sections, 11 figures, 26 tables.

Figures (11)

  • Figure 1: A conceptual sample from the RMS dataset, where the image and the text are each safe individually and the image comes from a real-world scenario.
  • Figure 2: Image-oriented method based on information complementarity.
  • Figure 3: Detailed scenarios in the RMS dataset.
  • Figure 4: The architecture of the image-oriented RMS pipeline. Starting from a real-world safe image, we generate an image-text-risk triplet and then perform data augmentation.
  • Figure 5: The criteria for augmenting the data.
  • ...and 6 more figures