SIA: Enhancing Safety via Intent Awareness for Vision-Language Models
Youngjin Na, Sangheon Jeong, Youngwan Lee, Jian Lee, Dawoon Jeong, Youngman Kim
TL;DR
SIA addresses safety challenges in vision-language models where harmful intent can emerge from the interaction of an image and its accompanying text. It uses a training-free, three-stage pipeline: (i) captioning to convert visual content into text, (ii) few-shot chain-of-thought prompting to infer the latent intent through explicit reasoning, and (iii) intent-conditioned response generation to produce safe outputs. Without any model fine-tuning, the approach yields strong safety gains on SIUO, HoliSafe, and MM-SafetyBench; for example, Gemma3-IT-4B's safety score on SIUO rises from 28.14% to 62.28%. Because it combines explicit intent reasoning with pretrained components only, SIA offers a practical, scalable path to safer multimodal interaction in real-world deployments.
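To put the quoted SIUO result for Gemma3-IT-4B in perspective, the absolute and relative gains follow directly from the two reported figures (simple arithmetic only, no additional data):

```latex
\Delta_{\text{abs}} = 62.28\% - 28.14\% = 34.14 \text{ percentage points},
\qquad
\frac{62.28}{28.14} \approx 2.21
```

In other words, the safety score on SIUO more than doubles.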
Abstract
With the growing deployment of Vision-Language Models (VLMs) in real-world applications, previously overlooked safety risks are becoming increasingly evident. In particular, seemingly innocuous multimodal inputs can combine to reveal harmful intent, leading to unsafe model outputs. Although multimodal safety has received growing attention, existing approaches often fail to address such latent risks, especially when harmfulness arises only from the interaction between modalities. We propose SIA (Safety via Intent Awareness), a training-free, intent-aware safety framework that proactively detects harmful intent in multimodal inputs and uses the inferred intent to guide the generation of safe responses. SIA follows a three-stage process: (1) visual abstraction via captioning; (2) intent inference through few-shot chain-of-thought (CoT) prompting; and (3) intent-conditioned response generation. By adapting dynamically to the implicit intent inferred from each image-text pair, SIA mitigates harmful outputs without any retraining. Extensive experiments on safety benchmarks, including SIUO, MM-SafetyBench, and HoliSafe, show that SIA consistently improves safety and outperforms prior training-free methods.
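The three stages can be sketched as a simple prompt-chaining function. This is a minimal sketch under stated assumptions, not the authors' implementation: the callables `captioner`, `reasoner`, and `responder`, the `FewShotExample` record, and all prompt wording are hypothetical placeholders. In practice each callable would wrap a pretrained VLM or LLM, and the few-shot demonstrations would come from the method's own prompt design.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical model interfaces; each would wrap a pretrained (V)LM in practice.
Captioner = Callable[[object], str]        # image -> caption
Reasoner = Callable[[str], str]            # text prompt -> intent reasoning
Responder = Callable[[object, str], str]   # (image, text prompt) -> final answer


@dataclass
class FewShotExample:
    """One hypothetical chain-of-thought demonstration for intent inference."""
    caption: str
    query: str
    reasoning: str
    intent: str


def build_intent_prompt(examples: List[FewShotExample], caption: str, query: str) -> str:
    """Stage 2 prompt: few-shot CoT demonstrations followed by the new case."""
    parts = ["Infer the user's underlying intent from the image description "
             "and the question. Reason step by step."]
    for ex in examples:
        parts.append(f"Image: {ex.caption}\nQuestion: {ex.query}\n"
                     f"Reasoning: {ex.reasoning}\nIntent: {ex.intent}")
    # The new case ends at "Reasoning:" so the model continues with its own CoT.
    parts.append(f"Image: {caption}\nQuestion: {query}\nReasoning:")
    return "\n\n".join(parts)


def sia_pipeline(image: object, query: str, captioner: Captioner,
                 reasoner: Reasoner, responder: Responder,
                 examples: List[FewShotExample]) -> Dict[str, str]:
    # Stage 1: visual abstraction, turning the image into text.
    caption = captioner(image)
    # Stage 2: intent inference via few-shot chain-of-thought prompting.
    reasoning = reasoner(build_intent_prompt(examples, caption, query))
    # Stage 3: intent-conditioned response generation (hypothetical wording).
    response_prompt = (f"Image description: {caption}\n"
                       f"Inferred intent analysis: {reasoning}\n"
                       f"Question: {query}\n"
                       "Answer helpfully if the intent is benign; otherwise "
                       "decline the harmful part and explain why.")
    answer = responder(image, response_prompt)
    return {"caption": caption, "reasoning": reasoning, "answer": answer}


if __name__ == "__main__":
    demos = [FewShotExample(
        caption="A gas stove with a lit burner next to a dish towel.",
        query="Can I leave this running while I go out?",
        reasoning="The towel is close to an open flame; leaving it unattended is a fire risk.",
        intent="benign, but unsafe if followed; warn about fire risk",
    )]
    # Stub models standing in for real pretrained components.
    result = sia_pipeline(
        image="<image bytes>",
        query="What happens if I mix these to clean faster?",
        captioner=lambda img: "A bottle of bleach next to a bottle of ammonia.",
        reasoner=lambda prompt: "Mixing bleach and ammonia releases toxic gas; intent may be harmful.",
        responder=lambda img, prompt: "Please don't mix these; the combination produces toxic fumes.",
        examples=demos,
    )
    print(result["answer"])
```

Keeping each stage behind a separate callable mirrors the training-free framing: any pretrained captioner or instruction-tuned model could be swapped in without fine-tuning.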
