Text2VLM: Adapting Text-Only Datasets to Evaluate Alignment Training in Visual Language Models

Gabriel Downer; Sean Craven; Damian Ruck; Jake Thomas

TL;DR

Text2VLM introduces an automated pipeline that converts text-only evaluation prompts into multimodal inputs by rendering typographic representations of salient harmful content, enabling robust assessment of Visual Language Model (VLM) safety under multimodal prompt injections. The approach combines prompt summarization, salient-concept extraction, and typographic image generation, followed by LLM- and human-led evaluation of relevance, safety refusals, and understanding. Open-source VLMs show increased vulnerability to multimodal prompts, with safety alignment weakening when text and image inputs are combined, despite OCR and multimodal understanding limitations. The work provides an open-source tool for systematic safety evaluation and highlights the need for stronger alignment and OCR-capable models to ensure safer deployment of multimodal AI systems.

Abstract

The increasing integration of Visual Language Models (VLMs) into AI systems necessitates robust model alignment, especially when handling multimodal content that combines text and images. Existing evaluation datasets lean heavily towards text-only prompts, leaving visual vulnerabilities under-evaluated. To address this gap, we propose Text2VLM, a novel multi-stage pipeline that adapts text-only datasets into multimodal formats and is specifically designed to evaluate the resilience of VLMs against typographic prompt injection attacks. The Text2VLM pipeline identifies harmful content in the original text and converts it into a typographic image, creating a multimodal prompt for VLMs. Our evaluation of open-source VLMs highlights their increased susceptibility to prompt injection when visual inputs are introduced, revealing critical weaknesses in the current models' alignment, in addition to a significant performance gap compared to closed-source frontier models. We validate Text2VLM through human evaluations, confirming that the extracted salient concepts, text summarization, and output classification align with human expectations. Text2VLM provides a scalable tool for comprehensive safety assessment, contributing to the development of more robust safety mechanisms for VLMs. By enhancing the evaluation of multimodal vulnerabilities, Text2VLM plays a role in advancing the safe deployment of VLMs in diverse, real-world applications.
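The conversion step described above can be illustrated with a minimal sketch. It assumes the salient concepts have already been extracted (by an LLM, in the pipeline's terms) and shows only how a text prompt might be split into a text part and the numbered lines a renderer would draw into the typographic image. The names `MultimodalPrompt` and `to_multimodal`, and the placeholder wording `<item N in the image>`, are hypothetical choices for illustration, not the authors' implementation; the example prompt is deliberately benign.

```python
import re
from dataclasses import dataclass


@dataclass
class MultimodalPrompt:
    text: str          # prompt text with salient concepts replaced by references
    image_lines: list  # numbered lines a renderer would draw into the image


def to_multimodal(prompt: str, salient_concepts: list) -> MultimodalPrompt:
    """Move each salient concept out of the text and into a numbered image list.

    Assumption: concepts are given as literal substrings of the prompt.
    Concepts that do not occur in the prompt are skipped.
    """
    text = prompt
    image_lines = []
    for concept in salient_concepts:
        pattern = re.compile(re.escape(concept), flags=re.IGNORECASE)
        if not pattern.search(text):
            continue  # concept not present in the prompt; nothing to replace
        index = len(image_lines) + 1
        # Hypothetical placeholder format pointing the model at the image.
        text = pattern.sub(f"<item {index} in the image>", text)
        image_lines.append(f"{index}. {concept}")
    return MultimodalPrompt(text=text, image_lines=image_lines)


example = to_multimodal(
    "Explain how to bake sourdough bread using a cast iron pot",
    ["sourdough bread", "cast iron pot"],
)
print(example.text)
print("\n".join(example.image_lines))
```

In a full pipeline the `image_lines` would be rendered as text in an image and sent alongside `text`; the rendering step is omitted here to keep the sketch dependency-free.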
