Vipera: Blending Visual and LLM-Driven Guidance for Systematic Auditing of Text-to-Image Generative AI
Yanwei Huang, Wesley Hanwen Deng, Sijia Xiao, Motahhare Eslami, Jason I. Hong, Arpit Narechania, Adam Perer
TL;DR
Vipera addresses the challenge of auditing large-scale text-to-image models by integrating a scene-graph–driven visual sensemaking interface with LLM-powered guidance to systematically explore outputs. The authors validate the approach with formative work and a controlled user study (n=24), showing that multimodal guidance reduces cognitive load while enabling more thorough audits across diverse criteria. They report that visual cues and AI-driven suggestions are complementary, leading to higher-quality insights and more efficient workflows than ablated versions. The work provides design implications for deploying visually guided, AI-assisted auditing tools in real-world responsible-AI practices and highlights avenues for personalization and broader applicability.
Abstract
Despite their increasing capabilities, text-to-image generative AI systems are known to produce biased, offensive, and otherwise problematic outputs. While recent advancements have supported testing and auditing of generative AI, existing auditing methods still face challenges in helping auditors effectively explore the vast space of AI-generated outputs in a structured way. To address this gap, we conducted formative studies with five AI auditors and synthesized five design goals for supporting systematic AI audits. Based on these insights, we developed Vipera, an interactive auditing interface that employs multiple visual cues including a scene graph to facilitate image sensemaking and inspire auditors to explore and hierarchically organize the auditing criteria. Additionally, Vipera leverages LLM-powered suggestions to facilitate exploration of unexplored auditing directions. Through a controlled experiment with 24 participants experienced in AI auditing, we demonstrate Vipera's effectiveness in helping auditors navigate large AI output spaces and organize their analyses while engaging with diverse criteria.
