Trust in Vision-Language Models: Insights from a Participatory User Workshop
Agnese Chiatti, Lara Piccolo, Sara Bernardini, Matteo Matteucci, Viola Schiaffonati
TL;DR
This work probes how users form and calibrate trust in Vision-Language Models (VLMs) through a user-centered workshop with design and development experts, grounding trust concepts in a concrete use case and exploring interface representations such as scene graphs. It adopts a two-part exploratory design: a collaborative game to study trust dynamics in situated reasoning, and a mock-up evaluation to gather design requirements for a future evaluation tool. The workshop suggests that language prompts and multimodal inputs influence trust, while graph-based representations offer interpretable support. The study yields design recommendations for trust assessment in VLMs, including prioritizing user agency, contextualizing trust metrics, and running longitudinal, diverse, and context-specific studies, along with guidance on trust calibration and engagement strategies. The findings provide a foundation for more extensive, targeted user studies to derive principled design guidelines and inform regulatory framing for real-world VLM deployments.
Abstract
With the growing deployment of Vision-Language Models (VLMs), pre-trained on large image-text and video-text datasets, it is critical to equip users with the tools to discern when to trust these systems. However, examining how user trust in VLMs builds and evolves remains an open problem. This problem is exacerbated by the increasing reliance on AI models as judges for experimental validation, to bypass the cost and implications of running participatory design studies directly with users. Following a user-centred approach, this paper presents preliminary results from a workshop with prospective VLM users. Insights from this pilot workshop inform future studies aimed at contextualising trust metrics and participant engagement strategies to fit the case of user-VLM interaction.
