Seeing Twice: How Side-by-Side T2I Comparison Changes Auditing Strategies

Matheus Kunzler Maldaner, Wesley Hanwen Deng, Jason I. Hong, Kenneth Holstein, Motahhare Eslami

TL;DR

Seeing Twice introduces MIRAGE, a web-based tool for contrast-first auditing of text-to-image generation. By displaying up to four generators side by side and prompting users with reflection questions, MIRAGE changes how users perceive outputs and identify biases. The study shows that side-by-side comparison shifts attention from individual images to distribution-level patterns. It also reveals language-fidelity gaps between prompts written in different languages. The work proposes future directions including larger controlled studies, anonymous auditing, and community feedback loops to enhance AI transparency.

Abstract

While generative AI systems have gained popularity in diverse applications, their potential to produce harmful outputs limits their trustworthiness and utility. A small but growing line of research has explored tools and processes that better engage users without AI expertise in auditing generative AI systems. In this work, we present the design and evaluation of MIRAGE, a web-based tool exploring a "contrast-first" workflow. MIRAGE lets users pick up to four different text-to-image (T2I) models, view their images side by side, and provide feedback on model performance, all on a single screen. In our user study with fifteen participants, we used four predefined models for consistency and initially showed only a single model. Once the side-by-side step with all four models appeared, most participants shifted from analyzing individual images to analyzing general patterns in model outputs. Several participants coined persistent "model personalities" (e.g., cartoonish, saturated) that helped them form expectations about how each model would behave on future prompts. Bilingual participants also surfaced a language-fidelity gap: English prompts produced more accurate images than Portuguese or Chinese prompts, an issue often overlooked when working with a single model. These findings suggest that simple comparative interfaces can accelerate bias discovery and reshape how people think about generative models.

Paper Structure

This paper contains 12 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: MIRAGE User Study Workflow.
  • Figure 2: MIRAGE Technical Implementation.
  • Figure 3: MIRAGE Main Page. The Image Generator panel (upper left) lets users type a prompt, toggle the “Study Version” option for the user study scenario, and pick up to four models from the thirty available, which are shown as thumbnail icons. Hovering over a model opens the Model Preview panel (upper right), which shows example images. During the study workflow, this panel also displays reflection questions. Pressing Generate fills the Generation Results area with eight images per model, arranged in four parallel columns, so users can compare thirty-two outputs at a glance.
  • Figure 4: MIRAGE Landing Page. Users can read the purpose of the tool, select a language and read previously published work.
  • Figure 5: MIRAGE Instructions Page. Users are shown a simple three-step approach to using the tool.
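The Figure 3 caption fixes a few concrete constraints: up to four models chosen from a catalog of thirty, eight images per model, and one column per model. The minimal Python sketch below shows how those constraints yield a grid of at most thirty-two images for a single prompt. It is an illustration under stated assumptions, not MIRAGE's implementation. The function names, the `generate` callable, and the model identifiers are all hypothetical, and no real T2I backend is called.

```python
from typing import Callable, Dict, List

MAX_MODELS = 4         # users may pick up to four models (Figure 3 caption)
IMAGES_PER_MODEL = 8   # each selected model fills one column with eight images

# Hypothetical stand-in for a T2I backend: (model, prompt, seed) -> image reference.
Generator = Callable[[str, str, int], str]


def build_comparison_grid(prompt: str, selected: List[str], catalog: List[str],
                          generate: Generator) -> Dict[str, List[str]]:
    """Return one column of images per selected model, keyed by model name."""
    if not 1 <= len(selected) <= MAX_MODELS:
        raise ValueError(f"select between 1 and {MAX_MODELS} models")
    if len(set(selected)) != len(selected):
        raise ValueError("duplicate model selection")
    unknown = [m for m in selected if m not in catalog]
    if unknown:
        raise ValueError(f"models not in catalog: {unknown}")
    # Every model receives the identical prompt, so differences between columns
    # reflect the models rather than the wording of the prompt.
    return {m: [generate(m, prompt, seed) for seed in range(IMAGES_PER_MODEL)]
            for m in selected}


def fake_generate(model: str, prompt: str, seed: int) -> str:
    # Hypothetical placeholder: returns a label instead of calling a real model.
    return f"{model}:{prompt}:{seed}"


if __name__ == "__main__":
    catalog = [f"model-{i:02d}" for i in range(30)]  # thirty thumbnail models
    grid = build_comparison_grid("a doctor at work", catalog[:4], catalog,
                                 fake_generate)
    total = sum(len(column) for column in grid.values())
    print(len(grid), total)  # prints "4 32"
```

Keying the grid by model name keeps each column tied to one generator. This is the property that lets a viewer read across columns for model-level patterns rather than judging images one at a time.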