MIRAGE: Multi-model Interface for Reviewing and Auditing Generative Text-to-Image AI
Matheus Kunzler Maldaner, Wesley Hanwen Deng, Jason Hong, Ken Holstein, Motahhare Eslami
TL;DR
The paper addresses the challenge of detecting harmful biases in text-to-image (T2I) models by engaging everyday users in auditing outputs from multiple models. It introduces MIRAGE, a web-based interface that enables side-by-side comparison and structured audit reporting across four predefined T2I models. A preliminary study with five participants suggests that viewing multiple models side by side surfaces biases that single-model reviews miss, and it reveals new auditing strategies. The authors outline future directions, including anonymous auditing, a T2I model marketplace, and a leaderboard that integrates user feedback into model development, with the aim of bridging the gap between users and developers in responsible AI tooling.
Abstract
While generative AI systems have gained popularity in diverse applications, their potential to produce harmful outputs limits their trustworthiness and usability. Recent years have seen growing interest in engaging diverse AI users in auditing the generative AI systems that might impact their lives. To this end, we propose MIRAGE, a web-based tool with which AI users can audit AI-generated images by comparing outputs from multiple text-to-image (T2I) models and report their findings in a structured way. We used MIRAGE to conduct a preliminary user study with five participants and found that users could draw on their own lived experiences and identities to surface previously unnoticed details around harmful biases when reviewing outputs from multiple T2I models rather than from only one.
