DiffusionWorldViewer: Exposing and Broadening the Worldview Reflected by Generative Text-to-Image Models
Zoe De Simone, Angie Boggust, Arvind Satyanarayan, Ashia Wilson
TL;DR
Generative text-to-image (TTI) systems encode a worldview, learned from their training data, that may be misaligned with users' perspectives. The authors introduce DiffusionWorldViewer, an interactive interface that exposes the demographic distributions of TTI outputs and lets users edit those outputs toward the worldviews they value using semantic guidance, without retraining the model. They formalize a worldview framework based on CAPTA and ARROWS, and implement back-end and front-end components to surface, compare, and adjust outputs via four editing modes (parity, US demographics, absolute, relative). A user study with 18 diverse participants and two case studies show that the tool increases awareness of model biases, broadens representation, and supports task-dependent editing, while highlighting trade-offs and ethical considerations. The work lays a foundation for co-adaptive, user-aware customization of worldviews in diffusion-based image synthesis and points to future work on expanding the editing categories and composing the worldviews of multiple users.
Abstract
Generative text-to-image (TTI) models produce high-quality images from short textual descriptions and are widely used in academic and creative domains. Like humans, TTI models have a worldview, a conception of the world learned from their training data and task that influences the images they generate for a given prompt. However, the worldviews of TTI models are often hidden from users, making it challenging for users to build intuition about TTI outputs, and they are often misaligned with users' worldviews, resulting in output images that do not match user expectations. In response, we introduce DiffusionWorldViewer, an interactive interface that exposes a TTI model's worldview across output demographics and provides editing tools for aligning output images with user perspectives. In a user study with 18 diverse TTI users, we find that DiffusionWorldViewer helps users represent their varied viewpoints in generated images and challenge the limited worldview reflected in current TTI models.
