LIVS: A Pluralistic Alignment Dataset for Inclusive Public Spaces
Rashid Mushkani, Shravan Nayak, Hugo Berard, Allison Cohen, Shin Koseki, Hadrien Bertrand
TL;DR
LIVS addresses the challenge of aligning text-to-image (T2I) outputs with diverse, local community values in urban spaces by proposing a pluralistic, intersectional alignment framework. It gathers multi-criteria feedback from 30 community organizations, distills 634 community-defined concepts into six criteria (Accessibility, Safety, Comfort, Invitingness, Inclusivity, Diversity), and collects 37,710 pairwise annotations to fine-tune Stable Diffusion XL via Direct Preference Optimization (DPO). The results show moderate alignment gains under certain criteria but also a substantial share of neutral ratings, underscoring the difficulty of reconciling diverse stakeholder preferences and the limits of single-objective optimization. The work demonstrates that population-specific, multi-criteria feedback can steer generative outputs toward locally meaningful designs, and it provides a benchmark and roadmap for context-aware, inclusive T2I alignment in spatial design and beyond.
Abstract
We introduce the Local Intersectional Visual Spaces (LIVS) dataset, a benchmark for multi-criteria alignment, developed through a two-year participatory process with 30 community organizations to support the pluralistic alignment of text-to-image (T2I) models in inclusive urban planning. The dataset encodes 37,710 pairwise comparisons across 13,462 images, structured along six criteria (Accessibility, Safety, Comfort, Invitingness, Inclusivity, and Diversity) derived from 634 community-defined concepts. Using Direct Preference Optimization (DPO), we fine-tune Stable Diffusion XL to reflect multi-criteria spatial preferences and evaluate the LIVS dataset and the fine-tuned model through four case studies, which show that: (1) DPO increases alignment with annotated preferences, particularly when annotation volume is high; (2) preference patterns vary across participant identities, underscoring the need for intersectional data; (3) human-authored prompts generate more distinctive visual outputs than LLM-generated ones, influencing annotation decisiveness; and (4) intersectional groups assign systematically different ratings across criteria, revealing the limitations of single-objective alignment. While DPO improves alignment under specific conditions, the prevalence of neutral ratings indicates that community values are heterogeneous and often ambiguous. LIVS provides a benchmark for developing T2I models that incorporate local, stakeholder-driven preferences, offering a foundation for context-aware alignment in spatial design.
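To make the pairwise-preference setup concrete, the following is a minimal sketch of how one criterion-tagged comparison could feed the generic DPO objective. It is illustrative only and is not the authors' training code: the `PairwiseComparison` record, its field names, the `dpo_loss` helper, and the value `beta=0.1` are all assumptions. The log-likelihood arguments stand in for whatever per-image score the policy and the frozen reference model provide (diffusion models typically use a denoising-error surrogate), and neutral ratings are not modeled here.

```python
import math
from dataclasses import dataclass

# The six criteria named in the abstract.
CRITERIA = ("Accessibility", "Safety", "Comfort",
            "Invitingness", "Inclusivity", "Diversity")


@dataclass
class PairwiseComparison:
    """Hypothetical record: two images judged under a single criterion."""
    criterion: str
    preferred_image: str
    rejected_image: str

    def __post_init__(self) -> None:
        if self.criterion not in CRITERIA:
            raise ValueError(f"unknown criterion: {self.criterion}")


def _softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(logp_w: float, logp_l: float,
             ref_logp_w: float, ref_logp_l: float,
             beta: float = 0.1) -> float:
    """Generic DPO loss for one (preferred, rejected) pair.

    logp_*     : log-likelihood scores under the model being fine-tuned
    ref_logp_* : the same scores under the frozen reference model
    beta       : strength of the implicit penalty for drifting from the reference
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) is identical to softplus(-margin).
    return _softplus(-margin)
```

When the fine-tuned model scores both images exactly as the reference does, the margin is zero and the loss is log 2. The loss drops below that value once the model favors the preferred image more strongly than the reference does.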
