LIVS: A Pluralistic Alignment Dataset for Inclusive Public Spaces

Rashid Mushkani, Shravan Nayak, Hugo Berard, Allison Cohen, Shin Koseki, Hadrien Bertrand

TL;DR

LIVS addresses the challenge of aligning text-to-image (T2I) outputs with diverse, local community values in urban spaces by proposing a pluralistic, intersectional alignment framework. It gathers multi-criteria feedback through a participatory process with 30 community organizations, distills 634 community-defined concepts into six criteria (Accessibility, Safety, Comfort, Invitingness, Inclusivity, Diversity), and collects 37,710 pairwise comparisons used to fine-tune Stable Diffusion XL via Direct Preference Optimization (DPO). The results show moderate gains in alignment under certain criteria but also a substantial share of neutral ratings, underscoring the complexity of reconciling diverse stakeholder preferences and the limits of single-objective optimization. The work indicates that population-specific, multi-criteria feedback can steer generative outputs toward locally meaningful designs, and it provides a benchmark and roadmap for context-aware, inclusive T2I alignment in spatial design and beyond.

Abstract

We introduce the Local Intersectional Visual Spaces (LIVS) dataset, a benchmark for multi-criteria alignment, developed through a two-year participatory process with 30 community organizations to support the pluralistic alignment of text-to-image (T2I) models in inclusive urban planning. The dataset encodes 37,710 pairwise comparisons across 13,462 images, structured along six criteria - Accessibility, Safety, Comfort, Invitingness, Inclusivity, and Diversity - derived from 634 community-defined concepts. Using Direct Preference Optimization (DPO), we fine-tune Stable Diffusion XL to reflect multi-criteria spatial preferences and evaluate the LIVS dataset and the fine-tuned model through four case studies: (1) DPO increases alignment with annotated preferences, particularly when annotation volume is high; (2) preference patterns vary across participant identities, underscoring the need for intersectional data; (3) human-authored prompts generate more distinctive visual outputs than LLM-generated ones, influencing annotation decisiveness; and (4) intersectional groups assign systematically different ratings across criteria, revealing the limitations of single-objective alignment. While DPO improves alignment under specific conditions, the prevalence of neutral ratings indicates that community values are heterogeneous and often ambiguous. LIVS provides a benchmark for developing T2I models that incorporate local, stakeholder-driven preferences, offering a foundation for context-aware alignment in spatial design.
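
As a rough illustration of the data shape the abstract describes, the sketch below turns per-criterion pairwise comparisons into chosen/rejected pairs of the kind DPO training consumes. It is a minimal sketch under stated assumptions, not the authors' pipeline: the `Comparison` record, its field names, and the choice to drop neutral ratings before building pairs are all hypothetical and are not taken from the LIVS release.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# The six criteria named in the abstract.
CRITERIA = ("Accessibility", "Safety", "Comfort",
            "Invitingness", "Inclusivity", "Diversity")


@dataclass(frozen=True)
class Comparison:
    """Hypothetical record for one pairwise annotation (schema is assumed)."""
    prompt: str
    criterion: str
    image_a: str
    image_b: str
    choice: Optional[str]  # "a", "b", or None for a neutral rating


def to_preference_pairs(comparisons: List[Comparison]) -> List[Dict[str, str]]:
    """Build chosen/rejected pairs, skipping neutral ratings (an assumption)."""
    pairs = []
    for c in comparisons:
        if c.criterion not in CRITERIA:
            raise ValueError(f"unknown criterion: {c.criterion}")
        if c.choice is None:
            continue  # neutral: no preference signal to train on
        if c.choice not in ("a", "b"):
            raise ValueError(f"unknown choice: {c.choice}")
        if c.choice == "a":
            chosen, rejected = c.image_a, c.image_b
        else:
            chosen, rejected = c.image_b, c.image_a
        pairs.append({"prompt": c.prompt, "criterion": c.criterion,
                      "chosen": chosen, "rejected": rejected})
    return pairs


def neutral_rate(comparisons: List[Comparison]) -> Dict[str, float]:
    """Share of neutral ratings per criterion, for criteria that appear."""
    totals: Dict[str, int] = {}
    neutrals: Dict[str, int] = {}
    for c in comparisons:
        totals[c.criterion] = totals.get(c.criterion, 0) + 1
        if c.choice is None:
            neutrals[c.criterion] = neutrals.get(c.criterion, 0) + 1
    return {k: neutrals.get(k, 0) / n for k, n in totals.items()}
```

Tracking the neutral share per criterion alongside the pairs keeps visible how much annotation volume actually carries a preference signal, which is the quantity the abstract ties to DPO's gains.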

Paper Structure

This paper contains 60 sections, 20 figures, 1 algorithm.

Figures (20)

  • Figure 1: Distribution of participants’ self-declared demographics. This figure summarizes the demographic profiles (e.g., age, gender, race/ethnicity, and disability) of the individuals who participated in workshops and annotation activities [negotiativealignment].
  • Figure 2: Distilling the initial concepts into six core criteria. The figure shows how 634 distinct ideas were iteratively merged, discussed, and ranked to arrive at final high-level categories.
  • Figure 3: Word cloud depicting the frequency of various concepts within the 440 collected prompts. The size of each word reflects its prevalence, highlighting key themes such as public-space typologies, amenities, and contextual use scenarios. This distribution underscores the diversity and contextual comprehensiveness of the prompt dataset.
  • Figure 4: Distribution of annotation frequencies for each criterion after data cleaning. Red indicates neutral or equal preferences, while blue represents distinct preferences.
  • Figure 5: Criteria distribution on the new evaluation dataset. Neutral ratings were more common for Inclusivity and Diversity, indicating subtler or more subjective distinctions in these categories.
  • ...and 15 more figures