SCALEX: Scalable Concept and Latent Exploration for Diffusion Models

E. Zhixuan Zeng, Yuhao Chen, Alexander Wong

TL;DR

SCALEX tackles the challenge of scalable bias analysis in diffusion models by linking internal latent directions to natural language prompts. It introduces prompt-aligned latents in H-space, stabilized with Latent Consistency Models, enabling zero-shot, training-free interpretation of concepts across hundreds of prompts. The framework supports automated bias analysis through defaults, descriptors, and clustering, and validates findings by image conditioning and open-ended discovery. The approach reveals gender and cultural associations and emergent semantic structure, providing scalable auditing and potential pathways to bias mitigation in diffusion-based generation.

Abstract

Image generation models frequently encode social biases, including stereotypes tied to gender, race, and profession. Existing methods for analyzing these biases in diffusion models either focus narrowly on predefined categories or depend on manual interpretation of latent directions. These constraints limit scalability and hinder the discovery of subtle or unanticipated patterns. We introduce SCALEX, a framework for scalable and automated exploration of diffusion model latent spaces. SCALEX extracts semantically meaningful directions from H-space using only natural language prompts, enabling zero-shot interpretation without retraining or labelling. This allows systematic comparison across arbitrary concepts and large-scale discovery of internal model associations. We show that SCALEX detects gender bias in profession prompts, ranks semantic alignment across identity descriptors, and reveals clustered conceptual structure without supervision. By linking prompts to latent directions directly, SCALEX makes bias analysis in diffusion models more scalable, interpretable, and extensible than prior approaches.

Paper Structure

This paper contains 47 sections, 6 equations, 15 figures, 4 tables.

Figures (15)

  • Figure 1: Vectors are captured from the model's H-space while the model is conditioned on a text prompt.
  • Figure 2: Overview of the three experimental approaches for analyzing H-space representations. (a) Identify "defaults" through one-to-one comparisons, which isolate the impact of adding or removing a single concept. (b) Find detailed trait descriptors for categories through one-to-many comparisons, which rank multiple prompts along a semantic spectrum. (c) Capture broader patterns and associations across diverse captions, beyond predefined categories, through clustering.
  • Figure 3: For each profession, we plot the average difference between male and female cosine distances for the baseline female/male phrasing and five prompt variants: woman/man, relative clause (a doctor who is a woman/man), pronouns (she/he is a doctor), first names (Sarah/John, a doctor), and honorifics (Ms./Mr. surname). The woman/man, relative-clause, and pronoun variants are strongly correlated with the baseline (Pearson $r = 0.92$, $0.86$, and $0.84$, respectively). First names ($r=0.85$) and honorifics ($r=0.72$) show positive but weaker alignment due to (i) additional priors such as age or region embedded in names, and (ii) honorifics being weaker gender signals in caption distributions. See the Supplementary for a more detailed analysis of their differences.
  • Figure 4: Correlation between the percentage of images classified as female (using CLIP) and the difference in cosine distances between gendered H-space vectors for each prompt. Prompts with higher cosine differences favouring female vectors are more likely to generate female-presenting images, indicating a strong association between H-space distances and perceived gender.
  • Figure 5: t-SNE visualization of H-space vectors obtained from the Food500-CAP dataset. Clusters are labelled with the id assigned by HDBSCAN (McInnes and Healy, 2017). Some prominent clusters include (8) square dishes; (12) pots with soup; (27) sandwiches/bread; (44) pie; (82) seafood boil; (94) takeout boxes; (98) rectangular plates; (120) salads; (127) stir-fried noodles; (136) purple vegetables, especially purple cabbage.
  • ...and 10 more figures
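
The quantity plotted in Figures 3 and 4, a difference in cosine distances between a prompt's H-space vector and two gendered reference vectors, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the sign convention (positive means closer to the female vector), and the three-dimensional toy vectors, which are placeholders rather than real H-space activations or results from the paper. The sketch also shows a Pearson correlation of the kind reported in the captions, computed here against made-up "fraction classified female" values.

```python
import math

def cosine_distance(u, v):
    """Cosine distance 1 - cos(u, v) between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def gender_gap(prompt_vec, male_vec, female_vec):
    """Distance to the male vector minus distance to the female vector.

    Assumed sign convention: a positive value means the prompt vector
    lies closer to the female reference vector.
    """
    return (cosine_distance(prompt_vec, male_vec)
            - cosine_distance(prompt_vec, female_vec))

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy stand-ins for H-space vectors (hypothetical values, not model output).
male_vec = [1.0, 0.0, 0.0]
female_vec = [0.0, 1.0, 0.0]
prompt_vecs = {
    "nurse": [0.2, 0.9, 0.1],
    "engineer": [0.9, 0.2, 0.1],
    "teacher": [0.5, 0.6, 0.1],
}

gaps = {name: gender_gap(vec, male_vec, female_vec)
        for name, vec in prompt_vecs.items()}

# Hypothetical fraction of generated images classified as female per prompt.
female_fraction = {"nurse": 0.9, "engineer": 0.1, "teacher": 0.55}

names = sorted(prompt_vecs)
r = pearson_r([gaps[n] for n in names], [female_fraction[n] for n in names])

for n in names:
    print(f"{n}: gap={gaps[n]:+.3f}")
print(f"Pearson r between gap and female fraction: {r:.3f}")
```

In a real analysis, the toy vectors would be replaced by vectors captured from the model under each prompt, and the female fractions by classifier outputs on generated images; how the paper aggregates these values is not specified in this summary.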