Color encoding in Latent Space of Stable Diffusion Models

Guillem Arias, Ariadna Solà, Martí Armengod, Maria Vanrell

TL;DR

The paper probes how color and shape are encoded in Stable Diffusion's latent space by analyzing the VAE-derived latent representations with controlled color and grayscale datasets and applying PCA and ablation experiments. It identifies a partially disentangled, efficient coding where color is carried by opponent axes in channels c3/c4 and intensity/shape by channels c1/c2, with PC1 linked to intensity and PC2/PC3 capturing hue directions. These findings illuminate the latent-space structure, enabling more precise color editing and guiding the development of more disentangled diffusion models. Overall, the work provides a framework for understanding and manipulating perceptual attributes in latent-diffusion systems.

Abstract

Recent advances in diffusion-based generative models have achieved remarkable visual fidelity, yet a detailed understanding of how specific perceptual attributes, such as color and shape, are internally represented remains limited. This work explores how color is encoded in a generative model through a systematic analysis of the latent representations in Stable Diffusion. Through controlled synthetic datasets, principal component analysis (PCA) and similarity metrics, we reveal that color information is encoded along circular, opponent axes predominantly captured in latent channels c_3 and c_4, whereas intensity and shape are primarily represented in channels c_1 and c_2. Our findings indicate that the latent space of Stable Diffusion exhibits an interpretable structure aligned with an efficient coding representation. These insights provide a foundation for future work in model understanding, editing applications, and the design of more disentangled generative frameworks.
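To make the kind of analysis described in the abstract concrete, the following is a minimal sketch of PCA applied over the channel dimension of a batch of four-channel latents. It is an illustration under stated assumptions, not the authors' procedure: the function name `channel_pca` is hypothetical, the input is random toy data standing in for VAE latents, and the paper may run PCA over a different arrangement of the data.

```python
import numpy as np


def channel_pca(latents):
    """PCA over the channel axis of latents shaped (N, C, H, W).

    Hypothetical helper for illustration only.
    Returns (components, explained_variance_ratio).
    """
    n, c, h, w = latents.shape
    # Treat every spatial position of every image as one C-dimensional sample.
    samples = latents.transpose(0, 2, 3, 1).reshape(-1, c)
    centered = samples - samples.mean(axis=0)
    # SVD of the centered data: rows of vt are the principal directions,
    # ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return vt, explained


# Toy stand-in for latents: random values, NOT produced by Stable Diffusion.
# Channels are scaled differently so the principal directions are easy to read.
rng = np.random.default_rng(0)
scale = np.array([3.0, 2.0, 1.0, 0.5]).reshape(1, 4, 1, 1)
latents = rng.normal(size=(8, 4, 16, 16)) * scale

components, explained = channel_pca(latents)
print(components.shape)  # (4, 4): one row of channel weights per component
```

With real latents, each row of `components` gives the weight of every channel in one principal component, which is how one could check whether a component loads mainly on c_1/c_2 or on c_3/c_4.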

Paper Structure

This paper contains 1 section, 1 figure.

Table of Contents

  1. Introduction

Figures (1)

  • Figure 1: Images generated by Stable Diffusion for prompts built from unnatural color-object pairs that were probably not part of the training datasets.