The Latent Color Subspace: Emergent Order in High-Dimensional Chaos

Mateusz Pach, Jessica Bader, Quentin Bouniot, Serge Belongie, Zeynep Akata

Abstract

Text-to-image generation models have advanced rapidly, yet achieving fine-grained control over generated images remains difficult, largely due to limited understanding of how semantic information is encoded. We develop an interpretation of the color representation in the Variational Autoencoder latent space of FLUX.1 [Dev], revealing a structure reflecting Hue, Saturation, and Lightness. We verify our Latent Color Subspace (LCS) interpretation by demonstrating that it can both predict and explicitly control color, introducing a fully training-free method in FLUX based solely on closed-form latent-space manipulation. Code is available at https://github.com/ExplainableML/LCS.

Paper Structure

This paper contains 33 sections, 21 equations, 12 figures, and 6 tables.

Figures (12)

  • Figure 1: We find a simple color subspace in the VAE embedding space of FLUX which can be interpreted as cylindrical coordinates corresponding to Hue, Saturation, and Lightness, enabling (1) inexpensive observation and (2) targeted intervention.
  • Figure 2: PCA shows color organization in the VAE latent space mirrors HSL: Hue forms a circle on the PC2–PC3 plane, Saturation is distance from the black-white axis, and Lightness lies on PC1.
  • Figure 3: Flow Matching introduces an additional layer of complexity to our interpretation, as latents traverse the space over timesteps to reach their final destination. (a) In the Latent Color Subspace (LCS), colors evolve over timesteps $t$, starting mixed at the center and gradually moving toward their final positions. Dots represent individual patches, shown in their final colors, while stars orient the space with known color locations at $t=50$. (b) Despite variation in individual patches, the expected relative position between colors stays consistent over timesteps in the LCS, up to a time-dependent scaling. Shown for per-image averaged patches (circles) of 26 single-colored images.
  • Figure 4: The Latent Color Subspace (LCS) enables observation and intervention during generation. At intermediate timestep $t$, we project the mid-generated sample from the FM VAE latent space into the LCS, obtaining coordinates $\mathbf{C}$ and rescaling them to $\hat{\mathbf{C}}$, which matches timestep $t=50$ statistics. Type I intervention ($\hat{\mathbf{C}}'$) modifies color by shifting, scaling, and rotating to match the lightness, saturation, and hue respectively, while Type II intervention ($\hat{\mathbf{C}}''$) directly shifts to adjust all three. The interventions are interpolated to get $\hat{\mathbf{C}}^\star$ and rescaled back to timestep $t$ ($\mathbf{C}^\star$). Finally, $\mathbf{C}$ is replaced with $\mathbf{C}^\star$ in the latent of the generated sample. With a simple projection into the LCS and the correct scaling, we can directly observe color ($O_t$) without the computationally heavy VAE decoder.
  • Figure 5: With our mid-generation color observation method (top), we validate our interpretation of the Latent Color Subspace (LCS) by predicting the final colors at intermediate timesteps. We compare these predictions with the VAE-decoded latents (bottom).
  • ...and 7 more figures
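
The cylindrical reading in the Figure 2 caption and the shift/scale/rotate description of the Type I intervention in the Figure 4 caption can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the linked repository for that). It assumes that LCS coordinates are already available as (PC1, PC2, PC3) triples, that the zero-hue direction is the PC2 axis (an arbitrary choice), and that the intervention is applied so the mean over patches lands on a target point. The function names `to_cylindrical` and `type1_intervention` are hypothetical.

```python
import numpy as np


def to_cylindrical(c):
    """Read (PC1, PC2, PC3) coordinates as (hue angle, saturation radius, lightness).

    Assumption: lightness lies on PC1 and hue is the angle in the PC2-PC3
    plane, measured from the PC2 axis (the zero direction is arbitrary).
    """
    c = np.asarray(c, dtype=float)
    lightness = c[..., 0]
    radius = np.hypot(c[..., 1], c[..., 2])   # distance from the black-white axis
    hue = np.arctan2(c[..., 2], c[..., 1])    # angle in radians, in (-pi, pi]
    return hue, radius, lightness


def type1_intervention(coords, target, eps=1e-8):
    """Shift lightness, scale saturation, and rotate hue of all patches.

    coords: array of shape (N, 3), one (PC1, PC2, PC3) row per patch.
    target: array of shape (3,), the desired mean color in the same space.
    The transform is chosen so that the patch mean moves onto `target`
    while the spread of patches around the mean is kept (up to the scale).
    """
    coords = np.asarray(coords, dtype=float)
    mean_hue, mean_rad, mean_light = to_cylindrical(coords.mean(axis=0))
    tgt_hue, tgt_rad, tgt_light = to_cylindrical(target)

    out = coords.copy()
    # Lightness: shift along PC1.
    out[:, 0] += tgt_light - mean_light
    # Saturation: scale the distance from the PC1 axis.
    scale = tgt_rad / max(mean_rad, eps)
    # Hue: rotate in the PC2-PC3 plane.
    theta = tgt_hue - mean_hue
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta), np.cos(theta)]])
    out[:, 1:] = scale * out[:, 1:] @ rot.T
    return out
```

Because the three operations act on separate cylindrical components, each one can be applied or skipped independently, for example changing hue while leaving lightness untouched. The timestep-dependent rescaling described in the Figure 4 caption is not modeled here.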