Linear combinations of latents in generative models: subspaces and beyond
Erik Bodin, Alexandru Stere, Dragos D. Margineantu, Carl Henrik Ek, Henry Moss
TL;DR
The paper addresses controllable manipulation of latent variables in generative models by proposing Latent Optimal Linear combinations (LOL), a closed-form transform that maps any linear combination of seed latents to the target latent distribution via a Monge optimal transport map. LOL supports interpolation, centroid computation, and the construction of expressive low-dimensional latent subspaces, provided the seed latents pass distribution tests that assess their agreement with the latent prior. Empirically, LOL matches or outperforms baseline interpolation methods in semantic preservation while running significantly faster, and it demonstrates model-agnostic subspace construction across diffusion and flow-matching models. The work highlights the importance of distributional compatibility of seed latents and opens avenues for tailoring distribution tests and extending LOL to broader latent distributions.
Abstract
Sampling from generative models has become a crucial tool for applications like data synthesis and augmentation. Diffusion, Flow Matching and Continuous Normalising Flows have proven effective across various modalities, and all rely on latent variables for generation. For experimental design or creative applications that require more control over the generation process, it has become common to manipulate the latent variable directly. However, existing approaches for performing such manipulations (e.g. interpolation or forming low-dimensional representations) only work well in special cases or are specific to a network or data modality. We propose Latent Optimal Linear combinations (LOL) as a general-purpose method to form linear combinations of latent variables that adhere to the assumptions of the generative model. As LOL is easy to implement and naturally addresses the broader task of forming arbitrary linear combinations, e.g. the construction of subspaces of the latent space, LOL dramatically simplifies the creation of expressive low-dimensional representations of high-dimensional objects.
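To illustrate why a plain linear combination can violate the generative model's assumptions, and what a closed-form correction can look like, here is a minimal sketch. It is not the authors' reference implementation. It assumes the latent prior is an isotropic Gaussian N(mean, I) with a scalar mean and that the seed latents are independent draws from it. Under those assumptions the combination y = sum_k w_k x_k is Gaussian with mean alpha * mean and covariance beta * I, where alpha = sum_k w_k and beta = sum_k w_k^2, and the optimal transport map between two Gaussians with proportional covariances is affine. The names `lol_combine` and `mean_square` are hypothetical helpers introduced for this example.

```python
import math
import random


def lol_combine(seeds, weights, mean=0.0):
    """Map sum_k w_k * x_k back to N(mean, I).

    Assumes (for this sketch) i.i.d. seed latents x_k ~ N(mean, I).
    """
    if len(seeds) != len(weights):
        raise ValueError("need one weight per seed latent")
    alpha = sum(weights)                # y has mean alpha * mean
    beta = sum(w * w for w in weights)  # y has covariance beta * I
    if beta == 0.0:
        raise ValueError("weights must not all be zero")
    scale = 1.0 / math.sqrt(beta)
    dim = len(seeds[0])
    # Naive linear combination y = sum_k w_k x_k, coordinate by coordinate.
    y = [sum(w * x[i] for w, x in zip(weights, seeds)) for i in range(dim)]
    # Affine map between Gaussians with proportional covariances:
    # z = mean + (y - alpha * mean) / sqrt(beta)
    return [mean + scale * (yi - alpha * mean) for yi in y]


def mean_square(v):
    """Average squared coordinate; close to 1 for a draw from N(0, I)."""
    return sum(t * t for t in v) / len(v)


rng = random.Random(0)
dim = 20_000
x1 = [rng.gauss(0.0, 1.0) for _ in range(dim)]
x2 = [rng.gauss(0.0, 1.0) for _ in range(dim)]

# Midpoint interpolation, with and without the correction.
naive_mid = [0.5 * a + 0.5 * b for a, b in zip(x1, x2)]
lol_mid = lol_combine([x1, x2], [0.5, 0.5])

print(mean_square(naive_mid))  # roughly 0.5: the naive midpoint has shrunk variance
print(mean_square(lol_mid))    # roughly 1.0: consistent with the N(0, I) prior
```

In this sketch, a single seed with weight 1 is returned unchanged, and any weight vector (not only interpolation weights) is handled the same way, which is what makes the same transform usable for centroids and subspace coordinates.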
