TreeDiffusion: Hierarchical Generative Clustering for Conditional Diffusion

Jorge da Silva Gonçalves; Laura Manduchi; Moritz Vandenhirtz; Julia E. Vogt

TL;DR

TreeDiffusion tackles the gap between clustering and high-fidelity image generation by conditioning diffusion on hierarchical latent representations learned by a TreeVAE. It introduces a two-stage pipeline where TreeVAE performs hierarchical clustering and a DDIM-based diffusion model, guided by a path encoder, generates cluster-specific images. Empirically, the approach improves generation quality (FID) across multiple datasets and preserves clear cluster structure in the output, outperforming a naive TreeVAE+Diffusion baseline. The method also enables interpretable visualizations of the learned latent hierarchy, highlighting the benefits of hierarchical conditioning for generative clustering.

Abstract

Generative modeling and clustering are conventionally distinct tasks in machine learning. Variational Autoencoders (VAEs) have been widely explored for their ability to integrate both, providing a framework for generative clustering. However, while VAEs can learn meaningful cluster representations in latent space, they often struggle to generate high-quality samples. This paper addresses this problem by introducing TreeDiffusion, a deep generative model that conditions diffusion models on learned latent hierarchical cluster representations from a VAE to obtain high-quality, cluster-specific generations. Our approach consists of two steps. First, a VAE-based clustering model learns a hierarchical latent representation of the data. Second, a cluster-aware diffusion model generates realistic images conditioned on the learned hierarchical structure. We systematically compare the generative capabilities of our approach with those of alternative conditioning strategies. Empirically, we demonstrate that conditioning diffusion models on hierarchical cluster representations improves the generative performance on real-world datasets compared to other approaches. Moreover, a key strength of our method lies in its ability to generate images that are both representative and specific to each cluster, enabling more detailed visualization of the learned latent structure. Our approach addresses the generative limitations of VAE-based clustering approaches by leveraging their learned structure, thereby advancing the field of generative clustering.