Structured Generations: Using Hierarchical Clusters to guide Diffusion Models
Jorge da Silva Goncalves, Laura Manduchi, Moritz Vandenhirtz, Julia E. Vogt
TL;DR
The paper addresses the challenge of generating high-fidelity images that reflect the hierarchical clusters learned by a latent tree. It introduces Diffuse-TreeVAE, a two-stage framework: a CNN-based TreeVAE learns a latent tree whose leaves $\mathbb{L}$ represent clusters, and a cluster-conditioned DDPM refines each leaf reconstruction $\hat{x}_0^{(l)}$, conditioned on the leaf index $l$. Key contributions are architectural enhancements to TreeVAE (convolutional layers and residual connections), a DDPM conditioned on cluster representations that yields leaf-specific images without perturbing the clustering, and strong empirical gains (lower FID) on MNIST, FashionMNIST, and CIFAR-10. The work demonstrates that coupling hierarchical clustering with diffusion-based refinement can produce sharp, representative samples and broadens the applicability of cluster-aware generative modeling.
Abstract
This paper introduces Diffuse-TreeVAE, a deep generative model that integrates hierarchical clustering into the framework of Denoising Diffusion Probabilistic Models (DDPMs). The proposed approach generates new images by sampling from the root embedding of a learned VAE-based latent tree, propagating the sample along hierarchical paths, and using a second-stage DDPM to refine and generate distinct, high-quality images for each data cluster. The result is a model that not only improves image clarity but also ensures that the generated samples are representative of their respective clusters, addressing the limitations of previous VAE-based methods and advancing the state of clustering-based generative modeling.
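The generation procedure described in the abstract can be sketched roughly as follows. This is a toy NumPy illustration of the control flow only, not the authors' implementation: the `Node` class, the random linear routers and transformations, the per-leaf decoders, and `toy_eps_model` are all hypothetical stand-ins for trained networks, and the linear noise schedule is an assumption. The reverse loop uses the standard DDPM ancestral sampling update.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 4  # toy latent and "image" dimensionality (assumption)

class Node:
    """Tree node; W and r are random stand-ins for learned transformation and router networks."""
    def __init__(self, left=None, right=None, leaf_id=None):
        self.left, self.right, self.leaf_id = left, right, leaf_id
        self.W = rng.normal(scale=0.5, size=(DIM, DIM))
        self.r = rng.normal(size=DIM)

def sample_leaf(root):
    """Sample a root embedding, then follow stochastic router decisions down to a leaf."""
    z = rng.normal(size=DIM)  # root embedding drawn from a standard normal prior
    node = root
    while node.leaf_id is None:
        p_left = 1.0 / (1.0 + np.exp(-(node.r @ z)))  # router probability of going left
        node = node.left if rng.random() < p_left else node.right
        z = np.tanh(node.W @ z) + 0.1 * rng.normal(size=DIM)  # child embedding
    return node.leaf_id, z

def decode(z, leaf_id, decoders):
    """Leaf-specific decoder producing the first-stage reconstruction x_hat."""
    return np.tanh(decoders[leaf_id] @ z)

def toy_eps_model(x_t, t, x_hat, leaf_id, abar):
    """Hypothetical noise predictor. A trained network conditioned on x_hat and
    leaf_id would go here; this oracle treats x_hat as the clean image and
    ignores leaf_id."""
    return (x_t - np.sqrt(abar[t]) * x_hat) / np.sqrt(1.0 - abar[t])

def ddpm_refine(x_hat, leaf_id, steps=50):
    """DDPM ancestral sampling, conditioned on the leaf reconstruction and leaf index."""
    betas = np.linspace(1e-4, 0.02, steps)  # assumed linear noise schedule
    alphas = 1.0 - betas
    abar = np.cumprod(alphas)
    x = rng.normal(size=x_hat.shape)  # start from pure noise
    for t in reversed(range(steps)):
        eps = toy_eps_model(x, t, x_hat, leaf_id, abar)
        x = (x - betas[t] / np.sqrt(1.0 - abar[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:  # no noise is added at the final step
            x = x + np.sqrt(betas[t]) * rng.normal(size=x.shape)
    return x

def generate():
    """Build a small three-leaf tree and run both stages once."""
    leaves = [Node(leaf_id=i) for i in range(3)]
    root = Node(left=Node(left=leaves[0], right=leaves[1]), right=leaves[2])
    decoders = [rng.normal(size=(DIM, DIM)) for _ in leaves]
    leaf_id, z = sample_leaf(root)
    x_hat = decode(z, leaf_id, decoders)
    return leaf_id, x_hat, ddpm_refine(x_hat, leaf_id)
```

Calling `generate()` returns the sampled leaf index, the first-stage reconstruction, and the refined sample. Because the toy predictor treats `x_hat` as the clean image, the final reverse step simply recovers `x_hat`; a trained, cluster-conditioned noise predictor would instead add the detail that the VAE reconstruction lacks. The tree traversal happens entirely before the diffusion stage, which is consistent with the claim that refinement does not perturb the clustering.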
