Structured Generations: Using Hierarchical Clusters to guide Diffusion Models

Jorge da Silva Goncalves, Laura Manduchi, Moritz Vandenhirtz, Julia E. Vogt

TL;DR

The paper addresses the challenge of generating high-fidelity images that reflect hierarchical clusters learned by latent trees. It introduces Diffuse-TreeVAE, a two-stage framework: a CNN-based TreeVAE learns a latent tree whose leaves $\mathbb{L}$ represent clusters, and a cluster-conditioned DDPM refines the leaf reconstructions $\hat{x}_0^{(l)}$ given the leaf index $l$. Key contributions include architectural enhancements to TreeVAE with CNNs and residual connections, the integration of a DDPM conditioned on cluster representations to yield leaf-specific images without perturbing the clustering, and strong empirical gains (lower FID) across MNIST, FashionMNIST, and CIFAR-10. The work demonstrates that coupling hierarchical clustering with diffusion-based refinement can produce sharp, representative samples and broadens the applicability of cluster-aware generative modeling.

Abstract

This paper introduces Diffuse-TreeVAE, a deep generative model that integrates hierarchical clustering into the framework of Denoising Diffusion Probabilistic Models (DDPMs). The proposed approach generates new images by sampling a root embedding from a learned, VAE-based latent tree, propagating it through the hierarchical paths, and using a second-stage DDPM to refine and generate distinct, high-quality images for each data cluster. The result is a model that not only improves image clarity but also ensures that the generated samples are representative of their respective clusters, addressing the limitations of previous VAE-based methods and advancing the state of clustering-based generative modeling.
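
The generation procedure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the router, the child-embedding rule, the tree depth, and the names `sample_tree`, `refine`, and `generate` are all hypothetical stand-ins chosen for illustration. It only shows the control flow: sample a root embedding, propagate it down a binary tree while accumulating leaf probabilities, pick a leaf, and hand the leaf output together with the leaf index to a second stage.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_tree(rng, dim=4, depth=2):
    """Propagate one root sample to every leaf of a complete binary tree.

    Returns one (embedding, probability) pair per leaf. The router and the
    child-embedding rule are toy assumptions, not the learned TreeVAE modules.
    """
    # Sample a root embedding from a standard normal prior.
    z_root = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    frontier = [(z_root, 1.0)]
    for _ in range(depth):
        next_frontier = []
        for z, p in frontier:
            # Toy router: probability of taking the left branch.
            p_left = sigmoid(sum(z) / dim)
            for p_branch, shift in ((p_left, -1.0), (1.0 - p_left, 1.0)):
                # Toy child embedding: parent shifted per branch, plus noise.
                child = [zi + shift + rng.gauss(0.0, 0.1) for zi in z]
                next_frontier.append((child, p * p_branch))
        frontier = next_frontier
    return frontier


def refine(x_hat, leaf_index):
    # Placeholder for the second-stage, cluster-conditioned DDPM: a real
    # model would run the reverse diffusion conditioned on both inputs.
    return {"reconstruction": x_hat, "leaf": leaf_index}


def generate(seed=0):
    rng = random.Random(seed)
    leaves = sample_tree(rng)
    probs = [p for _, p in leaves]
    # Select a leaf according to its path probability.
    leaf_index = rng.choices(range(len(leaves)), weights=probs, k=1)[0]
    x_hat = leaves[leaf_index][0]  # stand-in for the leaf decoder output
    return probs, refine(x_hat, leaf_index)
```

Calling `generate(seed=0)` returns the four leaf probabilities, which sum to one because each is a product of routing probabilities along a root-to-leaf path, together with the placeholder second-stage output for the selected leaf.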

Paper Structure

This paper contains 7 sections, 1 equation, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Schematic overview of the Diffuse-TreeVAE model: The reverse model of the DDPM (bottom) is conditioned on both the reconstruction and the index of the selected leaf $l$ obtained from the associated, pre-trained TreeVAE. The denoising function of the DDPM learns to refine the TreeVAE-based reconstructions.
  • Figure 2: (Top) Samples from the CIFAR-10 test set. (Middle) Reconstructions from the CNN-TreeVAE model. (Bottom) Refined reconstructions from the Diffuse-TreeVAE model, conditioned on the CNN-TreeVAE reconstructions and the corresponding leaves.
  • Figure 3: Diffuse-TreeVAE model trained on FashionMNIST. For each cluster, random newly generated images are displayed. Below each set of images, a normalized histogram (ranging from 0 to 1) shows the distribution of predicted classes from an independent, pre-trained classifier on FashionMNIST for all newly generated images in each leaf with a significant probability of reaching that leaf.
  • Figure 4: Diffuse-TreeVAE model trained on CIFAR-10. For each cluster, random newly generated images are displayed. Below each set of images, a normalized histogram (ranging from 0 to 1) shows the distribution of predicted classes from an independent, pre-trained classifier on CIFAR-10 for all newly generated images in each leaf with a significant probability of reaching that leaf.
  • Figure 5: Image generations from each leaf of (top) a CNN-TreeVAE, (middle) a cluster-unconditional Diffuse-TreeVAE, and (bottom) a cluster-conditional Diffuse-TreeVAE, all trained on CIFAR-10. Each row displays the generated images from all leaves of the specified model, starting with the same sample from the root. The corresponding leaf probabilities are shown at the top of the image and are by design the same for all models.
  • ...and 2 more figures
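
The per-leaf histograms described in the captions of Figures 3 and 4 could be computed along the following lines. This is a minimal sketch under stated assumptions: it takes the leaf assignment of each generated image and the class predicted by an external classifier as given inputs, and the function name `per_leaf_class_histograms` is hypothetical.

```python
from collections import Counter, defaultdict


def per_leaf_class_histograms(leaf_ids, predicted_classes, num_classes):
    """Return {leaf: [fraction of images predicted as class c, ...]}.

    leaf_ids[i] is the leaf assigned to generated image i (assumed given),
    and predicted_classes[i] is the classifier's label for that image.
    """
    counts = defaultdict(Counter)
    for leaf, cls in zip(leaf_ids, predicted_classes):
        counts[leaf][cls] += 1

    histograms = {}
    for leaf, counter in counts.items():
        total = sum(counter.values())
        # Normalize so that each leaf's bars lie in [0, 1] and sum to 1.
        histograms[leaf] = [counter[c] / total for c in range(num_classes)]
    return histograms
```

A leaf whose histogram concentrates on a single class corresponds to a cluster that aligns well with that class label.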