Masked Conditioning for Deep Generative Models

Phillip Mueller, Jannik Wiese, Sebastian Mueller, Lars Mikelsons

TL;DR

Engineering design datasets are often small, sparsely labeled, and conditioned on a mix of numerical and categorical attributes, which hinders the use of deep generative models (DGMs). The authors introduce masked conditioning, which trains DGMs to handle sparse conditions by masking conditions at training time under various sparsity schedules, and they embed categorical and numerical conditions for integration into VAEs and latent diffusion models. Empirical results on 2D point cloud and image datasets demonstrate data efficiency and show that small, domain-specific models can be paired with large pretrained priors to achieve higher-quality, controllable generation. The approach offers a practical pathway for deploying conditional generative models in resource-constrained engineering settings and suggests avenues for extending conditioning to more complex and higher-dimensional inputs.

Abstract

Datasets in engineering domains are often small, sparsely labeled, and contain numerical as well as categorical conditions. Additionally, computational resources are typically limited in practical applications, which hinders the adoption of generative models for engineering tasks. We introduce a novel masked-conditioning approach that enables generative models to work with sparse, mixed-type data. We mask conditions during training to simulate sparse conditions at inference time. For this purpose, we explore the use of various sparsity schedules that show different strengths and weaknesses. In addition, we introduce a flexible embedding that deals with categorical as well as numerical conditions. We integrate our method into an efficient variational autoencoder as well as a latent diffusion model and demonstrate the applicability of our approach on two engineering-related datasets of 2D point clouds and images. Finally, we show that small models trained on limited data can be coupled with large pretrained foundation models to improve generation quality while retaining the controllability induced by our conditioning scheme.
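
To make the idea of training-time masking with a mixed-type embedding more concrete, the following is a minimal NumPy sketch, not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`sparsity_at`, `sample_keep_mask`, `embed_conditions`), the two schedules shown, the linear scaling used for numerical conditions, and the use of a single shared mask token for hidden conditions. The parameters are random placeholders standing in for learned weights.

```python
import numpy as np

def sparsity_at(step, total_steps, schedule="linear"):
    """Fraction of conditions to hide at a training step (illustrative schedules only)."""
    if schedule == "constant":
        return 0.5
    if schedule == "linear":
        # Ramp from 0 (all conditions visible) to 1 (all conditions hidden).
        return step / max(total_steps - 1, 1)
    raise ValueError(f"unknown schedule: {schedule}")

def sample_keep_mask(shape, sparsity, rng):
    """True where a condition stays visible; each entry is hidden with probability `sparsity`."""
    return rng.random(shape) >= sparsity

def embed_conditions(num_vals, cat_ids, keep_num, keep_cat, params):
    """Map mixed conditions to one d-dimensional token per condition.

    Hidden conditions are replaced by a shared mask token (an assumption of this sketch).
    """
    # Numerical: scale a per-condition direction vector by the value, then add a bias.
    num_tok = num_vals[..., None] * params["num_w"] + params["num_b"]   # (B, n_num, d)
    # Categorical: per-condition lookup table indexed by class id.
    cond_idx = np.arange(cat_ids.shape[1])[None, :]
    cat_tok = params["cat_table"][cond_idx, cat_ids]                     # (B, n_cat, d)
    num_tok = np.where(keep_num[..., None], num_tok, params["mask_token"])
    cat_tok = np.where(keep_cat[..., None], cat_tok, params["mask_token"])
    return np.concatenate([num_tok, cat_tok], axis=1)                    # (B, n_num + n_cat, d)

rng = np.random.default_rng(0)
B, n_num, n_cat, n_classes, d = 4, 3, 2, 5, 8

# Random placeholders standing in for learned parameters.
params = {
    "num_w": rng.normal(size=(n_num, d)),
    "num_b": rng.normal(size=(n_num, d)),
    "cat_table": rng.normal(size=(n_cat, n_classes, d)),
    "mask_token": rng.normal(size=(d,)),
}

num_vals = rng.normal(size=(B, n_num))
cat_ids = rng.integers(0, n_classes, size=(B, n_cat))

# One simulated training step halfway through a linear schedule.
s = sparsity_at(step=50, total_steps=101, schedule="linear")
keep_num = sample_keep_mask((B, n_num), s, rng)
keep_cat = sample_keep_mask((B, n_cat), s, rng)
tokens = embed_conditions(num_vals, cat_ids, keep_num, keep_cat, params)
```

At inference time, a sketch like this would take a user-supplied mask instead of a sampled one, so that any subset of conditions can be left unspecified. The resulting tokens would then be passed to the generative model as its conditioning input.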

Paper Structure

This paper contains 13 sections, 7 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Left: Architecture of our masked conditioning approach, applied to a VAE. Right: Architecture of our masked conditioning approach applied to a diffusion model.
  • Figure 2: Left: MSE for the VAEs and LDMs trained on the GeoBIKED, vehicles, and quality-checked DVM-Car subset datasets for increasing levels of sparsity in the conditions. The sparsity level is kept constant within each training run. Right: Mean MSE across sparsity levels as a function of the number of samples in the training dataset (BIKED dataset).
  • Figure 3: Qualitative results of reconstructing images from the DVM-Car dataset. The mcLDM is conditioned on the same labels as the ground-truth image. For the refinement, the mcLDM-generated image is passed to the model as input, together with the prompt. Best viewed when zoomed in.
  • Figure 4: Qualitative results of reconstructing images from the GeoBiked dataset. The mcLDM is conditioned on the same labels as the ground-truth image. For the refinement, the mcLDM-generated image is passed to the model as input, together with the prompt. Best viewed when zoomed in.