
Representation Learning for Sequential Volumetric Design Tasks

Md Ferdous Alam, Yi Wang, Chin-Yi Cheng, Jieliang Luo

TL;DR

This work addresses the challenge of learning latent representations for high-dimensional sequential volumetric design using transformer-based encoder-decoder architectures. It builds a dataset of thousands of expert-like sequential voxel designs generated via Building-Gym, learns latent representations from them, and couples those representations with a flow-based density estimator to enable two downstream tasks: design sequence preference evaluation and autoregressive autocompletion. Key contributions include a dual training regime (reconstruction and autoregressive), a density-based preference mechanism, and a sequential-FID evaluation for generated designs. The results show strong reconstruction accuracy and nearly 90% preference accuracy against random sequences, which indicates practical potential for AI-assisted sequential design and automated evaluation. Future work aims to improve autocompletion realism and to develop better metrics.

Abstract

Volumetric design, also called massing design, is the first critical step in professional building design and is sequential in nature. Because the process requires careful design decisions and iterative adjustments, the underlying design sequence encodes valuable information for designers. Many efforts have been made to automatically generate reasonable volumetric designs, but the quality of the generated solutions varies, and evaluating a solution requires either a prohibitively comprehensive set of metrics or expensive human expertise. Whereas previous approaches learn only from the final design rather than from the design sequence, we propose to encode the design knowledge in a collection of expert or high-performing design sequences and to extract useful representations using transformer-based models. We then use the learned representations for two downstream applications: design preference evaluation and procedural design generation. We build the preference model by estimating the density of the learned representations, and we train an autoregressive transformer model for sequential design generation. We demonstrate these ideas on a novel dataset of thousands of sequential volumetric designs. Our preference model can compare two arbitrary design sequences and is almost $90\%$ accurate when evaluated against random design sequences. Our autoregressive model can also autocomplete a volumetric design sequence from a partial design sequence.

Paper Structure

This paper contains 19 sections, 4 equations, 19 figures, 4 tables.

Figures (19)

  • Figure 1: In this paper, we conduct representation learning for high-dimensional sequential volumetric design and apply the learned representation to two downstream tasks: 1) preference evaluation over two design sequences (above) and 2) auto-completion from a partial design (below).
  • Figure 2: As the first critical step in professional building design, volumetric design takes into consideration site conditions, building codes, and design rules (left). The outcome is a 3D structure in which different colors represent different room types (right).
  • Figure 3: The pipeline to generate design embeddings: 1) our heuristic agent takes in a group of design constraints and outputs an action sequence for each design; 2) our Building-Gym environment takes in the action sequence and outputs a sequence of voxel-based design states, which are flattened into design embeddings.
  • Figure 4: Distribution of sequence lengths in our dataset
  • Figure 5: We develop an encoder-decoder model for learning representations of sequential volumetric design data. The encoder consists of multi-head self-attention layers. The output of the final self-attention layer is used to train a flow-based model, e.g. Real NVP, for density estimation. The encoder weights are kept frozen while the flow model is trained. At inference time, the weights of both the encoder and the flow model are frozen, and the two together yield the log-likelihood of a design sequence. Directly comparing the log-likelihoods of two sequences gives the preference of design sequence $\sigma^i$ over design sequence $\sigma^j$.
  • ...and 14 more figures
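
The captions for Figures 3 and 5 describe a chain: voxel states are flattened into vectors, a design sequence is encoded into a latent representation, a density model assigns that representation a log-likelihood, and the sequence with the higher log-likelihood is preferred. The sketch below illustrates only that chain of reasoning on toy data. It is not the authors' implementation: the mean-pooling `encode_sequence` is a stand-in for the transformer encoder, the diagonal Gaussian is a stand-in for the flow model (Real NVP in the paper), and every function name and the 1x1x2 grid size are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# One voxel state: a 3D grid whose cells hold a room-type id (assumed encoding).
Grid = Sequence[Sequence[Sequence[int]]]


def flatten_state(state: Grid) -> List[float]:
    """Flatten one 3D voxel grid into a 1D vector."""
    return [float(v) for plane in state for row in plane for v in row]


def encode_sequence(states: Sequence[Grid]) -> List[float]:
    """Stand-in encoder: average the flattened states of one design sequence.

    The paper uses a transformer encoder here; mean pooling is only a placeholder.
    """
    flat = [flatten_state(s) for s in states]
    return [sum(col) / len(flat) for col in zip(*flat)]


def fit_diag_gaussian(
    latents: List[List[float]], eps: float = 1e-3
) -> Tuple[List[float], List[float]]:
    """Stand-in density model: per-dimension mean and variance of the latents.

    The paper fits a flow-based model instead; eps keeps variances positive.
    """
    n = len(latents)
    mean = [sum(col) / n for col in zip(*latents)]
    var = [
        sum((x - m) ** 2 for x in col) / n + eps
        for col, m in zip(zip(*latents), mean)
    ]
    return mean, var


def log_likelihood(z: List[float], mean: List[float], var: List[float]) -> float:
    """Log density of z under the fitted diagonal Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(z, mean, var)
    )


def prefers(seq_i, seq_j, mean, var) -> bool:
    """Return True when seq_i has a higher log-likelihood than seq_j."""
    return log_likelihood(encode_sequence(seq_i), mean, var) > log_likelihood(
        encode_sequence(seq_j), mean, var
    )


# Toy data: each state is a 1x1x2 grid; "expert" sequences fill cells with type 1.
expert_a = [[[[1, 0]]], [[[1, 1]]]]
expert_b = [[[[1, 0]]], [[[1, 1]]], [[[1, 1]]]]
random_seq = [[[[0, 2]]], [[[2, 2]]]]

# Fit the density on expert latents only, then compare two sequences.
mean, var = fit_diag_gaussian([encode_sequence(expert_a), encode_sequence(expert_b)])
print(prefers(expert_a, random_seq, mean, var))
```

The density is fitted only on latents of expert-like sequences, so a sequence that lands far from them in latent space receives a low log-likelihood and loses the comparison. Note that this toy encoder accepts sequences of different lengths only because it averages over states.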