Integrating Symbolic Reasoning into Neural Generative Models for Design Generation

Maxwell Joseph Jacobson; Yexiang Xue

TL;DR

SPRING outperforms baseline generative models in both design quality and adherence to user specifications. Its zero-shot constraint transfer also lets it handle novel user specifications not encountered during training.

Abstract

Design generation requires tight integration of neural and symbolic reasoning, as good design must meet explicit user needs and honor implicit rules for aesthetics, utility, and convenience. Current automated design tools driven by neural networks produce appealing designs but cannot satisfy user specifications and utility requirements. Symbolic reasoning tools, such as constraint programming, cannot perceive low-level visual information in images or capture subtle aspects such as aesthetics. We introduce the Spatial Reasoning Integrated Generator (SPRING) for design generation. SPRING embeds a neural and symbolic integrated spatial reasoning module inside the deep generative network. The spatial reasoning module samples the set of locations of objects to be generated from a backtrack-free distribution. This distribution modifies the implicit preference distribution, which is learned by a recursive neural network to capture utility and aesthetics. Sampling from the backtrack-free distribution is accomplished by a symbolic reasoning approach, SampleSearch, which zeros out the probability of sampling spatial locations violating explicit user specifications. Embedding symbolic reasoning into neural generation guarantees that the output of SPRING satisfies user requirements. Furthermore, SPRING offers interpretability, allowing users to visualize and diagnose the generation process through the bounding boxes. SPRING is also adept at managing novel user specifications not encountered during its training, thanks to its proficiency in zero-shot constraint transfer. Quantitative evaluations and a human study reveal that SPRING outperforms baseline generative models, excelling in delivering high design quality and better meeting user specifications.

Paper Structure (56 sections, 1 theorem, 4 equations, 14 figures, 9 tables)

Introduction
Problem Definition
Design Production
Propositional Design Language
Spatial Reasoning Integrated Generator
Perception Module
Object Detector for Prior Visual Element Recognition
Scene Encoder for General Visual Features
Spatial Reasoning Module (SRM)
Decisions in Iterative Refinement
The Implicit Preference & Backtrack-free Distributions
Parameter Sampling
Learning Implicit Preference
The SRM Satisfies Explicit Constraints
The SRM Respects Implicit Preferences
...and 41 more sections

Key Result

Proposition 1

The layouts sampled by the SampleSearch procedure follow the backtrack-free distribution defined in the subsection "The Implicit Preference & Backtrack-free Distributions".
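
As a reading aid, the backtrack-free distribution named in the proposition can be written out as below. This is a sketch based only on the description in the Figure 3 caption (proportional to the implicit preference distribution on constraint-respecting decisions, zero elsewhere, then renormalized). The symbols $d_t$, $P_{\mathrm{IP}}$, $P_{\mathrm{BF}}$, and $\mathcal{F}$ are notation assumed here and may differ from the paper's own.

```latex
% Assumed notation: d_t is the t-th refinement decision (L, M, or R),
% d_{<t} the decisions made so far, P_IP the implicit preference
% distribution, and F(d_{<t}) the set of next decisions from which at
% least one constraint-satisfying layout is still reachable.
\[
P_{\mathrm{BF}}(d_t \mid d_{<t}) =
\begin{cases}
\dfrac{P_{\mathrm{IP}}(d_t \mid d_{<t})}
      {\sum_{d' \in \mathcal{F}(d_{<t})} P_{\mathrm{IP}}(d' \mid d_{<t})}
  & \text{if } d_t \in \mathcal{F}(d_{<t}), \\[2ex]
0 & \text{otherwise,}
\end{cases}
\qquad
P_{\mathrm{BF}}(d_1, \dots, d_T) = \prod_{t=1}^{T} P_{\mathrm{BF}}(d_t \mid d_{<t}).
\]
```

Under this reading, every layout with nonzero probability satisfies the constraints, and the relative preference among the surviving decisions at each step is unchanged.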

Figures (14)

Figure 1: An interior design generated by our proposed SPRING model (middle) with a given background already containing an oven and a sink among other objects (left). The user specifications are at the bottom (provided to SPRING in the form of propositional logic; natural language text is used here to aid readability). SPRING creates a design satisfying the specifications. Text-to-image approaches like Stable Diffusion (right) often fail to meet these constraints, mixing up the number, color, and placement of objects.
Figure 2: Rollout of the SPRING system. (First row) SPRING consists of the perception module, the spatial reasoning module, and the visual element generation module. (Second & third rows) The neural and symbolic integrated spatial reasoning module decides the bounding boxes of each object to be generated. It iteratively halves each coordinate of every bounding box until it is sufficiently small. Symbolic reasoning in the form of SampleSearch is applied to the output of the neural net to ensure satisfaction of user constraints. These bounding boxes are filled by the visual element generator in the last step. (Fourth row) Example pictures generated by SPRING demonstrate good quality designs satisfying user specifications in Figure \ref{fig:functionality}.
Figure 3: Example problem demonstrating the implicit preference and the backtrack-free distribution of one variable $x_2$. In the root node, $x_2$ can take values between 0 and 8. This range is iteratively refined in each step (three decisions: taking the left side (L), middle (M), or right side (R)) with different probabilities, forming a tree. (Left) The implicit preference distribution defines how the next decision should be sampled from the current distribution. The probability is generated by the RNN. (Right) The backtrack-free distribution is proportional to the implicit preference distribution for all decisions that adhere to the constraints, but assigns 0 probability to any decision that violates them. Notice how distributions with new zeros in the first and third levels are renormalized but remain proportional.
Figure 4: Example from Figure 3 with the SampleSearch procedure overlaid. (1-7) denote the 7 steps in the search. (1) Sample "R" from the distribution. (2) No element of the range is less than 4, so the current node is pruned. Backtrack and normalize the distribution. (3) Sample "L". (4) Sample "R". (5) Sample "R". (6) A leaf node has been reached, but it does not satisfy the constraints. Backtrack and normalize. (7) Sample "M". This leaf node satisfies all the constraints and is returned.
Figure 5: Typical latent diffusion approach, as utilized by our Visual Element Generator. (A) An image can be generated conditioned on text and image inputs. These inputs are transformed to the latent space by CLIP and an autoencoder, respectively. (B) Diffusion starts with a pure noise vector and removes noise in small steps to form an image. This occurs entirely in the latent space by the U-net. (C) In the latent space, each de-noising step is a soft adjustment towards a space more familiar to the U-net. These "islands" are formed during training, and map to images that look good and fit the conditioning data.
...and 9 more figures
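
The search walk described in the Figure 3 and Figure 4 captions (sample a decision, prune or descend, and on a dead end backtrack, zero out that decision, and renormalize) can be sketched in Python for a single coordinate. This is a minimal illustration, not the authors' implementation. The names `refine`, `toy_prefs`, `may_satisfy`, and `satisfies` are hypothetical, the fixed probability table stands in for the RNN, and the constraint (the final range must lie inside the window [1.0, 3.5]) is invented for the example.

```python
import random


def refine(lo, hi, decision):
    """Halve the range [lo, hi]: keep its left, middle, or right half."""
    quarter = (hi - lo) / 4
    if decision == "L":
        return lo, lo + 2 * quarter
    if decision == "M":
        return lo + quarter, hi - quarter
    return lo + 2 * quarter, hi  # "R"


def sample_search(lo, hi, prefs, may_satisfy, satisfies, min_width, rng):
    """Return a constraint-satisfying leaf range, or None if none exists."""
    if hi - lo <= min_width:  # leaf: accept only if the constraint holds
        return (lo, hi) if satisfies(lo, hi) else None
    probs = dict(prefs(lo, hi))  # implicit preference over L / M / R
    while probs:
        decisions = list(probs)
        weights = [probs[d] for d in decisions]
        if sum(weights) <= 0:
            return None
        # Weights are relative, so dropping a key renormalizes implicitly.
        choice = rng.choices(decisions, weights=weights)[0]
        child = refine(lo, hi, choice)
        if may_satisfy(*child):  # otherwise prune without descending
            result = sample_search(*child, prefs, may_satisfy,
                                   satisfies, min_width, rng)
            if result is not None:
                return result
        del probs[choice]  # dead end: zero out this decision, resample
    return None


# Hypothetical stand-in for the RNN: the same preference at every node.
def toy_prefs(lo, hi):
    return {"L": 0.2, "M": 0.3, "R": 0.5}


# Hypothetical constraint: the final range must lie inside [1.0, 3.5].
def may_satisfy(lo, hi):
    return lo < 3.5 and hi > 1.0  # range still overlaps the window


def satisfies(lo, hi):
    return 1.0 <= lo and hi <= 3.5


box = sample_search(0.0, 8.0, toy_prefs, may_satisfy, satisfies,
                    1.0, random.Random(0))
print(box)
```

Because a failed decision is removed before resampling, the search never returns a range that violates the constraint, and the remaining decisions keep their relative preference. The full system would apply the same idea to every bounding-box coordinate with propositional constraints over several objects.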

Theorems & Definitions (1)

Proposition 1
