SDF-StyleGAN: Implicit SDF-Based StyleGAN for 3D Shape Generation

Xin-Yang Zheng; Yang Liu; Peng-Shuai Wang; Xin Tong

TL;DR

SDF-StyleGAN extends StyleGAN2 to 3D by using a grid-based implicit SDF representation and dual discriminators (global and local) that operate on SDF values and gradients to improve geometry and rendering quality. The approach is evaluated with a shading-image FID metric, achieving state-of-the-art results across ShapeNet categories and enabling tasks such as shape reconstruction, completion, and single-image 3D generation via GAN inversion. The work demonstrates that gradient-informed, implicit SDF supervision coupled with a 3D StyleGAN can produce smoother, more complete shapes and offers a versatile platform for 3D shape editing and interpolation, while acknowledging limitations in capturing very thin parts and proposing future directions such as truncated SDFs and 3D scene generation.

Abstract

We present a StyleGAN2-based deep learning approach for 3D shape generation, called SDF-StyleGAN, with the aim of reducing visual and geometric dissimilarity between generated shapes and a shape collection. We extend StyleGAN2 to 3D generation and utilize the implicit signed distance function (SDF) as the 3D shape representation, and introduce two novel global and local shape discriminators that distinguish real and fake SDF values and gradients to significantly improve shape geometry and visual quality. We further complement the evaluation metrics of 3D generative models with the shading-image-based Fréchet inception distance (FID) scores to better assess visual quality and shape distribution of the generated shapes. Experiments on shape generation demonstrate the superior performance of SDF-StyleGAN over the state-of-the-art. We further demonstrate the efficacy of SDF-StyleGAN in various tasks based on GAN inversion, including shape reconstruction, shape completion from partial point clouds, single-view image-based shape generation, and shape style editing. Extensive ablation studies justify the efficacy of our framework design. Our code and trained models are available at https://github.com/Zhengxinyang/SDF-StyleGAN.
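The abstract says the discriminators look at both SDF values and SDF gradients. As a rough illustration of what such an input sample could look like, the sketch below evaluates an analytic sphere SDF and estimates its gradient with central differences. The function names, the sphere stand-in, the finite-difference scheme, and the 4-channel layout are all assumptions made for this example; they are not taken from the paper or its code.

```python
import math

def sphere_sdf(p, radius=0.5):
    """Signed distance from point p to a sphere centered at the origin
    (negative inside, positive outside). Stand-in for a generated SDF."""
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - radius

def sdf_gradient(sdf, p, eps=1e-4):
    """Central-difference estimate of the SDF gradient at p (hypothetical helper)."""
    grad = []
    for axis in range(3):
        hi = list(p)
        lo = list(p)
        hi[axis] += eps
        lo[axis] -= eps
        grad.append((sdf(hi) - sdf(lo)) / (2.0 * eps))
    return grad

def discriminator_sample(sdf, p):
    """Concatenate the SDF value and its gradient into one 4-channel sample.
    The channel layout is an assumption for illustration only."""
    return [sdf(p)] + sdf_gradient(sdf, p)

sample = discriminator_sample(sphere_sdf, (1.0, 0.0, 0.0))
```

For an exact SDF the gradient has unit length away from the surface's medial axis, so the gradient channels carry surface-orientation information that the raw distance values alone do not expose directly.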

Paper Structure (20 sections, 9 equations, 15 figures, 4 tables)

Introduction
Related Work
3D generative models
Evaluation metrics for 3D generation
Design of SDF-StyleGAN
Overview
Feature-volume-based implicit signed distance functions
SDF-StyleGAN generator
SDF-StyleGAN discriminator
Global discriminator
Local discriminator
SDF-StyleGAN training
Loss functions
Adaptive training scheme
Experiments and Evaluation
...and 5 more sections

Figures (15)

Figure 1: Overview of SDF-StyleGAN. The original StyleGAN2 generator is extended to 3D, and it outputs the feature volume in a unit box. The feature vector $\phi(\mathbf{x})$ at any point $\mathbf{x}$ inside the volume is obtained via trilinear interpolation and is mapped to an SDF value by a shallow MLP. The global discriminator takes the SDF values and gradients sampled at the grid centers as input, and the local discriminator takes the SDF values and gradients in a random local 3D box region near the surface as input. A few local box regions are illustrated on the above chair example, in different colors.
Figure 2: (a): The revised StyleGAN2 generator for 3D feature volume generation. We used 3D convolution with kernel size 3 and four style blocks corresponding to four resolution levels, up to $32\times32\times32$. Mod and Demod are the modulation and demodulation modules adapted from StyleGAN2. (b): The skip input for the generator. (c): The discriminator architecture. The tFeature module and the fFeature module convert between the feature volume per grid cell and the high-dimensional feature to/from 3D convolution. Up and Down denote the upsampling and downsampling modules. The first block in (c) is removed from the local discriminator as its input feature grid resolution is $16\times16\times16$.
Figure 3: Illustration of the drawback of LFD. The number above the shape is the LFD between the shape and its GT counterpart.
Figure 4: Left: configuration of rendering views. 20 camera positions are illustrated as red points. Right: 20 rendered images for computing FID; the image resolution is $299\times299$.
Figure 5: Visual comparison of randomly generated chairs by different methods. The shapes in the last row are randomly sampled from the training dataset.
...and 10 more figures
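
The Figure 1 caption describes querying the feature volume by trilinear interpolation and decoding the result with a shallow MLP. A minimal sketch of that query path follows, under stated assumptions: the grid is stored as nested lists with samples at the corners of the unit box, the MLP has a single ReLU hidden layer, and the names trilinear_feature and shallow_mlp are hypothetical. The paper's actual grid alignment, feature width, and MLP layout are not given in this summary.

```python
def trilinear_feature(grid, p):
    """Interpolate a feature vector at p in [0, 1]^3.

    grid[i][j][k] is a list of floats; the grid is assumed cubic with
    n >= 2 samples per axis placed at i / (n - 1) along each axis.
    """
    n = len(grid)
    idx, frac = [], []
    for c in p:
        t = min(max(c, 0.0), 1.0) * (n - 1)  # clamp, then scale to grid units
        i = min(int(t), n - 2)               # lower corner index of the cell
        idx.append(i)
        frac.append(t - i)
    i, j, k = idx
    u, v, w = frac
    dim = len(grid[0][0][0])
    out = [0.0] * dim
    # Blend the 8 cell corners with products of per-axis linear weights.
    for di, wi in ((0, 1.0 - u), (1, u)):
        for dj, wj in ((0, 1.0 - v), (1, v)):
            for dk, wk in ((0, 1.0 - w), (1, w)):
                weight = wi * wj * wk
                corner = grid[i + di][j + dj][k + dk]
                for c in range(dim):
                    out[c] += weight * corner[c]
    return out

def shallow_mlp(feature, w1, b1, w2, b2):
    """One ReLU hidden layer followed by a linear layer giving a scalar SDF."""
    hidden = [max(0.0, sum(a * f for a, f in zip(row, feature)) + b)
              for row, b in zip(w1, b1)]
    return sum(a * h for a, h in zip(w2, hidden)) + b2

# Toy 2x2x2 grid whose feature at each corner is simply its coordinates.
toy_grid = [[[[float(i), float(j), float(k)] for k in range(2)]
             for j in range(2)] for i in range(2)]
phi = trilinear_feature(toy_grid, (0.25, 0.5, 0.75))
sdf_value = shallow_mlp(phi, [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.0, 0.0],
                        [1.0, 1.0], -0.5)
```

Because the decoded SDF is a differentiable function of the query point, the same path can in principle supply the gradients that the discriminators consume, either analytically or by finite differences.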
