UDiFF: Generating Conditional Unsigned Distance Fields with Optimal Wavelet Diffusion

Junsheng Zhou, Weiqi Zhang, Baorui Ma, Kanle Shi, Yu-Shen Liu, Zhizhong Han

TL;DR

UDiFF addresses the challenge of generating diverse 3D shapes with open surfaces by learning a data-driven optimal wavelet transform to express unsigned distance fields (UDFs) in a compact spatial-frequency space. A diffusion model then operates on coarse and fine wavelet coefficient volumes, with text-guided generation realized through cross-attention to CLIP embeddings and a dedicated fine predictor, followed by surface extraction via DCUDF and texture synthesis with Text2Tex. The key contributions are the data-driven wavelet optimization that minimizes information loss near the zero-level set, the conditional diffusion framework for UDFs, and robust meshing and texturing pipelines, demonstrated on open-surface DeepFashion3D and closed-surface ShapeNet benchmarks with strong qualitative and quantitative results. This approach enables high-fidelity, textured 3D content generation that accommodates open surfaces, broadening the scope of diffusion-based 3D modeling for real-world content creation.

Abstract

Diffusion models have shown remarkable results for image generation, editing and inpainting. Recent works explore diffusion models for 3D shape generation with neural implicit functions, i.e., signed distance functions and occupancy functions. However, they are limited to shapes with closed surfaces, which prevents them from generating diverse real-world 3D content containing open surfaces. In this work, we present UDiFF, a 3D diffusion model for unsigned distance fields (UDFs) which is capable of generating textured 3D shapes with open surfaces, either from text conditions or unconditionally. Our key idea is to generate UDFs in the spatial-frequency domain with an optimal wavelet transformation, which produces a compact representation space for UDF generation. Specifically, instead of selecting an appropriate wavelet transformation by hand, which requires expensive manual effort and still leads to large information loss, we propose a data-driven approach to learn the optimal wavelet transformation for UDFs. We evaluate UDiFF and demonstrate its advantages through numerical and visual comparisons with the latest methods on widely used benchmarks. Project page: https://weiqi-zhang.github.io/UDiFF.
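
To make the idea of a compact wavelet representation concrete, the following is a minimal sketch and not the authors' implementation. It assumes a 1D unsigned distance field instead of a 3D volume, and a hypothetical one-parameter family of 2-tap orthogonal filters (the angle theta; theta = pi/4 is the Haar wavelet). A grid search over theta stands in for the learned, data-driven filter optimization described above; all function names are illustrative.

```python
import math

def udf_1d(xs, surface_points):
    """Unsigned distance from each sample to the nearest surface point."""
    return [min(abs(x - p) for p in surface_points) for x in xs]

def decompose(signal, theta):
    """One-level 2-tap orthogonal wavelet transform (assumed filter family).

    Returns (coarse, fine) coefficient lists, each half the signal length.
    """
    c, s = math.cos(theta), math.sin(theta)
    pairs = list(zip(signal[0::2], signal[1::2]))
    coarse = [c * a + s * b for a, b in pairs]
    fine = [s * a - c * b for a, b in pairs]
    return coarse, fine

def invert(coarse, fine, theta):
    """Inverse of decompose; the 2x2 filter matrix is its own inverse."""
    c, s = math.cos(theta), math.sin(theta)
    out = []
    for lo, hi in zip(coarse, fine):
        out.append(c * lo + s * hi)
        out.append(s * lo - c * hi)
    return out

def coarse_only_error(signal, theta):
    """Mean squared reconstruction error when fine coefficients are dropped."""
    coarse, fine = decompose(signal, theta)
    approx = invert(coarse, [0.0] * len(fine), theta)
    return sum((u - v) ** 2 for u, v in zip(signal, approx)) / len(signal)

# A toy "open" shape: two isolated surface points on the unit interval.
xs = [i / 64 for i in range(64)]
udf = udf_1d(xs, surface_points=[0.3, 0.7])

# With both coefficient sets, the transform is invertible.
coarse, fine = decompose(udf, math.pi / 4)
roundtrip = invert(coarse, fine, math.pi / 4)
max_roundtrip_error = max(abs(u - v) for u, v in zip(udf, roundtrip))

# Stand-in for filter learning: choose theta by minimizing reconstruction error.
candidates = [math.radians(k) for k in range(0, 91)]
best_theta = min(candidates, key=lambda t: coarse_only_error(udf, t))

print("round-trip error:", max_roundtrip_error)
print("best theta (degrees):", math.degrees(best_theta))
```

The sketch only illustrates the principle that the filter parameters can be chosen from data by minimizing a self-reconstruction error; the actual method operates on 3D UDF volumes with its own filter parametrization and optimizer.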

Paper Structure (15 sections, 5 equations, 12 figures, 4 tables)

Figures (12)

  • Figure 1: Diverse shapes with and without open surfaces generated by our UDiFF model. Top-Left: Conditional generation of clothes with prompts 'A short-sleeved dress in spiderman style', 'A Batman upper with long sleeves', 'A superman pant', 'A camouflage slip dress'. Around: A shape gallery generated by UDiFF conditionally and unconditionally.
  • Figure 2: Overview of UDiFF. (a) We propose a data-driven approach to attain the optimal wavelet transformation for UDF generation. We optimize the wavelet filter parameters through decomposition and inversion by minimizing the error in UDF self-reconstruction. (b) We fix the learned decomposition wavelet parameters and use them to prepare the data as a compact representation of UDFs, consisting of pairs of coarse and fine coefficient volumes. (c) The architecture of the generator in the diffusion model, where text conditions are introduced with cross-attention. (d) The diffusion process of UDiFF. We train the generator to produce coarse coefficient volumes from random noise guided by input texts, and train the fine predictor to predict fine coefficient volumes from the coarse ones. For inference (green arrows), we start from random noise and an input text and use the trained generator to produce a coarse coefficient volume. The trained fine predictor then predicts the fine coefficient volume. From the coarse and fine volumes together, we recover the UDF with the fixed, pre-optimized inversion wavelet filter parameters. Finally, we extract surfaces from the UDF and texture them with the guiding text.
  • Figure 3: Comparisons of reconstructions with different wavelet filters. (a) The input shapes from DeepFashion3D [zhu2020deep] and ShapeNet [chang2015shapenet], from which we sample UDFs to prepare compact wavelet representations. (b) The surfaces extracted from the UDF recovered through decomposition and inversion with our learned wavelet filter. (c,d) The surfaces extracted from the UDF recovered with manually chosen wavelet filters.
  • Figure 4: Visual comparison with state-of-the-art methods on shapes generated on the DeepFashion3D dataset. The front and back faces are rendered in different colors to clearly distinguish open surfaces.
  • Figure 5: Conditional generations produced by UDiFF and Shap·E. The front and back faces are rendered in different colors to clearly distinguish open surfaces.
  • ...and 7 more figures
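
The inference path described in the Figure 2 caption can be summarized as a data-flow sketch. This is an assumption-laden outline, not the authors' code: every stage is passed in as a hypothetical callable, the iterative denoising loop is collapsed into a single generator call, and the toy stand-ins below only record the order in which the stages run.

```python
def generate_textured_shape(text, sample_noise, generator, fine_predictor,
                            inverse_wavelet, extract_surface, texture):
    """Run the inference stages in the order given in the Figure 2 caption.

    All arguments except `text` are hypothetical callables supplied by the caller.
    """
    noise = sample_noise()                # start from random noise
    coarse = generator(noise, text)       # text-guided coarse coefficient volume
    fine = fine_predictor(coarse)         # fine coefficients predicted from coarse
    udf = inverse_wavelet(coarse, fine)   # fixed, pre-optimized inversion filters
    mesh = extract_surface(udf)           # surface extraction from the UDF
    return texture(mesh, text)            # texturing guided by the same text

# Toy stand-ins that log their name and return a placeholder value.
calls = []

def _stage(name, result):
    def run(*args):
        calls.append(name)
        return result
    return run

result = generate_textured_shape(
    "A camouflage slip dress",
    sample_noise=_stage("noise", [0.0]),
    generator=_stage("generator", [1.0]),
    fine_predictor=_stage("fine_predictor", [2.0]),
    inverse_wavelet=_stage("inverse_wavelet", [3.0]),
    extract_surface=_stage("extract_surface", "mesh"),
    texture=_stage("texture", "textured mesh"),
)
print(calls)
```

Passing the stages as arguments keeps the sketch independent of any particular model, mesher, or texturing tool; in the paper these roles are filled by the trained generator and fine predictor, DCUDF, and Text2Tex.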