ThemeStation: Generating Theme-Aware 3D Assets from Few Exemplars

Zhenwei Wang, Tengfei Wang, Gerhard Hancke, Ziwei Liu, Rynson W. H. Lau

TL;DR

ThemeStation tackles the challenge of generating theme-consistent 3D asset galleries from a few exemplars by introducing a two-stage workflow that first produces theme-driven concept images and then lifts them into 3D models. A novel Dual Score Distillation loss jointly leverages a concept prior and a reference prior, applied at high and low diffusion noise levels respectively, to balance global structure with fine details. Through extensive benchmarks and a user study, the approach demonstrates improved theme coherence, multi-view quality, and diversity over state-of-the-art image-to-3D and 3D-to-3D methods. The method enables controllable 3D-to-3D generation and holds promise for scalable, theme-specific content creation in VR and gaming, albeit with relatively long optimization times per model.

Abstract

Real-world applications often require a large gallery of 3D assets that share a consistent theme. While remarkable advances have been made in general 3D content creation from text or image, synthesizing customized 3D assets following the shared theme of input 3D exemplars remains an open and challenging problem. In this work, we present ThemeStation, a novel approach for theme-aware 3D-to-3D generation. ThemeStation synthesizes customized 3D assets based on a few given exemplars with two goals: 1) unity, generating 3D assets that thematically align with the given exemplars, and 2) diversity, generating 3D assets with a high degree of variation. To this end, we design a two-stage framework that draws a concept image first, followed by a reference-informed 3D modeling stage. We propose a novel dual score distillation (DSD) loss to jointly leverage priors from both the input exemplars and the synthesized concept image. Extensive experiments and user studies confirm that ThemeStation surpasses prior works in producing diverse theme-aware 3D models with impressive quality. ThemeStation also enables various applications such as controllable 3D-to-3D generation.
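
The idea of applying the two priors at different noise levels can be illustrated with a small sketch. This is not the authors' implementation: the timestep range `T_MAX`, the boundary `T_SPLIT`, and the function names are hypothetical, and the two score callables are stand-ins for whatever models supply the concept and reference priors. It only shows the routing described above, with the concept prior used at high noise levels and the reference prior at low noise levels.

```python
# Illustrative sketch only; all names and constants here are assumptions.
T_MAX = 1000    # assumed number of diffusion timesteps
T_SPLIT = 500   # hypothetical boundary between "high" and "low" noise


def select_prior(t, t_split=T_SPLIT):
    """Return which prior guides the update at timestep t.

    High noise levels (large t) use the concept prior,
    low noise levels (small t) use the reference prior.
    """
    if not 0 <= t < T_MAX:
        raise ValueError("timestep out of range")
    return "concept" if t >= t_split else "reference"


def dual_score(t, concept_score, reference_score, t_split=T_SPLIT):
    """Evaluate the score estimate from the prior selected for timestep t.

    concept_score and reference_score are placeholder callables taking t.
    """
    if select_prior(t, t_split) == "concept":
        return concept_score(t)
    return reference_score(t)


if __name__ == "__main__":
    # Toy stand-ins that just label which prior was queried.
    for t in (900, 100):
        print(t, dual_score(t, lambda s: ("concept", s), lambda s: ("reference", s)))
```

A hard threshold is the simplest possible routing; the actual split between noise levels used in the paper is not stated in this summary.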


Paper Structure

This paper contains 30 sections, 5 equations, 10 figures, 4 tables.

Figures (10)

  • Figure 1: Overview of ThemeStation. Given just one or a few reference models, our approach can generate theme-consistent 3D models in two stages. In the first stage, we fine-tune a pre-trained text-to-image (T2I) diffusion model to form a customized theme-driven diffusion model that produces various concept images. In the second stage, we conduct reference-informed 3D asset modeling by progressively optimizing a rough initial model (omitted in this figure for brevity), which is obtained using an off-the-shelf image-to-3D method given the concept image, into a final 3D asset. We use a novel dual score distillation (DSD) loss for optimization, which applies concept prior and reference prior at different noise levels (denoising timesteps).
  • Figure 2: Comparison of the key ideas between image style transfer (top) and our dual score distillation (bottom). Images are from Gatys et al. [gatys2016image] (top) and Dibia [DenoisingSteps2022] (bottom).
  • Figure 3: Results of the user study. We compare our method with seven baseline methods using 2AFC pairwise comparisons. All preferences are statistically significant ($p<0.05$, chi-squared test).
  • Figure 4: Qualitative comparisons with five image-to-3D methods to evaluate our second stage that lifts a concept image to a 3D model. We show the frontal view as primary for the first line and show the back view as primary for the last two lines.
  • Figure 5: Qualitative comparisons with two 3D variation methods to evaluate the overall generative diversity and quality of our method. For each case, we show three generated 3D models.
  • ...and 5 more figures
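
The Figure 3 caption reports a chi-squared test on 2AFC (two-alternative forced choice) preferences. A minimal sketch of such a test is given below, assuming a goodness-of-fit test against an even 50/50 split with one degree of freedom; whether the paper used exactly this variant is not stated here. The vote counts are made up for illustration and are not results from the paper.

```python
import math


def chi2_2afc(votes_a, votes_b):
    """Chi-squared goodness-of-fit test against a 50/50 preference split.

    Returns (statistic, p_value). With one degree of freedom, the survival
    function of the chi-squared distribution equals erfc(sqrt(x / 2)).
    """
    n = votes_a + votes_b
    if n == 0:
        raise ValueError("need at least one vote")
    expected = n / 2
    stat = ((votes_a - expected) ** 2 + (votes_b - expected) ** 2) / expected
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical counts: 70 votes for method A, 30 for method B.
    stat, p = chi2_2afc(70, 30)
    print(f"chi2 = {stat:.2f}, p = {p:.2e}, significant at 0.05: {p < 0.05}")
```

A preference is called significant when the p-value falls below the chosen threshold, here 0.05 as in the caption.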