Geometry-Aware Score Distillation via 3D Consistent Noising and Gradient Consistency Modeling

Min-Seop Kwak, Donghoon Ahn, Ines Hyeonsu Kim, Jin-Hwa Kim, Seungryong Kim

TL;DR

This work tackles geometric inconsistencies in diffusion-based text-to-3D generation, notably the Janus problem, by introducing Geometry-aware Score Distillation (GSD). GSD integrates 3D consistent noising, geometry-based gradient warping, and a correspondence-aware gradient consistency loss to enforce multiview coherence in SDS gradients. The approach is plug-and-play and preserves Gaussian noise properties while leveraging 3D Gaussian Splatting representations, yielding faster convergence and more faithful 3D geometries across multiple baselines. Experiments demonstrate improved geometric fidelity, reduced artifacts, and competitive results against state-of-the-art models without additional training data or modules.

Abstract

Score distillation sampling (SDS), the methodology in which the score from pretrained 2D diffusion models is distilled into a 3D representation, has recently brought significant advancements in the text-to-3D generation task. However, this approach is still confronted with critical geometric inconsistency problems such as the Janus problem. Starting from the hypothesis that such inconsistency problems may be induced by multiview inconsistencies between 2D scores predicted from various viewpoints, we introduce GSD, a simple and general plug-and-play framework for incorporating 3D consistency, and therefore geometry awareness, into the SDS process. Our methodology is composed of three components: 3D consistent noising, designed to produce 3D consistent noise maps that perfectly follow the standard Gaussian distribution; geometry-based gradient warping, for identifying correspondences between predicted gradients of different viewpoints; and a novel gradient consistency loss, to optimize the scene geometry toward producing more consistent gradients. We demonstrate that our method significantly improves performance, addressing the geometric inconsistency problems in the text-to-3D generation task at minimal computational cost while remaining compatible with existing score distillation-based models. Our project page is available at https://ku-cvlab.github.io/GSD/.
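
For orientation, the SDS gradient that the abstract refers to is commonly written as below. The notation follows the standard SDS formulation rather than this paper's own equations, so every symbol here is an assumption: $\theta$ denotes the 3D scene parameters, $x$ the image rendered from one viewpoint, $x_t$ its noised version at timestep $t$, $y$ the text prompt, $\epsilon_\phi$ the pretrained diffusion model's noise prediction, $\epsilon$ the injected noise map, and $w(t)$ a timestep weighting.

$$\nabla_\theta \mathcal{L}_{\mathrm{SDS}} = \mathbb{E}_{t,\epsilon}\left[\, w(t)\,\big(\epsilon_\phi(x_t;\, y,\, t) - \epsilon\big)\,\frac{\partial x}{\partial \theta} \right]$$

Read this way, the per-view residual $\epsilon_\phi(x_t; y, t) - \epsilon$ is the 2D gradient whose disagreement across viewpoints the abstract hypothesizes to be a source of the Janus problem, and $\epsilon$ is the quantity that 3D consistent noising would make consistent across views.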

Paper Structure

This paper contains 27 sections, 13 equations, 12 figures, 1 table.

Figures (12)

  • Figure 1: Teaser. Our framework incorporates 3D awareness into the score distillation sampling (SDS) process through 3D consistent noising, which induces consistency in the predicted 2D scores. As a general plug-and-play module that can be attached to any SDS-based text-to-3D generation baseline at little computational cost, it brings substantially enhanced view consistency and fidelity to 3D generation results across various baselines.
  • Figure 2: Overall framework. Our framework consists of three components for geometry-aware score distillation: 3D consistent noising, geometry-based gradient warping, and gradient consistency modeling. Through these components, our framework encourages multiview consistency between predicted 2D scores and enhances the quality of generated 3D scenes.
  • Figure 3: PAAS-based illustration of our consistent noising. Introducing 3D-consistent noising induces more consistent SDS gradients across nearby viewpoints, and this enhanced consistency allows for coherent geometry.
  • Figure 4: 3D consistent $\int$-noising. To produce a 3D geometry-aware 2D noise map that preserves the properties of the standard Gaussian distribution, we conduct 3D conditional upsampling of point clouds and discrete integration of the projected noise values. Please refer to the method section on 3D consistent noising for a more detailed explanation of the subfigures.
  • Figure 5: Properties of 3D consistent noise. Our 3D consistent integral noising preserves the properties of a perfect standard Gaussian distribution that random noise (a) displays, while also demonstrating the interpolative qualities that bilinear interpolation (b) possesses.
  • ...and 7 more figures
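
The claim in the Figure 4 and Figure 5 captions, that a discrete integral of projected noise can stay standard Gaussian while plain interpolation cannot, can be illustrated with a small numerical sketch. It is not the authors' implementation. The random point-to-pixel assignment stands in for projecting an upsampled point cloud, and `integrate_noise`, `num_points`, and `num_pixels` are hypothetical names. Dividing the per-pixel sum by the square root of the point count is the standard way to keep a sum of independent unit Gaussians at unit variance; whether the paper's discrete integral takes exactly this form is an assumption.

```python
import numpy as np


def integrate_noise(point_noise, pixel_index, num_pixels):
    """Aggregate per-point Gaussian noise into a per-pixel noise map.

    Each pixel sums the noise of the points that land in it and divides
    by sqrt(count), so a pixel hit by n i.i.d. N(0, 1) samples is again
    N(0, 1). Pixels with no points are left at zero and flagged in `hit`.
    """
    sums = np.bincount(pixel_index, weights=point_noise, minlength=num_pixels)
    counts = np.bincount(pixel_index, minlength=num_pixels)
    hit = counts > 0
    noise_map = np.zeros(num_pixels)
    noise_map[hit] = sums[hit] / np.sqrt(counts[hit])
    return noise_map, counts, hit


rng = np.random.default_rng(0)
num_points, num_pixels = 400_000, 10_000

# Assumption: one N(0, 1) noise value attached to every 3D point.
point_noise = rng.standard_normal(num_points)
# Assumption: a random assignment stands in for camera projection.
pixel_index = rng.integers(0, num_pixels, size=num_points)

noise_map, counts, hit = integrate_noise(point_noise, pixel_index, num_pixels)

# Baseline for comparison: plain averaging (interpolation-like), which
# shrinks the variance to 1 / count instead of preserving it.
sums = np.bincount(pixel_index, weights=point_noise, minlength=num_pixels)
mean_map = sums / np.maximum(counts, 1)

print("integrated std:", noise_map[hit].std())
print("averaged std:  ", mean_map[hit].std())
```

Because the same per-point noise values would be reused for every camera, two nearby views that see largely the same points would receive correlated noise maps, while each map on its own remains close to unit variance. The averaged baseline shows why naive interpolation is not a drop-in replacement: its standard deviation falls well below one.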