DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion
Yuanze Lin, Ronald Clark, Philip Torr
TL;DR
DreamPolisher tackles the challenge of producing high-fidelity, view-consistent 3D assets from text prompts by combining 3D Gaussian Splatting with geometric diffusion guidance and a ControlNet-based appearance refiner. The method uses a two-stage pipeline. Stage 1 initializes the 3D Gaussians from a text-to-point diffusion model and optimizes them into a coarse, geometrically robust 3D prior. Stage 2 refines appearance with a ControlNet conditioned on camera information and scene coordinates, which improves texture detail and cross-view coherence. A Scene Coordinate Renderer together with a view-consistency loss further enforces multi-view alignment, while optimization with Interval Score Matching (ISM) speeds up training relative to approaches based on Score Distillation Sampling (SDS). Empirical results show improved visual quality and cross-view consistency over strong baselines such as DreamGaussian, GaussianDreamer, and LucidDreamer, at a practical cost of about 30 minutes per object on a single GPU. Overall, DreamPolisher narrows the quality gap between text-to-3D and text-to-image-to-3D methods while keeping training efficient.
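To make the scene-coordinate idea concrete, here is a minimal sketch of how a scene coordinate map is commonly built: each 3D point is assigned its own normalized position as a color, so the value attached to a point does not depend on the camera that observes it. The function name `scene_coordinates` and the bounding-box normalization are assumptions made for illustration; the summary above does not specify how the Scene Coordinate Renderer normalizes or rasterizes coordinates.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def scene_coordinates(points: List[Point]) -> List[Point]:
    """Map each 3D point to normalized coordinates in [0, 1]^3.

    Assumption (not from the paper): normalization uses the axis-aligned
    bounding box of the point set. The result can be read as an RGB color
    per point that depends only on the point's position, not on the view.
    """
    if not points:
        return []
    mins = [min(p[axis] for p in points) for axis in range(3)]
    maxs = [max(p[axis] for p in points) for axis in range(3)]
    # Use a span of 1.0 on degenerate axes to avoid division by zero.
    spans = [(hi - lo) if hi > lo else 1.0 for lo, hi in zip(mins, maxs)]
    return [
        tuple((p[axis] - mins[axis]) / spans[axis] for axis in range(3))
        for p in points
    ]


if __name__ == "__main__":
    # Hypothetical point positions, e.g. centers of 3D Gaussians.
    centers = [(0.0, 0.0, 0.0), (2.0, 4.0, 8.0), (1.0, 2.0, 4.0)]
    for center, color in zip(centers, scene_coordinates(centers)):
        print(center, "->", color)
```

Rendering these per-point colors from any camera yields an image in which the same surface location always carries the same value, which is what makes such a map usable as a view-independent conditioning signal.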
Abstract
We present DreamPolisher, a novel Gaussian Splatting-based method with geometric guidance, tailored to learn cross-view consistency and intricate detail from textual descriptions. While recent progress on text-to-3D generation methods has been promising, prevailing methods often fail to ensure view consistency and textural richness. This problem becomes particularly noticeable for methods that work with text input alone. To address this, we propose a two-stage Gaussian Splatting-based approach that enforces geometric consistency among views. Initially, a coarse 3D generation undergoes refinement via geometric optimization. Subsequently, we use a ControlNet-driven refiner coupled with the geometric consistency term to improve both texture fidelity and overall consistency of the generated 3D asset. Empirical evaluations across diverse textual prompts spanning various object categories demonstrate the efficacy of DreamPolisher in generating consistent and realistic 3D objects that align closely with the semantics of the textual instructions.
