Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting

Junwu Zhang, Zhenyu Tang, Yatian Pang, Xinhua Cheng, Peng Jin, Yida Wei, Munan Ning, Li Yuan

TL;DR

Repaint123 addresses SDS-induced flaws in one-image-to-3D generation by combining a fast Gaussian Splatting coarse stage with a progressive, controllable 2D repainting process guided by visibility maps, mutual attention, and image prompts. Because the repainted views are high-quality and multi-view consistent, texture refinement and 3D optimization can use a simple pixel-wise MSE loss, and the full pipeline runs in around 2 minutes from scratch. The approach shows quantitative and qualitative gains over baselines on standard datasets, with improved texture fidelity and view consistency for single-view-to-3D content creation.

Abstract

Recent one image to 3D generation methods commonly adopt Score Distillation Sampling (SDS). Despite the impressive results, there are multiple deficiencies including multi-view inconsistency, over-saturated and over-smoothed textures, as well as the slow generation speed. To address these deficiencies, we present Repaint123 to alleviate multi-view bias as well as texture degradation and speed up the generation process. The core idea is to combine the powerful image generation capability of the 2D diffusion model and the texture alignment ability of the repainting strategy for generating high-quality multi-view images with consistency. We further propose visibility-aware adaptive repainting strength for overlap regions to enhance the generated image quality in the repainting process. The generated high-quality and multi-view consistent images enable the use of simple Mean Square Error (MSE) loss for fast 3D content generation. We conduct extensive experiments and show that our method has a superior ability to generate high-quality 3D content with multi-view consistency and fine textures in 2 minutes from scratch. Our project page is available at https://pku-yuangroup.github.io/repaint123/.
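
To make the "simple MSE loss" in the abstract concrete, the following is a minimal sketch of a pixel-wise MSE objective and a toy gradient-descent refinement. It is an illustration only: the function names (`mse_loss`, `mse_grad`, `refine_texture`), the learning rate, and the identity "renderer" are assumptions made for this example and are not taken from the paper, where the loss is computed between rendered and repainted novel-view images.

```python
def mse_loss(rendered, target):
    """Mean squared error between two equally sized flat lists of pixel values."""
    if len(rendered) != len(target) or not rendered:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((r - t) ** 2 for r, t in zip(rendered, target)) / len(rendered)


def mse_grad(rendered, target):
    """Gradient of mse_loss with respect to each rendered pixel."""
    n = len(rendered)
    return [2.0 * (r - t) / n for r, t in zip(rendered, target)]


def refine_texture(texture, target, lr=0.5, steps=50):
    """Toy refinement loop (assumption: the renderer is the identity,
    so the texture values are the rendered pixels themselves)."""
    texture = list(texture)
    for _ in range(steps):
        grad = mse_grad(texture, target)
        texture = [p - lr * g for p, g in zip(texture, grad)]
    return texture


if __name__ == "__main__":
    coarse = [0.0, 0.2, 0.9, 0.5]      # hypothetical coarse-stage pixels
    repainted = [0.1, 0.4, 0.8, 0.5]   # hypothetical repainted target pixels
    refined = refine_texture(coarse, repainted)
    print(mse_loss(coarse, repainted), mse_loss(refined, repainted))
```

Unlike SDS, which needs a diffusion-model forward pass at every optimization step, an objective of this form only compares pixels against fixed target images, which is consistent with the speed-up the abstract describes.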

Paper Structure (25 sections, 6 equations, 11 figures, 6 tables)

This paper contains 25 sections, 6 equations, 11 figures, 6 tables.

Figures (11)

  • Figure 1: Repaint123 generates high-quality 3D content with detailed texture in only 2 minutes from a single image. Repaint123 adopts Gaussian Splatting in the coarse stage, and then utilizes a 2D controllable diffusion model with a repainting strategy to generate view-consistent high-quality images. This allows for fast and high-quality refinement of the extracted mesh texture through a simple MSE loss.
  • Figure 2: Motivation of our proposed pipeline. Current methods adopt SDS loss, resulting in inconsistent and poor texture. Our idea is to combine the powerful image generation capability of the controllable 2D diffusion model and the texture alignment ability of the repainting strategy for generating high-quality multi-view consistent images. The repainted images enable simple MSE loss for fast 3D content generation.
  • Figure 3: Controllable repainting scheme. Our scheme employs DDIM Inversion (Song et al., 2020) to generate a deterministic noisy latent from coarse images, which is then refined via a diffusion model controlled by depth-guided geometry, reference image semantics, and attention-driven reference texture. We binarize the visibility map into an overlap mask with a timestep-aware binarization operation. Overlap regions are selectively repainted during each denoising step, yielding a high-quality refined novel-view image.
  • Figure 4: Image-to-3D generation pipeline. In the coarse stage, we adopt a Gaussian Splatting representation optimized by SDS loss at the novel view. In the fine stage, we export a mesh representation and sample novel views bidirectionally and progressively for controllable progressive repainting. An MSE loss is then computed between the refined novel-view images and the input novel-view image for efficient generation. Cameras in red are bidirectional neighbor cameras for obtaining the visibility map.
  • Figure 5: Relation between camera view and refinement strength. The areas in the red box are the same regions from different views.
  • ...and 6 more figures
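
The timestep-aware binarization and selective repainting described in the Figure 3 caption can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the captions do not give the thresholding rule, so the linear schedule `1 - t / num_steps` is an assumption, as are the function names `overlap_mask` and `blend_step` and the convention that `t` counts down from `num_steps` (noisiest) to 0.

```python
def overlap_mask(visibility, t, num_steps):
    """Binarize a per-pixel visibility map (values in [0, 1]) at denoising step t.

    Assumed schedule (hypothetical): threshold = 1 - t / num_steps.
    A pixel is marked 1 (kept from the known latent) while its visibility
    exceeds the threshold, and 0 (free to be repainted) otherwise.
    """
    if num_steps <= 0 or not 0 <= t <= num_steps:
        raise ValueError("require num_steps > 0 and 0 <= t <= num_steps")
    threshold = 1.0 - t / num_steps
    return [[1 if v > threshold else 0 for v in row] for row in visibility]


def blend_step(known, generated, mask):
    """Keep known values where mask == 1, use generated values elsewhere."""
    return [
        [k if m else g for k, g, m in zip(k_row, g_row, m_row)]
        for k_row, g_row, m_row in zip(known, generated, mask)
    ]


if __name__ == "__main__":
    visibility = [[0.0, 0.3], [0.7, 1.0]]  # hypothetical visibility map
    known = [[1, 1], [1, 1]]               # stand-in for the inverted coarse latent
    generated = [[9, 9], [9, 9]]           # stand-in for the denoiser output
    for t in (10, 5, 0):
        mask = overlap_mask(visibility, t, num_steps=10)
        print(t, mask, blend_step(known, generated, mask))
```

Under this assumed schedule, a pixel with low visibility is released for repainting early in the denoising process and so receives a stronger refinement, while a pixel that was well observed is held fixed until the last steps.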