Instant3dit: Multiview Inpainting for Fast Editing of 3D Objects

Amir Barda, Matheus Gadelha, Vladimir G. Kim, Noam Aigerman, Amit H. Bermano, Thibault Groueix

Abstract

We propose a generative technique to edit 3D shapes, represented as meshes, NeRFs, or Gaussian Splats, in approximately 3 seconds, without the need to run an SDS-type optimization. Our key insight is to cast 3D editing as a multiview image inpainting problem, as this representation is generic and can be mapped back to any 3D representation using the bank of available Large Reconstruction Models. We explore different fine-tuning strategies to obtain both multiview generation and inpainting capabilities within the same diffusion model. In particular, the design of the inpainting mask is an important factor in training an inpainting model, and we propose several masking strategies to mimic the types of edits a user would perform on a 3D shape. Our approach takes 3D generative editing from hours to seconds and produces higher-quality results than previous works.

Paper Structure

This paper contains 11 sections, 4 equations, 13 figures, 2 tables.

Figures (13)

  • Figure 1: Our method takes as input a 3D object along with a 3D mask (first column) and a text prompt, and uses our multiview inpainting diffusion model to consistently paint the mask in four rendered views of the object. Off-the-shelf reconstructors can be applied to the multiview output to give a NeRF, a Gaussian Splat (second column), or a mesh (third column), which can be combined with adaptive remeshing to ensure the unmasked region (e.g., topology and UVs) is exactly preserved (fourth and fifth columns). This feedforward approach is orders of magnitude faster than previous works in generative 3D editing, taking just $\approx3$ seconds per multiview edit, then 0.7 seconds to reconstruct a GS or a NeRF, 3 seconds for a mesh, and $\approx20$ seconds for optional mesh post-processing.
  • Figure 2: Overview. Given a NeRF, a Gaussian Splat, or a mesh, the user draws a 3D mask to mark a region to be filled and provides a text prompt to guide the generation. Instant3dit renders four canonical views of the masked object and uses our multiview inpainting network to fill the mask. We use off-the-shelf 3D reconstructors to convert the multiview representation into a NeRF, a Gaussian Splat, or a mesh.
  • Figure 3: Multiview representation. We represent 3D shapes as multiview renderings. Editing is done using an image-based diffusion model that operates on $I_c(S, M)$ and $I_b(M, S)$.
  • Figure 4: Comparison to baselines. We show inpainting results from several baselines; our multiview inpainting method offers the highest quality while maintaining consistency.
  • Figure 5: Failure Cases. Typical failure cases include failure to adhere to the prompt for thin masks or large masks, which have little inductive bias from the unmasked area.
  • ...and 8 more figures
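
The per-stage timings quoted in the Figure 1 caption can be added up into a rough end-to-end latency for each output representation. The sketch below is only illustrative arithmetic: the constants are the approximate figures from the caption, the function and variable names are hypothetical, and it assumes the stages run one after another, as the caption's wording suggests.

```python
# Approximate per-stage latencies in seconds, taken from the Figure 1 caption.
MULTIVIEW_EDIT_S = 3.0  # one multiview inpainting pass
RECONSTRUCT_S = {"gaussian_splat": 0.7, "nerf": 0.7, "mesh": 3.0}
MESH_POSTPROCESS_S = 20.0  # optional, applies to meshes only


def edit_latency(target: str, postprocess: bool = False) -> float:
    """Approximate seconds for one edit, assuming the stages run sequentially."""
    if target not in RECONSTRUCT_S:
        raise ValueError(f"unknown target representation: {target!r}")
    if postprocess and target != "mesh":
        raise ValueError("post-processing applies to meshes only")
    total = MULTIVIEW_EDIT_S + RECONSTRUCT_S[target]
    if postprocess:
        total += MESH_POSTPROCESS_S
    return total


if __name__ == "__main__":
    for target in RECONSTRUCT_S:
        print(f"{target}: ~{edit_latency(target):.1f} s")
    print(f"mesh + post-processing: ~{edit_latency('mesh', postprocess=True):.1f} s")
```

Under these assumptions, an edit that ends in a NeRF or a Gaussian Splat takes a little under 4 seconds, a mesh takes about 6 seconds, and a mesh with the optional post-processing takes about 26 seconds.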