Image Sculpting: Precise Object Editing with 3D Geometry Control

Jiraphon Yenphraphai, Xichen Pan, Sainan Liu, Daniele Panozzo, Saining Xie

TL;DR

Image Sculpting introduces a 3D-geometry–driven framework for precise object editing from a single image. It converts 2D content into a textured 3D model, allows interactive 3D deformation, and uses a coarse-to-fine diffusion-based enhancement to produce high-fidelity 2D outputs that preserve geometry and texture. The approach enables precise pose edits, rotations, translations, 3D composition, carving, and serial additions, validated on SculptingBench against strong baselines with quantitative metrics for texture and geometry. By integrating single-view reconstruction, graphics-style deformation, and diffusion-based refinement, the work advances the fusion of graphics pipelines with generative models for controllable, physically plausible image editing.

Abstract

We present Image Sculpting, a new framework for editing 2D images by incorporating tools from 3D geometry and graphics. This approach differs markedly from existing methods, which are confined to 2D spaces and typically rely on textual instructions, leading to ambiguity and limited control. Image Sculpting converts 2D objects into 3D, enabling direct interaction with their 3D geometry. Post-editing, these objects are re-rendered into 2D, merging into the original image to produce high-fidelity results through a coarse-to-fine enhancement process. The framework supports precise, quantifiable, and physically-plausible editing options such as pose editing, rotation, translation, 3D composition, carving, and serial addition. It marks an initial step towards combining the creative freedom of generative models with the precision of graphics pipelines.

Paper Structure

This paper contains 13 sections, 3 equations, 14 figures, 1 table, and 1 algorithm.

Figures (14)

  • Figure 1: Achieving precise control in image editing tasks can be challenging with standard 2D generative pipelines. Our Image Sculpting framework offers the ability to interact with 3D geometry starting with a single image. This enables users to perform detailed, quantifiable, and physically-plausible edits, including precise pose editing, rotation, translation, 3D composition, carving, and serial addition.
  • Figure 2: Illustration of three mesh deformation methods applied to a 3D model. In cage-based space deformation (a), the model is placed in a cage and deforms when the user moves the cage vertices [Ju:2005]. As-Rigid-As-Possible (ARAP) deformation [ARAP_modeling:2007] (b) deforms the model when user-selected blue handle points are moved towards designated red target points. Linear blend skinning (c) maps the deformation of a skeleton to the model [skinningcourse:2014]. Following deformation, a diffusion rendering process can be added for controllable generation. Each mesh deformation technique offers a different balance of control, speed, and precision. Our framework can use any of these techniques.
  • Figure 3: Overview of our Image Sculpting pipeline. DDIM$^{\textbf{+}}$ denotes DDIM with the DreamBooth fine-tuned, depth-controlled model. The process begins by converting the input image into a textured 3D model through a de-rendering process. This model is then prepared for interactive deformation by creating a skeleton and calculating skinning weights. The user can modify the skeleton to deform the model, resulting in an initial coarse image. To refine this edited image, we invert the coarse rendering $I_c$ into the noise $\boldsymbol{x}_T^c$. We then inject self-attention maps $\boldsymbol{A}_t^c$ and feature maps $\boldsymbol{f}_t^c$ from the initial image's denoising process into the enhanced image denoising steps. This technique helps preserve the geometry of the modified object while restoring the visual quality of the edited image.
  • Figure 4: Comparison of our final method with various baseline methods and ablations. Our approach effectively maintains the geometric information while ensuring the texture quality. In contrast, other methods typically preserve either the texture or the geometry, but not both.
  • Figure 5: Overview of the coarse-to-fine generative enhancement model architecture. The red module denotes the one-shot DreamBooth [dreambooth], which requires tuning; the grey module is the SDXL Refiner [refiner], which is frozen in our experiments.
  • ...and 9 more figures
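
The linear blend skinning step named in Figures 2 and 3 (a skeleton with per-vertex skinning weights driving the mesh) can be illustrated with a minimal sketch. This is not the authors' implementation: the three-vertex "mesh", the two bones, the weights, and all function names below are hypothetical, chosen only to show the standard formula, in which each deformed vertex is the weighted sum of that vertex transformed by every bone.

```python
import math

# Identity bone transform as a 3x4 affine matrix [R | t] (hypothetical helper).
IDENTITY = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]


def rotation_z(angle_rad, pivot):
    """Return a 3x4 affine matrix rotating about the z-axis through `pivot`."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    px, py, _pz = pivot
    # Translation t = pivot - R @ pivot keeps the pivot point fixed.
    return [
        [c, -s, 0.0, px - (c * px - s * py)],
        [s, c, 0.0, py - (s * px + c * py)],
        [0.0, 0.0, 1.0, 0.0],
    ]


def apply_affine(m, v):
    """Apply a 3x4 affine matrix to a 3D point."""
    x, y, z = v
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3] for row in m)


def linear_blend_skinning(vertices, weights, bone_transforms):
    """Deform vertices as v_i' = sum_j w_ij * T_j(v_i).

    vertices: list of (x, y, z); weights: one row per vertex with one
    weight per bone (each row assumed to sum to 1); bone_transforms:
    list of 3x4 affine matrices, one per bone.
    """
    deformed = []
    for v, w_row in zip(vertices, weights):
        acc = [0.0, 0.0, 0.0]
        for w, m in zip(w_row, bone_transforms):
            p = apply_affine(m, v)
            for k in range(3):
                acc[k] += w * p[k]
        deformed.append(tuple(acc))
    return deformed


# Toy example: three vertices along the x-axis, two bones meeting at x = 1.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
weights = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
# Bone 0 stays put; bone 1 bends 90 degrees about the joint at (1, 0, 0).
bones = [IDENTITY, rotation_z(math.pi / 2, pivot=(1.0, 0.0, 0.0))]
deformed = linear_blend_skinning(vertices, weights, bones)
```

In this toy setup the vertex at the joint is shared equally by both bones and stays in place, while the far vertex follows the rotated bone. In the pipeline described above, a deformed mesh of this kind would then be rendered to obtain the coarse image that the enhancement stage refines.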