Pygmalion Effect in Vision: Image-to-Clay Translation for Reflective Geometry Reconstruction

Gayoung Lee, Junho Kim, Jin-Hwa Kim, Junmo Kim

TL;DR

This work tackles the difficulty of recovering 3D geometry from scenes with strong view-dependent reflections. It introduces the Pygmalion Effect in Vision, a dual-branch framework that pairs a BRDF-based reflective branch with a clay-guided branch; the latter uses image-to-clay translation to produce reflection-free supervision for geometry learning. A key contribution is clay-guided reflective Gaussian splatting, in which a per-Gaussian clay color and a clay-rendering loss stabilize normal estimation and geometry while the BRDF branch handles appearance. Training follows a staged schedule that emphasizes clay supervision early and RGB supervision later. Experiments on synthetic and real datasets show consistent gains in mesh quality and reconstruction stability, indicating that translating radiance into neutral clay renders provides a strong inductive bias for reflective object geometry learning and practical benefits for downstream tasks such as relighting and editing.
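The staged schedule can be pictured with a minimal sketch. It is illustrative only and is not the authors' implementation. The function names, the additive form of the loss, the linear decay, and the step counts (`warmup_steps`, `decay_steps`) are all assumptions made for this example; the summary states only that clay supervision is emphasized early and RGB supervision later.

```python
def clay_weight(step: int, warmup_steps: int = 3000, decay_steps: int = 2000) -> float:
    """Hypothetical weight on the clay-rendering loss at a given training step.

    Assumed shape: full weight during an initial warmup phase, then a linear
    decay to zero. The step counts are placeholders, not values from the paper.
    """
    if step < warmup_steps:
        return 1.0
    if step >= warmup_steps + decay_steps:
        return 0.0
    return 1.0 - (step - warmup_steps) / decay_steps


def total_loss(rgb_loss: float, clay_loss: float, step: int) -> float:
    """Combine RGB and clay supervision (assumed additive form).

    Early in training the clay term contributes fully; once its weight has
    decayed to zero, only the RGB term remains.
    """
    return rgb_loss + clay_weight(step) * clay_loss
```

Under these assumptions, `total_loss` applies both terms during the first 3000 steps and reduces to the RGB loss alone from step 5000 onward. A convex blend of the two terms would be an equally plausible reading of the schedule.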

Abstract

Understanding reflection remains a long-standing challenge in 3D reconstruction due to the entanglement of appearance and geometry under view-dependent reflections. In this work, we present the Pygmalion Effect in Vision, a novel framework that metaphorically "sculpts" reflective objects into clay-like forms through image-to-clay translation. Inspired by the myth of Pygmalion, our method learns to suppress specular cues while preserving intrinsic geometric consistency, enabling robust reconstruction from multi-view images containing complex reflections. Specifically, we introduce a dual-branch network in which a BRDF-based reflective branch is complemented by a clay-guided branch that stabilizes geometry and refines surface normals. The two branches are trained jointly using the synthesized clay-like images, which provide a neutral, reflection-free supervision signal that complements the reflective views. Experiments on both synthetic and real datasets demonstrate substantial improvement in normal accuracy and mesh completeness over existing reflection-handling methods. Beyond technical gains, our framework reveals that seeing by unshining, translating radiance into neutrality, can serve as a powerful inductive bias for reflective object geometry learning.

Paper Structure

This paper contains 20 sections, 14 equations, 17 figures, 7 tables.

Figures (17)

  • Figure 1: Our method demonstrates that using an image-to-clay model to translate (a) training images into (b) clay images and employing them as geometric guidance in the initial phase leads to improved mesh geometry quality. (c) shows the result of our baseline, Reflective Gaussian Splatting, while (d) presents the result with an additional guidance loss using the clay images.
  • Figure 2: Overview of our dual-branch pipeline. A BRDF-based reflective branch and a clay-guided branch share the same Gaussian geometry. Clay-like images provide reflection-free supervision during early training, stabilizing geometry and improving surface normals.
  • Figure 3: Overall pipeline of the OminiControl model for image-to-clay translation. The red dashed boxes indicate the trainable LoRA (Hu et al., 2022) modules that are fine-tuned (see the image-to-clay fine-tuning section).
  • Figure 4: Dataset creation using the Objaverse dataset. The input and GT denote the original reflective and clay-rendered images, respectively, while the output denotes the result from our image-to-clay translation model.
  • Figure 5: Dataset creation using FLUX and Nano-Banana. FLUX (input) images show the synthetic clay objects, Nano-Banana converts them into reflective versions, and the outputs show the reconstructed clay images produced by our model.
  • ...and 12 more figures