
Detailed Geometry and Appearance from Opportunistic Motion

Ryosuke Hirai, Kohei Yamashita, Antoine Guédon, Ryo Kawahara, Vincent Lepetit, Ko Nishino

Abstract

Reconstructing 3D geometry and appearance from a sparse set of fixed cameras is a foundational task with broad applications, yet it remains fundamentally constrained by the limited viewpoints. We show that this bound can be broken by exploiting opportunistic object motion: as a person manipulates an object (e.g., moving a chair or lifting a mug), the static cameras effectively "orbit" the object in its local coordinate frame, providing additional virtual viewpoints. Harnessing this object motion, however, poses two challenges: the tight coupling of object pose and geometry estimation, and the complex appearance variations of a moving object under static illumination. We address these by formulating a joint pose and shape optimization using 2D Gaussian splatting with alternating minimization of 6DoF trajectories and primitive parameters, and by introducing a novel appearance model that factorizes diffuse and specular components with reflected directional probing within the spherical harmonics space. Extensive experiments on synthetic and real-world datasets with extremely sparse viewpoints demonstrate that our method recovers significantly more accurate geometry and appearance than state-of-the-art baselines.
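To make the alternating-minimization idea concrete, the following is a minimal toy sketch in Python, not the paper's implementation: it alternates a closed-form rigid pose fit (a 2-D Procrustes stand-in for the 6DoF pose step) with an exact least-squares update of the canonical geometry (a stand-in for refining Gaussian primitives under differentiable rendering). All function names are hypothetical.

```python
import numpy as np

def fit_pose(points_canonical, points_observed):
    """Pose step: closed-form rigid alignment (Kabsch/Procrustes) mapping
    canonical points onto one frame's observations. Stand-in for the
    paper's 6DoF trajectory estimation."""
    mu_c = points_canonical.mean(axis=0)
    mu_o = points_observed.mean(axis=0)
    H = (points_canonical - mu_c).T @ (points_observed - mu_o)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_o - R @ mu_c
    return R, t

def refine_canonical(poses, observations):
    """Geometry step: with poses fixed, the least-squares canonical points
    are the average of the back-transformed observations from all frames
    (the exact minimizer, since rotations are orthonormal)."""
    back = [(obs - t) @ R for (R, t), obs in zip(poses, observations)]
    return np.mean(back, axis=0)

def alternate(observations, init_points, iters=100):
    """Alternate pose estimation and canonical refinement; each block
    update is an exact minimizer, so the total residual never increases."""
    points = init_points.copy()
    for _ in range(iters):
        poses = [fit_pose(points, obs) for obs in observations]
        points = refine_canonical(poses, observations)
    return points, poses
```

Because both steps solve their subproblem exactly, the alternation is a block-coordinate descent on the total reprojection residual; the real system replaces these closed-form steps with gradient-based optimization of Gaussian parameters under photometric loss.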


Paper Structure

This paper contains 36 sections, 14 equations, 18 figures, 5 tables.

Figures (18)

  • Figure 1: We introduce a method to recover high-fidelity 3D geometry and appearance from sparse-view static videos. By leveraging "virtual viewpoints" induced by opportunistic object motion, our method jointly optimizes object pose and geometry through a motion-aware appearance representation to capture detailed surface structure.
  • Figure 2: Method Overview. Given sparse multi-view videos and an initial set of canonical 2D Gaussians (a), our framework recovers per-frame object poses while iteratively refining the Gaussian geometry via differentiable rendering. To ensure robust convergence under sparse supervision, we employ an alternating optimization that switches between 6-DoF object pose estimation (b) and canonical Gaussian refinement (c) using the aggregated temporal information from all processed frames.
  • Figure 3: Motion-Aware Appearance Modeling. (a) For moving objects, specular reflection is a function of incident radiance from the reflected viewing direction $\boldsymbol{\omega}_r$, which evolves with object pose. (b) Similarly, diffuse reflection depends on the time-varying surface normal $\mathbf{n}$ relative to the static environment. (c) Our model factorizes appearance into specular (e) and diffuse (f) components by evaluating the surface normal and reflected viewing directions in the world coordinate system. (d) Unlike standard 3DGS, which optimizes independent Spherical Harmonics (SH) for each primitive, our approach employs shared SH coefficients $\theta_d$ and $\theta_s$ across all foreground Gaussians to robustly capture the global illumination field.
  • Figure 4: Visualization of recovered surface normals. The error maps on the right visualize per-pixel estimation discrepancies. These results demonstrate that proper appearance modeling is essential for effectively leveraging radiometric cues to reconstruct fine surface details.
  • Figure 5: Novel View Synthesis. Our appearance model significantly improves the fidelity of novel view synthesis for objects with specular reflections.
  • ...and 13 more figures
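The appearance factorization described in the caption of Figure 3 can be illustrated with a minimal sketch: a shared spherical-harmonics field is probed at the world-space surface normal for the diffuse term and at the reflected viewing direction for the specular term. This is a hedged toy example (degree-1 real SH only, hypothetical function names), not the paper's code; the SH basis constants are the standard ones.

```python
import numpy as np

def sh_basis_deg1(d):
    """Real spherical-harmonics basis up to degree 1 for a unit direction
    d = (x, y, z). Constants are the standard normalization factors."""
    x, y, z = d
    return np.array([
        0.28209479177387814,      # Y_0^0
        0.4886025119029199 * y,   # Y_1^{-1}
        0.4886025119029199 * z,   # Y_1^0
        0.4886025119029199 * x,   # Y_1^1
    ])

def reflect(v, n):
    """Reflect the viewing direction v about the unit surface normal n:
    omega_r = 2 (n . v) n - v."""
    return 2.0 * np.dot(n, v) * n - v

def shade(n, v, theta_d, theta_s):
    """Diffuse term probes the shared SH field at the world-space normal n;
    specular term probes it at the reflected viewing direction omega_r.
    theta_d, theta_s: (4, 3) SH coefficient arrays shared across primitives."""
    diffuse = sh_basis_deg1(n) @ theta_d
    omega_r = reflect(v, n)
    specular = sh_basis_deg1(omega_r) @ theta_s
    return diffuse + specular
```

Sharing theta_d and theta_s across all foreground Gaussians, rather than storing per-primitive SH as in standard 3DGS, is what lets the illumination field stay consistent as the object (and hence each primitive's normal and reflected direction) moves through the static environment.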