Structure from Collision

Takuhiro Kaneko

TL;DR

SfC tackles recovering the invisible internal structure of an object from collision-induced appearance changes, a problem that static 3D reconstruction cannot resolve because objects with different interiors can look identical from the outside. It introduces SfC-NeRF, a physics-informed, two-stage framework built on PAC-NeRF that combines physical, appearance-preserving, and keyframe constraints with volume annealing to optimize the interior while keeping the exterior intact. Across 115 objects with diverse cavity shapes, locations, and sizes and varied material properties, SfC-NeRF improves internal-structure estimation over baselines and ablated variants and shows practical benefits for future prediction. The work highlights a new direction for neural 3D representations: leveraging dynamics and physics to reveal hidden geometry, with potential applications in robotics and simulation.

Abstract

Recent advancements in neural 3D representations, such as neural radiance fields (NeRF) and 3D Gaussian splatting (3DGS), have enabled the accurate estimation of 3D structures from multiview images. However, this capability is limited to estimating the visible external structure, and identifying the invisible internal structure hidden behind the surface is difficult. To overcome this limitation, we address a new task called Structure from Collision (SfC), which aims to estimate the structure (including the invisible internal structure) of an object from appearance changes during collision. To solve this problem, we propose a novel model called SfC-NeRF that optimizes the invisible internal structure of an object through a video sequence under physical, appearance (i.e., visible external structure)-preserving, and keyframe constraints. In particular, to avoid falling into undesirable local optima owing to its ill-posed nature, we propose volume annealing; that is, searching for global optima by repeatedly reducing and expanding the volume. Extensive experiments on 115 objects involving diverse structures (i.e., various cavity shapes, locations, and sizes) and material properties revealed the properties of SfC and demonstrated the effectiveness of the proposed SfC-NeRF.

Paper Structure

This paper contains 34 sections, 10 equations, 14 figures, 17 tables.

Figures (14)

  • Figure 1: Concept of Structure from Collision (SfC). (a) and (c) Examples of training images taken from a certain viewpoint. (b) and (d) Cross-sectional views of the internal structures cut perpendicular to the viewpoint. The score indicates the chamfer distance ($\times 10^3 \downarrow$) between the ground-truth and estimated particles (smaller is better). Here, two objects appear to be identical in static images (1) but actually have different internal structures (3). (1) A static 3D representation learning model cannot distinguish the difference in internal structures (b)(d) because there is no difference in appearance in static images (a)(c). (2) To overcome this limitation, we address SfC. As shown in (a) and (c), changes in shape and appearance during collision are influenced by the internal structure. We utilize this property to identify the internal structure of the object. Although perfect identification remains difficult owing to the ill-posed nature of the problem, the proposed method succeeds in capturing the bias in the location of the holes (b)(d).
  • Figure 2: Optimization pipelines of SfC-NeRF. (i) The grid field $\mathcal{F}^{G'}(t_0)$ is initially optimized using the first frame of the video sequence. (ii) Subsequently, the structure (i.e., volume density $\sigma^{G'}(t_0) \in \mathcal{F}^{G'}(t_0)$) of the object is optimized through the entire video sequence with physical constraints ($\mathcal{L}_{\text{mass}}$ and DiffMPM), appearance-preserving constraints (i.e., $\mathcal{L}_{\text{pixel}_0}$ and $\mathcal{L}_{\text{depth}_0}$), and keyframe constraints ($\mathcal{L}_{\text{pixel}_k}$) along with a standard pixel loss ($\mathcal{L}_{\text{pixel}}$).
  • Figure 3: Examples of the data in the SfC dataset.
  • Figure 4: Comparison of learned structures for sphere objects with $s_c = (\frac{2}{3})^3$. The score under the particles indicates the CD ($\times 10^3 \downarrow$). (c)--(f) GO/LPO failed to determine optimal learning directions. (g)--(k) The ablated models failed to avoid improper solutions. (l) The full model overcomes these issues and achieves the best CD.
  • Figure 5: Comparison of appearances for objects with different internal structures when $t$ is varied within $\{ t_0, t_6, t_9 \}$.
  • ...and 9 more figures
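
The captions for Figures 1 and 4 score estimated particles against ground-truth particles with the chamfer distance (CD). As a rough illustration of that metric, the following is a minimal sketch of one common symmetric form: the mean squared distance from each point to its nearest neighbor in the other set, summed over both directions. The exact convention used in the paper (squared versus unsquared distances, sum versus average of the two directions) is not stated in this summary, so the choice here is an assumption, and the function names are hypothetical.

```python
from typing import Sequence

Point = Sequence[float]


def _sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def _mean_nearest_sq(src: Sequence[Point], dst: Sequence[Point]) -> float:
    """Mean over src of the squared distance to the nearest point in dst."""
    return sum(min(_sq_dist(p, q) for q in dst) for p in src) / len(src)


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric chamfer distance between two particle sets.

    Assumed convention (not confirmed by the source): squared distances,
    with the two directional terms added together. This brute-force version
    is O(len(a) * len(b)) and is meant for illustration only.
    """
    if not a or not b:
        raise ValueError("both particle sets must be non-empty")
    return _mean_nearest_sq(a, b) + _mean_nearest_sq(b, a)


# Toy example: a "solid" row of particles versus one missing its middle particle.
solid = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
hollow = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
cd = chamfer_distance(solid, hollow)
```

In the toy example only the middle particle of `solid` lacks an exact match in `hollow`, so the distance is small but nonzero. This is why a lower CD indicates that an estimated interior is closer to the ground truth.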