Neural Material Adaptor for Visual Grounding of Intrinsic Dynamics

Junyi Cao, Shanyan Guan, Yanhao Ge, Wei Li, Xiaokang Yang, Chao Ma

TL;DR

This work tackles visual grounding of intrinsic dynamics from videos by introducing NeuMA, a Neural Material Adaptor that learns a residual correction $\Delta\mathcal{M}_{\theta}$ to an expert prior $\mathcal{M}_0$ within a differentiable physics engine. The method couples a low-rank adaptation of constitutive models with an elastodynamic solver based on the Material Point Method (MPM) and a differentiable 3D Gaussian Splatting renderer (Particle-GS), and it minimizes image-based losses $\mathcal{L}_v$ so that motion can be grounded end-to-end from observations. Empirical results on synthetic and real data show that NeuMA improves object-dynamics grounding and dynamic rendering while generalizing to unseen shapes and multi-object interactions, which supports combining physical priors with data-driven corrections. The approach offers a principled, interpretable path toward accurate, generalizable modeling of intrinsic dynamics for visual understanding and simulation.

Abstract

While humans effortlessly discern intrinsic dynamics and adapt to new scenarios, modern AI systems often struggle. Current methods for visual grounding of dynamics use either pure neural-network-based simulators (black box), which may violate physical laws, or traditional physical simulators (white box), which rely on expert-defined equations that may not fully capture actual dynamics. We propose the Neural Material Adaptor (NeuMA), which integrates existing physical laws with learned corrections, facilitating accurate learning of actual dynamics while maintaining the generalizability and interpretability of physical priors. Additionally, we propose Particle-GS, a particle-driven 3D Gaussian Splatting variant that bridges simulation and observed images, allowing image gradients to be back-propagated to optimize the simulator. Comprehensive experiments on various dynamics, evaluated in terms of grounded particle accuracy, dynamic rendering quality, and generalization ability, demonstrate that NeuMA can accurately capture intrinsic dynamics.
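
The idea of a learned low-rank correction on top of a fixed prior can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the class name `LowRankAdaptor`, the single linear layer standing in for the prior $\mathcal{M}_0$, the 9-dimensional input (a flattened 3x3 tensor), and the `rank` and `alpha` parameters are all assumptions made for illustration. The correction factor `b` starts at zero, so the adapted model initially reproduces the prior exactly.

```python
import numpy as np


class LowRankAdaptor:
    """Hypothetical sketch: frozen prior weights plus a low-rank residual."""

    def __init__(self, w0, rank=2, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = w0.shape
        self.w0 = w0  # frozen prior weights (stand-in for M_0)
        # Trainable low-rank factors; b starts at zero so delta() is zero.
        self.a = rng.normal(scale=0.01, size=(rank, in_dim))
        self.b = np.zeros((out_dim, rank))
        self.scale = alpha / rank

    def delta(self):
        # Residual correction (stand-in for Delta M_theta), rank <= `rank`.
        return self.scale * self.b @ self.a

    def __call__(self, x):
        # Prior plus correction applied to a batch of row vectors.
        return x @ (self.w0 + self.delta()).T


rng = np.random.default_rng(1)
w0 = rng.normal(size=(9, 9))   # assumed prior: flattened 3x3 in, 3x3 out
adaptor = LowRankAdaptor(w0, rank=2)
f = rng.normal(size=(4, 9))    # batch of 4 hypothetical flattened inputs
prior_out = f @ w0.T
adapted_out = adaptor(f)
```

Only `a` and `b` would be updated during training in such a setup, which keeps the number of learned parameters small relative to the prior.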

Paper Structure

This paper contains 34 sections, 15 equations, 15 figures, 4 tables, 2 algorithms.

Figures (15)

  • Figure 1: The core idea of NeuMA: Learning to correct existing expert knowledge on object motions by fitting a neural material adaptor to ground-truth visual observations.
  • Figure 2: The pipeline of NeuMA for visual grounding. During Stage I, we first reconstruct the 3D Gaussian kernels of the foreground object using masked multi-view images. Then, we uniformly sample the initial physical particles from the object volume and bind them to the reconstructed Gaussian kernels. In Stage II, we integrate the neural material adaptor into the PDE-based simulation framework to estimate the actual dynamics. In Stage III, we deform the Gaussian kernels according to the binding relationship (pre-computed in Stage I) and then render 2D images. The neural material adaptor is trained end-to-end using the difference between the rendered and observed images.
  • Figure 3: Comparison in object dynamics grounding over the entire simulation sequence.
  • Figure 4: Quantitative comparison in dynamic scene rendering.
  • Figure 5: The visual results for dynamic scene rendering.
  • ...and 10 more figures
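
The Figure 2 caption describes binding sampled particles to Gaussian kernels in Stage I and deforming the kernels through that binding in Stage III. The sketch below shows one plausible way to do this for kernel centers only. It is an assumption, not the paper's scheme: the function names, the k-nearest-neighbor binding, and the inverse-squared-distance weights are hypothetical choices made for illustration.

```python
import numpy as np


def bind_kernels_to_particles(kernel_centers, particles, k=3):
    """Bind each kernel to its k nearest particles (assumed scheme)."""
    # Pairwise squared distances, shape (num_kernels, num_particles).
    diff = kernel_centers[:, None, :] - particles[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    idx = np.argsort(d2, axis=1)[:, :k]
    # Inverse-squared-distance weights, normalized to sum to 1 per kernel.
    w = 1.0 / (np.take_along_axis(d2, idx, axis=1) + 1e-8)
    w /= w.sum(axis=1, keepdims=True)
    return idx, w


def deform_kernels(kernel_centers, particles_0, particles_t, idx, w):
    """Move kernel centers by the weighted displacement of bound particles."""
    disp = particles_t - particles_0              # (num_particles, 3)
    return kernel_centers + (w[..., None] * disp[idx]).sum(axis=1)


rng = np.random.default_rng(0)
particles_0 = rng.uniform(size=(50, 3))           # initial sampled particles
kernel_centers = rng.uniform(size=(10, 3))        # reconstructed kernel centers
idx, w = bind_kernels_to_particles(kernel_centers, particles_0)

# Simulated step: a rigid translation of every particle.
shift = np.array([0.1, 0.0, -0.2])
moved = deform_kernels(kernel_centers, particles_0, particles_0 + shift, idx, w)
```

Because the weights for each kernel sum to one, a rigid translation of all particles moves every kernel center by the same translation, which is a quick sanity check on the binding.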