MOSIV: Multi-Object System Identification from Videos

Chunjiang Liu; Xiaoyuan Wang; Qingran Lin; Albert Xiao; Haoyu Chen; Shizheng Wen; Hao Zhang; Lu Qi; Ming-Hsuan Yang; Laszlo A. Jeni; Min Xu; Yizhou Zhao

TL;DR

This work proposes MOSIV, a new framework that directly optimizes for continuous, per-object material parameters using a differentiable simulator guided by geometric objectives derived from video, and presents a new synthetic benchmark with contact-rich, multi-object interactions to facilitate evaluation.

Abstract

We introduce the challenging problem of multi-object system identification from videos, for which prior methods are ill-suited due to their focus on single-object scenes or discrete material classification with a fixed set of material prototypes. To address this, we propose MOSIV, a new framework that directly optimizes for continuous, per-object material parameters using a differentiable simulator guided by geometric objectives derived from video. We also present a new synthetic benchmark with contact-rich, multi-object interactions to facilitate evaluation. On this benchmark, MOSIV substantially improves grounding accuracy and long-horizon simulation fidelity over adapted baselines, establishing it as a strong baseline for this new task. Our analysis shows that object-level fine-grained supervision and geometry-aligned objectives are critical for stable optimization in these complex, multi-object settings. The source code and dataset will be released.
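To make the idea of optimizing continuous, per-object parameters through a simulator concrete, the following is a minimal sketch under strong assumptions, not the MOSIV implementation. It replaces the differentiable simulator with a toy one-dimensional drag model, replaces the video-derived geometric objectives with a per-object trajectory error, and uses central finite differences as a stand-in for the gradients a differentiable simulator would provide. The toy has no contact between objects. All names (`simulate`, `loss`, `grad`, `identify`, `TRUE_DRAG`) are hypothetical.

```python
def simulate(drag, steps=40, dt=0.05, v0=1.0):
    """Roll out one toy object under linear drag; return its positions over time."""
    x, v, xs = 0.0, v0, []
    for _ in range(steps):
        v -= drag * v * dt  # explicit Euler velocity update
        x += v * dt         # position update with the new velocity
        xs.append(x)
    return xs


def loss(params, observed):
    """Sum of per-object mean squared position errors (object-level supervision)."""
    total = 0.0
    for drag, obs in zip(params, observed):
        pred = simulate(drag)
        total += sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)
    return total


def grad(params, observed, eps=1e-5):
    """Central finite differences, standing in for simulator autodiff (assumption)."""
    g = []
    for i in range(len(params)):
        hi = list(params)
        lo = list(params)
        hi[i] += eps
        lo[i] -= eps
        g.append((loss(hi, observed) - loss(lo, observed)) / (2 * eps))
    return g


def identify(observed, init, lr=1.0, iters=500):
    """Plain gradient descent on one continuous parameter per object."""
    params = list(init)
    for _ in range(iters):
        g = grad(params, observed)
        params = [p - lr * gi for p, gi in zip(params, g)]
    return params


# Hypothetical ground truth: two objects with different drag coefficients.
TRUE_DRAG = [0.5, 2.0]
observed = [simulate(k) for k in TRUE_DRAG]

# Start both objects from the same guess and recover each parameter separately.
estimated = identify(observed, init=[1.0, 1.0])
```

The point of the sketch is the structure of the problem: each object carries its own continuous parameter, and the loss is accumulated per object rather than over the scene as a whole. Categorical approaches would instead pick each object's parameters from a fixed set of prototypes.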

Paper Structure (38 sections, 22 equations, 18 figures, 9 tables)

Introduction
Related Work
Method
Problem statement
Overview
Preliminaries: Material Point Method
Preliminaries: Dynamic Gaussian reconstruction
Gaussian-to-continuum lifting in the multi-object regime
Multi-material parameterization and contact
Geometry-aligned objectives for multi-object identification
Optimization in the multi-object setting
Novel Interactions
Experiments
Experimental Setting
Quantitative Results Comparison
...and 23 more sections

Figures (18)

Figure 1: From multi-view observations of multi-object scenes (left), prior approaches select from a fixed library of expert constitutive models via categorical prediction, leading to visually implausible and weakly calibrated physical dynamics. MOSIV instead performs geometric reconstruction and per-object system identification of continuous constitutive parameters, enabling both faithful reproduction of observed interactions and accurate prediction of future behaviors (right).
Figure 2: (1) Geometric reconstruction. From multi-view RGB videos, we reconstruct object geometry and disentangle material-specific motion via optimizing 4D Gaussian Splatting (4DGS) with object masks. (2) Continuum simulation. The reconstructed Gaussians are lifted into object-specific continuums, which serve as the initial states for a differentiable MPM. Geometry-aligned losses on surfaces and silhouettes drive physics parameter optimization under inter-material contact and friction. (3) Applications. The calibrated model generalizes to novel interaction scenarios, enabling physically faithful rollouts and long-horizon predictions of complex multi-object dynamics.
Figure 3: Novel Interaction. Left: MOSIV original GT video sequence. Right: rollout after swapping object physics parameters while keeping initial conditions unchanged. Rows show time.
Figure 4: Qualitative comparison of multi-object interactions. The first four columns show a plasticine–fluid (P–F) example; the last four columns show a sand–sand (S–S) example.
Figure 5: Qualitative comparison between MOSIV and baselines.
...and 13 more figures
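
The Figure 2 caption mentions geometry-aligned losses on surfaces. The exact form of those losses is not given in this excerpt. As an illustration only, the sketch below uses a symmetric Chamfer distance, a common choice for comparing point sets, and sums it per object so that each object is supervised separately. The choice of Chamfer distance and the names `chamfer_distance` and `per_object_surface_loss` are assumptions, not the paper's definitions.

```python
import math


def chamfer_distance(a, b):
    """Symmetric Chamfer distance (mean squared nearest-neighbor) between point sets."""
    def one_sided(src, dst):
        # For every source point, take the squared distance to its nearest target point.
        return sum(min(math.dist(p, q) ** 2 for q in dst) for p in src) / len(src)

    return one_sided(a, b) + one_sided(b, a)


def per_object_surface_loss(simulated_objects, observed_objects):
    """Sum Chamfer distances object by object instead of over the merged scene."""
    return sum(
        chamfer_distance(sim, obs)
        for sim, obs in zip(simulated_objects, observed_objects)
    )


# Hypothetical surface samples for two objects (3D points).
simulated_objects = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(5.0, 0.0, 0.0)],
]
observed_objects = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5)],
    [(5.0, 0.0, 0.0)],
]

surface_loss = per_object_surface_loss(simulated_objects, observed_objects)
```

Computing the distance per object prevents a point on one object from being matched to a nearby point on another, which matters when objects are in contact.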
