Physics-Inspired All-Pair Interaction Learning for 3D Dynamics Modeling
Kai Yang, Yuqi Huang, Junheng Tao, Wanyu Wang, Qitian Wu
TL;DR
PAINET addresses the challenge of modeling 3D dynamics in multi-body systems by learning unobserved all-pair interactions through an energy-based latent structure formulation. It derives a physics-informed, SE(3)-equivariant attention mechanism and couples it with a parallel, SE(3)-equivariant decoder to predict trajectories efficiently. The core update follows $\mathbf h_i^{(t+1)} = (1-\eta)\, \mathbf h_i^{(t)} + \eta \sum_j \frac{f_{ij}(\|\mathbf h_i - \mathbf h_j\|^2)}{\sum_m f_{im}(\|\mathbf h_i - \mathbf h_m\|^2)}\, \mathbf h_j^{(t)}$, with $E(\mathbf H^{(t+1)}, t+1; \{\rho_{ij}\}) \le E(\mathbf H^{(t)}, t; \{\rho_{ij}\})$ and $\|\mathbf h_i\|_2 = 1$. The framework enables parallel decoding via EGNNs to output $\widehat{\mathbf X}^{(t)}$ for all $t$, while preserving SE(3) priors. Empirically, PAINET delivers improvements of up to 41.5% in A-MSE on motion capture, MD17, and Adk protein dynamics with comparable compute, validating its effectiveness and scalability for large-scale multi-body dynamics.
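To make the core update concrete, the following is a minimal sketch of one attention step in plain Python, not the authors' implementation. It rests on several assumptions that the summary does not state: a single shared kernel `f(z) = exp(-z / 2)` stands in for the pair-specific functions $f_{ij}$, the unit-norm condition $\|\mathbf h_i\|_2 = 1$ is enforced by re-normalizing after each step, and the helper names `normalize`, `sq_dist`, and `attention_step` are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def sq_dist(a, b):
    """Squared Euclidean distance ||a - b||^2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def attention_step(H, eta=0.5, f=lambda z: math.exp(-0.5 * z)):
    """One all-pair update: h_i <- (1 - eta) * h_i + eta * sum_j a_ij * h_j.

    The weights a_ij = f(||h_i - h_j||^2) / sum_m f(||h_i - h_m||^2) mirror
    the normalized ratio in the update rule. The shared kernel f and the
    final re-normalization are assumptions made for this sketch.
    """
    n, d = len(H), len(H[0])
    H_new, A = [], []
    for i in range(n):
        # Unnormalized pairwise weights for node i against every node j.
        w = [f(sq_dist(H[i], H[j])) for j in range(n)]
        total = sum(w)
        a = [wj / total for wj in w]  # each row of A sums to 1
        # Convex combination of the old state and the attention aggregate.
        mixed = [
            (1 - eta) * H[i][k] + eta * sum(a[j] * H[j][k] for j in range(n))
            for k in range(d)
        ]
        H_new.append(normalize(mixed))  # assumed projection back to ||h_i|| = 1
        A.append(a)
    return H_new, A

# Three unit-norm latent states in 3D (illustrative values only).
H = [normalize(v) for v in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0])]
H_next, A = attention_step(H, eta=0.5)
```

Setting `eta=0.0` leaves the states unchanged, while larger values of `eta` pull each state toward the attention-weighted average of all states, which is the all-pair mixing that the update rule describes.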
Abstract
Modeling 3D dynamics is a fundamental problem in multi-body systems across scientific and engineering domains and has important practical implications for trajectory prediction and simulation. While recent GNN-based approaches have achieved strong performance by enforcing geometric symmetries, encoding high-order features, or incorporating neural-ODE mechanics, they typically depend on explicitly observed structures and inherently fail to capture the unobserved interactions that are crucial to complex physical behaviors and dynamical mechanisms. In this paper, we propose PAINET, a principled SE(3)-equivariant neural architecture for learning all-pair interactions in multi-body systems. The model comprises: (1) a novel physics-inspired attention network derived from the minimization trajectory of an energy function, and (2) a parallel decoder that preserves equivariance while enabling efficient inference. Empirical results on diverse real-world benchmarks, including human motion capture, molecular dynamics, and large-scale protein simulations, show that PAINET consistently outperforms recently proposed models, yielding 4.7% to 41.5% error reductions in 3D dynamics prediction with comparable computational cost in terms of time and memory.
