Physics-Inspired All-Pair Interaction Learning for 3D Dynamics Modeling

Kai Yang, Yuqi Huang, Junheng Tao, Wanyu Wang, Qitian Wu

TL;DR

PAINET addresses the challenge of modeling 3D dynamics in multi-body systems by learning unobserved all-pair interactions through an energy-based latent structure formulation. It derives a physics-informed, SE(3)-equivariant attention mechanism and couples it with a parallel, SE(3)-equivariant decoder to predict trajectories efficiently; the core update follows $\mathbf h_i^{(t+1)} = (1-\eta) \mathbf h_i^{(t)} + \eta \sum_j \frac{f_{ij}(\|\mathbf h_i - \mathbf h_j\|^2)}{\sum_m f_{im}(\|\mathbf h_i - \mathbf h_m\|^2)} \mathbf h_j^{(t)}$, with $E(\mathbf H^{(t+1)}, t+1; \{\rho_{ij}\}) \leq E(\mathbf H^{(t)}, t; \{\rho_{ij}\})$ and $\|\mathbf h_i\|_2 = 1$. The framework enables parallel decoding via EGNNs to output $\widehat{\mathbf X}^{(t)}$ for all $t$, while preserving SE(3) priors. Empirically, PAINET delivers improvements of up to 41.5% in A-MSE on motion capture, MD17, and Adk protein dynamics with comparable compute, validating its effectiveness and scalability for large-scale multi-body dynamics.
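To make the update rule above concrete, here is a minimal NumPy sketch of a single step. It is an illustration under stated assumptions, not the authors' implementation: the function names (`pairwise_sq_dists`, `attention_step`), the single shared kernel $f(d^2) = \exp(-d^2/2)$ used in place of the learned pair-specific $f_{ij}$, the inclusion of $j = i$ in the sum, and the re-normalization used to keep $\|\mathbf h_i\|_2 = 1$ are all hypothetical choices made for this example.

```python
import numpy as np


def pairwise_sq_dists(H):
    """Return the (N, N) matrix of squared distances ||h_i - h_j||^2 for H of shape (N, d)."""
    diff = H[:, None, :] - H[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def attention_step(H, eta=0.5, f=None, renormalize=True):
    """One step h_i <- (1 - eta) * h_i + eta * sum_j a_ij * h_j.

    a_ij = f(||h_i - h_j||^2) / sum_m f(||h_i - h_m||^2).
    Assumption: one shared kernel f stands in for the pair-specific f_ij.
    """
    if f is None:
        # Hypothetical kernel choice, not taken from the paper.
        f = lambda d2: np.exp(-0.5 * d2)
    W = f(pairwise_sq_dists(H))            # unnormalized pairwise weights
    A = W / W.sum(axis=1, keepdims=True)   # each row sums to 1
    H_new = (1.0 - eta) * H + eta * (A @ H)
    if renormalize:
        # Assumption: project back onto the unit sphere to keep ||h_i||_2 = 1.
        H_new = H_new / np.linalg.norm(H_new, axis=1, keepdims=True)
    return H_new, A


# Usage: five particles with 4-dimensional unit-norm embeddings.
rng = np.random.default_rng(0)
H0 = rng.normal(size=(5, 4))
H0 /= np.linalg.norm(H0, axis=1, keepdims=True)
H1, A = attention_step(H0, eta=0.3)
```

With unit-norm embeddings, $\|\mathbf h_i - \mathbf h_j\|^2 = 2 - 2\,\mathbf h_i^\top \mathbf h_j$, so this particular kernel makes each row of `A` equal to a softmax over dot products, which is one way to see why the update reads as an attention layer.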

Abstract

Modeling 3D dynamics is a fundamental problem in multi-body systems across scientific and engineering domains and has important practical implications for trajectory prediction and simulation. While recent GNN-based approaches have achieved strong performance by enforcing geometric symmetries, encoding high-order features, or incorporating neural-ODE mechanics, they typically depend on explicitly observed structures and inherently fail to capture the unobserved interactions that are crucial to complex physical behaviors and dynamical mechanisms. In this paper, we propose PAINET, a principled SE(3)-equivariant neural architecture for learning all-pair interactions in multi-body systems. The model comprises: (1) a novel physics-inspired attention network derived from the minimization trajectory of an energy function, and (2) a parallel decoder that preserves equivariance while enabling efficient inference. Empirical results on diverse real-world benchmarks, including human motion capture, molecular dynamics, and large-scale protein simulations, show that PAINET consistently outperforms recently proposed models, yielding 4.7% to 41.5% error reductions in 3D dynamics prediction with comparable computation costs in terms of time and memory.

Paper Structure

This paper contains 25 sections, 1 theorem, 29 equations, 11 figures, 6 tables, 1 algorithm.

Key Result

Theorem 1

For any energy function defined by the paper's energy equation (Eqn. eqn-energy) with a given $\lambda>0$, there exists $0<\eta<1$ such that the iterative updating rule (from the initial state $\mathbf h_i^{(0)}$) yields a descent step on the energy, i.e., $E(\mathbf H^{(t+1)}, t+1; \{\rho_{ij}\}) \leq E(\mathbf H^{(t)}, t; \{\rho_{ij}\})$ for any $t\geq 1$.

Figures (11)

  • Figure 1: Illustration of the PAINET framework. The model takes the initial state (including positions, velocities, and observed features such as edge attributes of particles) as input and encodes the observed information into particle embeddings in latent space. The particle embeddings are updated through a stack of principled attention layers, where each layer corresponds to a descent step on the energy. The attention network includes adaptive pairwise mappings to capture long-range, particle-type-specific dependencies. For decoding, the model uses equivariant GNNs that incorporate the observed structural information without breaking SE(3)-equivariance and generate the predicted trajectories of particles at multiple time steps in parallel.
  • Figure 2: Representative snapshots of aspirin molecular dynamics: the top row shows the ground-truth trajectories, the middle row shows the predictions from PAINET, and the bottom row shows the predictions from GF-NODE, with the corresponding F-MSEs reported across time steps (the actual time stamps are shown in parentheses). More results are presented in the visualization appendix.
  • Figure 3: Illustration for three types of equivariance, including rotation, translation and permutation.
  • Figure 4: Ablation studies w.r.t. learnable pairwise mappings in the attention network and the parallel equivariant decoder on Motion Capture Run.
  • Figure 5: Ablation studies w.r.t. the number of decoding layers on Motion Capture.
  • ...and 6 more figures

Theorems & Definitions (1)

  • Theorem 1