DreamSAC: Learning Hamiltonian World Models via Symmetry Exploration

Jinzhou Tang, Fan Feng, Minghao Fu, Wenjun Lin, Biwei Huang, Keze Wang

TL;DR

DreamSAC combines Symmetry Exploration, an unsupervised strategy that uses a Hamiltonian-based curiosity bonus to collect physically informative data, with a Hamiltonian-based world model trained on that data. The model uses a novel self-supervised contrastive objective to identify the invariant physical state from raw, view-dependent pixel observations, and it significantly outperforms state-of-the-art baselines in 3D physics simulations on tasks requiring extrapolation.

Abstract

Learned world models excel at interpolative generalization but fail at extrapolative generalization to novel physical properties. This limitation arises because they learn statistical correlations rather than the environment's underlying generative rules, such as physical invariances and conservation laws. We argue that learning these invariances is key to robust extrapolation. To achieve this, we first introduce Symmetry Exploration, an unsupervised exploration strategy where an agent is intrinsically motivated by a Hamiltonian-based curiosity bonus to actively probe and challenge its understanding of conservation laws, thereby collecting physically informative data. Second, we design a Hamiltonian-based world model that learns from the collected data, using a novel self-supervised contrastive objective to identify the invariant physical state from raw, view-dependent pixel observations. Our framework, DreamSAC, trained on this actively curated data, significantly outperforms state-of-the-art baselines in 3D physics simulations on tasks requiring extrapolation.

Paper Structure (74 sections, 20 equations, 3 figures, 11 tables, 1 algorithm)

Figures (3)

  • Figure 1: Overview of DreamSAC. (Right) Our world model maps observations $x_t$ to object-centric slots $Z_t$ via SAVi (Kipf et al., 2021). We structure each slot $z_t^i$ into generalized coordinates ($q_t^i$) and canonical momenta ($p_t^i$). The dynamics are twofold: the stochastic state $Z_{t+1}$ is computed by integrating our $G$-invariant Hamiltonian $H_{\phi}$, while the deterministic state $h_{t+1}$ is updated by a GRU. (Left) Symmetry Exploration: To efficiently learn $H_{\phi}$, a policy $\pi_{\theta}$ is trained entirely in imagination to maximize our Symmetry-Aware Curiosity reward $r_{sym}$. This incentivizes the policy to actively try to break the learned symmetry. The policy trained in imagination is then executed in the real environment to collect challenging data, which refines the world model.
  • Figure 2: Qualitative analysis of DreamSAC's internal mechanisms. (a) t-SNE projections show that our full model (with $\mathcal{L}_{\text{vr}}$) learns viewpoint-invariant representations, unlike an ablation without it. (b) The learned Hamiltonian $H_\phi$ (red dashed line) is conserved during a zero-action rollout, confirming that the model learned a physical invariant (energy conservation). (c) Reward and loss curves comparing DreamSAC with different baselines. (d) The latent states $(\boldsymbol q, \boldsymbol p)$ demonstrate physics-awareness: representations for pre-train (yellow) and fine-tune (blue) mix for familiar In-Distribution properties, but clearly separate when learning novel Out-of-Distribution properties.
  • Figure 3: Visual illustration of the linear annealing schedule for an environment with $T_{anneal} = 10^6$.
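
The Figure 1 and Figure 2(b) captions describe advancing a latent state $(q, p)$ by integrating a Hamiltonian and checking that the Hamiltonian stays constant during a zero-action rollout. The following is a minimal sketch of that idea, not the authors' implementation. Everything in it is an assumption made for illustration: a hand-written 1-D harmonic oscillator stands in for the learned $H_{\phi}$, central finite differences stand in for automatic differentiation, and the leapfrog scheme is one common symplectic integrator (the source does not say which integrator DreamSAC uses).

```python
# Hypothetical stand-in for a learned Hamiltonian H_phi(q, p):
# a 1-D harmonic oscillator with kinetic plus potential energy.
def hamiltonian(q, p, mass=1.0, stiffness=1.0):
    return p * p / (2.0 * mass) + 0.5 * stiffness * q * q


def grad_h(h, q, p, eps=1e-5):
    """Return (dH/dq, dH/dp) by central finite differences.

    A neural Hamiltonian would normally use autodiff here instead.
    """
    dh_dq = (h(q + eps, p) - h(q - eps, p)) / (2.0 * eps)
    dh_dp = (h(q, p + eps) - h(q, p - eps)) / (2.0 * eps)
    return dh_dq, dh_dp


def leapfrog_step(h, q, p, dt):
    """One leapfrog step of Hamilton's equations:
    dq/dt = dH/dp,  dp/dt = -dH/dq.
    """
    dh_dq, _ = grad_h(h, q, p)
    p_half = p - 0.5 * dt * dh_dq           # half step in momentum
    _, dh_dp = grad_h(h, q, p_half)
    q_next = q + dt * dh_dp                 # full step in position
    dh_dq_next, _ = grad_h(h, q_next, p_half)
    p_next = p_half - 0.5 * dt * dh_dq_next  # second half step in momentum
    return q_next, p_next


def rollout(h, q0, p0, dt=0.01, steps=1000):
    """Zero-action rollout that records H at every step."""
    q, p = q0, p0
    energies = [h(q, p)]
    for _ in range(steps):
        q, p = leapfrog_step(h, q, p, dt)
        energies.append(h(q, p))
    return q, p, energies


q_end, p_end, energies = rollout(hamiltonian, q0=1.0, p0=0.0)
# Largest deviation of H from its initial value over the rollout.
energy_drift = max(abs(e - energies[0]) for e in energies)
```

In this toy setting `energy_drift` stays small relative to the initial energy because the leapfrog update is symplectic, which is the same kind of flat energy curve that Figure 2(b) reports for the learned model.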