GeoWorld: Geometric World Models

Zeyu Zhang, Danning Li, Ian Reid, Richard Hartley

TL;DR

This paper introduces GeoWorld, a geometric world model that preserves geometric structure and hierarchical relations through a Hyperbolic JEPA, which maps latent representations from Euclidean space onto hyperbolic manifolds, enabling stable multi-step planning in hyperbolic latent space.

Abstract

Energy-based predictive world models provide a powerful approach for multi-step visual planning by reasoning over latent energy landscapes rather than generating pixels. However, existing approaches face two major challenges: (i) their latent representations are typically learned in Euclidean space, neglecting the underlying geometric and hierarchical structure among states, and (ii) they struggle with long-horizon prediction, so prediction quality degrades rapidly across extended rollouts. To address these challenges, we introduce GeoWorld, a geometric world model that preserves geometric structure and hierarchical relations through a Hyperbolic JEPA, which maps latent representations from Euclidean space onto hyperbolic manifolds. We further introduce Geometric Reinforcement Learning for energy-based optimization, enabling stable multi-step planning in hyperbolic latent space. Extensive experiments on CrossTask and COIN demonstrate improvements in success rate (SR) of around 3% in 3-step planning and 2% in 4-step planning over the state-of-the-art V-JEPA 2. Project website: https://steve-zeyu-zhang.github.io/GeoWorld.
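
The abstract describes mapping Euclidean latents onto a hyperbolic manifold. As a rough illustration only, the sketch below uses the standard Poincaré ball model with curvature -c (c > 0): an exponential map at the origin lifts a Euclidean vector into the ball, and a geodesic distance is computed via Möbius addition. The function names (`expmap0`, `mobius_add`, `poincare_dist`) and the choice of the Poincaré ball are assumptions made for this example; the abstract does not state which hyperbolic model or parameterization GeoWorld actually uses.

```python
import math


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def expmap0(v, c=1.0):
    """Exponential map at the origin of the Poincare ball (curvature -c).

    Assumed convention: the ball has radius 1/sqrt(c).
    """
    n = norm(v)
    if n == 0.0:
        return list(v)
    sqrt_c = math.sqrt(c)
    scale = math.tanh(sqrt_c * n) / (sqrt_c * n)
    return [scale * x for x in v]


def mobius_add(x, y, c=1.0):
    """Mobius addition x (+)_c y in the Poincare ball."""
    xy = sum(a * b for a, b in zip(x, y))
    x2 = sum(a * a for a in x)
    y2 = sum(b * b for b in y)
    coef_x = 1.0 + 2.0 * c * xy + c * y2
    coef_y = 1.0 - c * x2
    denom = 1.0 + 2.0 * c * xy + c * c * x2 * y2
    return [(coef_x * a + coef_y * b) / denom for a, b in zip(x, y)]


def poincare_dist(x, y, c=1.0):
    """Geodesic distance between two points inside the ball."""
    sqrt_c = math.sqrt(c)
    diff = mobius_add([-a for a in x], y, c)
    # Clamp to keep atanh finite for points numerically on the boundary.
    arg = min(sqrt_c * norm(diff), 1.0 - 1e-12)
    return 2.0 / sqrt_c * math.atanh(arg)


# Hypothetical 2-D "latent" used purely for illustration.
z = expmap0([0.3, 0.4], c=1.0)
d = poincare_dist([0.0, 0.0], z, c=1.0)
```

Under this convention, the distance from the origin to `expmap0(v)` equals twice the Euclidean norm of `v`, which gives a quick sanity check on any implementation of the two maps.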

Paper Structure (73 sections, 80 equations, 4 figures, 8 tables, 1 algorithm)


Figures (4)

  • Figure 1: Gromov $\delta$-hyperbolicity on CrossTask [zhukov2019cross].
  • Figure 2: Energy landscape comparison for V-JEPA 2 [assran2025v] and GeoWorld. We visualize the energy by sweeping two orthonormal tangent-space directions $(\Delta x, \Delta y)$ around a reference latent state. GeoWorld yields a structured, curvature-aware energy landscape that better reflects geometric structure and hierarchical relations among latent states and improves energy-based planning. For more details, see the appendix section on the energy landscape.
  • Figure 3: Geometric effects and curvature dynamics: (a) Poincaré disk geodesics connecting $x$ and $y$ under different curvatures $K$. As the curvature $K$ becomes less negative (i.e., closer to $0$), the hyperbolic distance between $x$ and $y$ increases, and the geodesic paths bend less and shift closer toward the origin. (b) Geodesic patterns induced by different boundary anchor points. Varying the anchor location produces a characteristic geodesic fan in the Poincaré disk. (c) As the curvature becomes less negative, the space flattens and the distance between $x$ and $y$ decreases. (d) Learnable curvature $c$ during supervised training, showing a gradual decrease from its initialization and convergence to a stable value of 0.3.
  • Figure 4: Overview of GeoWorld. Our geometric world model integrates Hyperbolic JEPA for geometry-preserving latent dynamics and Geometric Reinforcement Learning for geodesic-consistent multi-step refinement. Together with energy-based planning using CEM, GeoWorld enables stable and geometry-aware long-horizon visual planning.
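
The figure list above reports Gromov $\delta$-hyperbolicity on CrossTask. For readers unfamiliar with the quantity, the sketch below computes it from a pairwise distance matrix using the standard four-point condition: for every quadruple of points, the three pairwise distance sums are sorted, and half the gap between the two largest is recorded; $\delta$ is the maximum over all quadruples. Tree metrics give $\delta = 0$, so smaller values indicate more tree-like (hierarchical) structure. The function names, the brute-force enumeration, and the diameter-based normalization are assumptions for illustration; the listing does not say how the paper samples points or normalizes the value.

```python
from itertools import combinations


def gromov_delta(dist):
    """Four-point Gromov delta of a finite metric space.

    `dist` is a symmetric matrix (list of lists) of pairwise distances.
    Brute force over all quadruples, so only suitable for small inputs.
    """
    delta = 0.0
    for x, y, z, w in combinations(range(len(dist)), 4):
        sums = sorted([
            dist[x][y] + dist[z][w],
            dist[x][z] + dist[y][w],
            dist[x][w] + dist[y][z],
        ])
        # Half the gap between the two largest sums.
        delta = max(delta, (sums[2] - sums[1]) / 2.0)
    return delta


def relative_delta(dist):
    """Scale-invariant variant 2 * delta / diameter (assumed normalization)."""
    diameter = max(max(row) for row in dist)
    return 2.0 * gromov_delta(dist) / diameter if diameter > 0 else 0.0


# Four points on a line: a tree metric, so delta is 0.
path = [[abs(i - j) for j in range(4)] for i in range(4)]

# Shortest-path metric of a 4-cycle: not tree-like, delta is 1.
cycle = [
    [0, 1, 2, 1],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [1, 2, 1, 0],
]

delta_path = gromov_delta(path)
delta_cycle = gromov_delta(cycle)
```

The enumeration grows with the fourth power of the number of points, so in practice the measurement is usually taken on a random subsample of embeddings rather than the full dataset.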