Physically-Grounded Goal Imagination: Physics-Informed Variational Autoencoder for Self-Supervised Reinforcement Learning

Lan Thi Ha Nguyen, Kien Ton Manh, Anh Do Duc, Nam Pham Hai

TL;DR

This work tackles the goal-setting bottleneck in self-supervised, goal-conditioned reinforcement learning by integrating physics into the goal-imagination pipeline. It introduces Physics-Informed RIG (PI-RIG), which uses an Enhanced $p^3$-VAE to disentangle the latent space into physics variables $z_I$ and appearance variables $z_E$, with a non-trainable physics layer $f_E$ and an ODE solver $F$ that enforce physical consistency. Goals are sampled from a constrained latent distribution $p(z \mid z \in Z_{\text{feasible}})$ using physics validation and reachability estimates, which improves the realism and achievability of imagined goals. Empirical results on visual robotic manipulation tasks (reaching, pushing, pick-and-place) show that PI-RIG yields faster learning, more stable convergence, and substantially lower final distances to goals than RIG, CC-RIG, and Skew-Fit, demonstrating the practical impact of physics-grounded goal imagination on exploration and skill acquisition.
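One simple way to draw from a constrained latent distribution of this kind is rejection sampling: draw $z = [z_I, z_E]$ from the prior and keep only the draws that pass a feasibility test. The sketch below illustrates that idea only. It is not taken from the paper: the function names, the standard-normal prior, the latent dimensions, and the toy bound on $z_I$ are all assumptions made for the example, and the paper's actual physics validation and reachability estimates are not reproduced here.

```python
import numpy as np


def sample_feasible_goals(n_goals, dim_physics, dim_env, is_feasible, rng,
                          max_tries=10_000):
    """Rejection-sample latent goals z = [z_I, z_E] from an assumed N(0, I) prior.

    `is_feasible` is a hypothetical callback standing in for physics
    validation and reachability checks. Returns at most `n_goals` rows.
    """
    dim = dim_physics + dim_env
    accepted = []
    for _ in range(max_tries):
        z = rng.standard_normal(dim)
        z_i, z_e = z[:dim_physics], z[dim_physics:]
        if is_feasible(z_i, z_e):
            accepted.append(z)
            if len(accepted) == n_goals:
                break
    # reshape keeps a (0, dim) shape even if nothing was accepted
    return np.array(accepted).reshape(-1, dim)


def toy_is_feasible(z_i, z_e, bound=1.5):
    """Toy stand-in: accept goals whose physics latents lie inside a box."""
    return bool(np.all(np.abs(z_i) <= bound))


rng = np.random.default_rng(0)
goals = sample_feasible_goals(n_goals=8, dim_physics=2, dim_env=4,
                              is_feasible=toy_is_feasible, rng=rng)
print(goals.shape)
```

Rejection sampling is easy to reason about but wasteful when the feasible region is small; the `max_tries` cap keeps the loop bounded in that case, so callers should check how many goals were actually returned.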

Abstract

Self-supervised goal-conditioned reinforcement learning enables robots to autonomously acquire diverse skills without human supervision. However, a central challenge is the goal-setting problem: robots must propose feasible and diverse goals that are achievable in their current environment. Existing methods such as RIG (Visual Reinforcement Learning with Imagined Goals) use a variational autoencoder (VAE) to generate goals in a learned latent space, but they can produce physically implausible goals that hinder learning efficiency. We propose Physics-Informed RIG (PI-RIG), which integrates physical constraints directly into the VAE training process through a novel Enhanced Physics-Informed Variational Autoencoder (Enhanced $p^3$-VAE). Our key innovation is the explicit separation of the latent space into physics variables governing object dynamics and environmental factors capturing visual appearance, while enforcing physical consistency through differential equation constraints and conservation laws. This enables the generation of physically consistent and achievable goals that respect fundamental physical principles such as object permanence, collision constraints, and dynamic feasibility. Through extensive experiments, we demonstrate that this physics-informed goal generation significantly improves the quality of proposed goals, leading to more effective exploration and better skill acquisition in visual robotic manipulation tasks including reaching, pushing, and pick-and-place scenarios.

Paper Structure

This paper contains 21 sections, 10 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: System overview of our Physics-Informed RIG approach with Enhanced $p^3$-VAE architecture. The pipeline consists of four main stages: (1) Random interaction data collection from the environment, (2) Enhanced $p^3$-VAE training that separates latent space into physics variables $z_I$ and environmental variables $z_E$, with an ODE solver $F$ in the decoder enforcing physical consistency, (3) RL training using physics-informed goal generation, and (4) Test-time execution where the agent uses the learned policy to reach physically consistent goals.
  • Figure 2: Final Distance to Goal during training for the Visual Reacher task. PI-RIG achieves a final distance of approximately 0.1, representing a 54.5% improvement over RIG (0.22) and a 63.0% improvement over CC-RIG (0.27). Our approach also outperforms Skew-Fit by 52.4%, demonstrating consistent superiority across different baseline methods.
  • Figure 3: Final Distance to Goal during training for the Visual Pusher task. PI-RIG achieves the best performance among learning-based methods with a final distance of approximately 0.04, showing a 63.6% improvement over RIG (0.11) and a 71.4% improvement over CC-RIG (0.14). The method also outperforms Skew-Fit by 60.0%.
  • Figure 4: Final Distance to Goal during training for the Visual Pick-and-Place task. In this complex task, PI-RIG achieves a final distance of approximately 0.07, representing a 46.1% improvement over RIG (0.13), a 74.0% improvement over CC-RIG (0.27), and a 72.0% improvement over Skew-Fit (0.25).
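
The improvement percentages in the captions above appear to be relative reductions in final distance, i.e. $(d_{\text{baseline}} - d_{\text{PI-RIG}}) / d_{\text{baseline}}$. The short sketch below recomputes them from the rounded distances quoted in the captions. The function name and the dictionary layout are assumptions for illustration, and only baselines whose distances are stated explicitly are included. Because the captions give approximate distances, the recomputed values can differ from the quoted percentages in the last digit.

```python
def relative_improvement(baseline, ours):
    """Percentage reduction in final distance relative to a baseline."""
    return 100.0 * (baseline - ours) / baseline


# (PI-RIG distance, {baseline name: (baseline distance, quoted improvement %)})
# All numbers are copied from the figure captions.
captions = {
    "Visual Reacher": (0.10, {"RIG": (0.22, 54.5), "CC-RIG": (0.27, 63.0)}),
    "Visual Pusher": (0.04, {"RIG": (0.11, 63.6), "CC-RIG": (0.14, 71.4)}),
    "Visual Pick-and-Place": (0.07, {"RIG": (0.13, 46.1),
                                     "CC-RIG": (0.27, 74.0),
                                     "Skew-Fit": (0.25, 72.0)}),
}

recomputed = {}
for task, (ours, baselines) in captions.items():
    for name, (dist, quoted) in baselines.items():
        value = relative_improvement(dist, ours)
        recomputed[(task, name)] = (value, quoted)
        print(f"{task:22s} vs {name:8s}: {value:.2f}% (caption: {quoted}%)")
```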