Physically-Grounded Goal Imagination: Physics-Informed Variational Autoencoder for Self-Supervised Reinforcement Learning
Lan Thi Ha Nguyen, Kien Ton Manh, Anh Do Duc, Nam Pham Hai
TL;DR
This work tackles the goal-setting bottleneck in self-supervised, goal-conditioned reinforcement learning by integrating physics into the goal-imagination pipeline. It introduces Physics-Informed RIG (PI-RIG), which uses an Enhanced $p^3$-VAE to disentangle the latent space into physics variables $z_I$ and appearance variables $z_E$, with a non-trainable physics layer $f_E$ and an ODE solver $F$ to enforce physical consistency. Goals are sampled from a constrained latent distribution $p(z \mid z \in Z_{\text{feasible}})$, where feasibility is determined by physics validation and reachability estimates; this improves the realism and achievability of imagined goals. Empirical results on visual robotic manipulation tasks (reaching, pushing, pick-and-place) show that PI-RIG learns faster, converges more stably, and reaches substantially lower final distances to goals than RIG, CC-RIG, and Skew-Fit, demonstrating the practical impact of physics-grounded goal imagination on exploration and skill acquisition.
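The constrained sampling step can be illustrated with simple rejection sampling. This is a minimal sketch under stated assumptions, not the authors' implementation: the prior is taken to be a standard normal, and `physics_valid`, `reachability`, the workspace bounds, the `min_reach` threshold, and the latent dimensions are all hypothetical stand-ins for the physics validation and reachability estimates named above.

```python
import math
import random


def physics_valid(z_i, lo=-1.0, hi=1.0):
    """Hypothetical physics check: every physics latent lies inside assumed bounds."""
    return all(lo <= v <= hi for v in z_i)


def reachability(z_i, z_current, scale=1.0):
    """Hypothetical reachability score in (0, 1] that decays with latent distance."""
    return math.exp(-math.dist(z_i, z_current) / scale)


def sample_feasible_goal(rng, z_current, dim_e=4, min_reach=0.3, max_tries=1000):
    """Rejection-sample z = (z_I, z_E) from an assumed N(0, I) prior, keeping only
    samples that pass both the physics check and the reachability threshold."""
    dim_i = len(z_current)
    for _ in range(max_tries):
        z = [rng.gauss(0.0, 1.0) for _ in range(dim_i + dim_e)]
        z_i, z_e = z[:dim_i], z[dim_i:]  # physics part, appearance part
        if physics_valid(z_i) and reachability(z_i, z_current) >= min_reach:
            return z_i, z_e
    raise RuntimeError("no feasible goal found within max_tries")


if __name__ == "__main__":
    rng = random.Random(0)
    z_i, z_e = sample_feasible_goal(rng, z_current=[0.0, 0.0])
    print("physics latents:", z_i)
    print("appearance latents:", z_e)
```

Only the physics variables $z_I$ are filtered in this sketch; the appearance variables $z_E$ pass through unchanged, reflecting the assumption that appearance does not affect feasibility.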
Abstract
Self-supervised goal-conditioned reinforcement learning enables robots to autonomously acquire diverse skills without human supervision. However, a central challenge is the goal-setting problem: robots must propose goals that are both diverse and achievable in their current environment. Existing methods such as RIG (Visual Reinforcement Learning with Imagined Goals) use a variational autoencoder (VAE) to generate goals in a learned latent space, but they can produce physically implausible goals that hinder learning efficiency. We propose Physics-Informed RIG (PI-RIG), which integrates physical constraints directly into the VAE training process through a novel Enhanced Physics-Informed Variational Autoencoder (Enhanced $p^3$-VAE). Our key innovation is the explicit separation of the latent space into physics variables governing object dynamics and environmental factors capturing visual appearance, while enforcing physical consistency through differential equation constraints and conservation laws. This enables the generation of physically consistent and achievable goals that respect fundamental physical principles such as object permanence, collision constraints, and dynamic feasibility. Through extensive experiments, we demonstrate that this physics-informed goal generation significantly improves the quality of proposed goals, leading to more effective exploration and better skill acquisition in visual robotic manipulation tasks, including reaching, pushing, and pick-and-place scenarios.
