Hyper-GoalNet: Goal-Conditioned Manipulation Policy Learning with HyperNetworks

Pei Zhou, Wanting Yao, Qian Luo, Xunzhe Zhou, Yanchao Yang

TL;DR

Hyper-GoalNet tackles goal-conditioned robotic manipulation by generating task-specific policy parameters from goal images via hypernetworks, effectively separating goal interpretation from state processing. It introduces latent-space shaping with a forward dynamics model and a monotonic distance constraint to provide a structured representation that guides parameter generation. The approach outperforms fixed-parameter baselines on diverse Robosuite tasks, particularly under high environmental variability, and demonstrates robustness in real-world experiments with sensor noise. This work offers a scalable, cognitively inspired method for flexible visuomotor control with practical implications for robust robotics.

Abstract

Goal-conditioned policy learning for robotic manipulation presents significant challenges in maintaining performance across diverse objectives and environments. We introduce Hyper-GoalNet, a framework that generates task-specific policy network parameters from goal specifications using hypernetworks. Unlike conventional methods that simply condition fixed networks on goal-state pairs, our approach separates goal interpretation from state processing -- the former determines network parameters while the latter applies these parameters to current observations. To enhance representation quality for effective policy generation, we implement two complementary constraints on the latent space: (1) a forward dynamics model that promotes state transition predictability, and (2) a distance-based constraint ensuring monotonic progression toward goal states. We evaluate our method on a comprehensive suite of manipulation tasks with varying environmental randomization. Results demonstrate significant performance improvements over state-of-the-art methods, particularly in high-variability conditions. Real-world robotic experiments further validate our method's robustness to sensor noise and physical uncertainties. Code is available at: https://github.com/wantingyao/hyper-goalnet.
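The separation described in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: the class name `HyperPolicy`, the single linear hypernetwork layer, the linear target policy, and all dimensions are assumptions chosen to keep the example small. It shows only the structural idea that the goal embedding produces the policy parameters, and the current state embedding is then processed by those generated parameters.

```python
import random


def linear(weights, bias, x):
    """Apply y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


class HyperPolicy:
    """Toy hypernetwork (hypothetical): goal embedding -> linear policy weights."""

    def __init__(self, goal_dim, state_dim, action_dim, seed=0):
        rng = random.Random(seed)
        self.state_dim = state_dim
        self.action_dim = action_dim
        # Number of target-policy parameters: flattened W plus bias b.
        n_params = action_dim * state_dim + action_dim
        # Hypernetwork parameters; these are what training would update.
        self.hyper_w = [[rng.gauss(0.0, 0.1) for _ in range(goal_dim)]
                        for _ in range(n_params)]
        self.hyper_b = [0.0] * n_params

    def generate(self, goal_embedding):
        """Goal interpretation: build policy parameters from the goal alone."""
        flat = linear(self.hyper_w, self.hyper_b, goal_embedding)
        split = self.action_dim * self.state_dim
        w = [flat[i * self.state_dim:(i + 1) * self.state_dim]
             for i in range(self.action_dim)]
        b = flat[split:]
        return w, b

    def act(self, goal_embedding, state_embedding):
        """State processing: apply the generated parameters to the state."""
        w, b = self.generate(goal_embedding)
        return linear(w, b, state_embedding)


# Usage: the same state yields different actions under different goals,
# because each goal generates a different set of policy parameters.
policy = HyperPolicy(goal_dim=4, state_dim=3, action_dim=2)
state = [0.5, -0.2, 0.1]
action_goal_a = policy.act([1.0, 0.0, 0.0, 0.0], state)
action_goal_b = policy.act([0.0, 1.0, 0.0, 0.0], state)
```

A conventional goal-conditioned policy would instead concatenate the goal and state embeddings and pass them through one fixed set of weights; here the goal never enters the target policy as an input.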

Paper Structure

This paper contains 38 sections, 12 equations, 10 figures, 11 tables, 1 algorithm.

Figures (10)

  • Figure 1: The proposed Goal-Conditioned Policy Generation framework (Hyper-GoalNet) and conventional goal-conditioned policies. Existing methods typically employ a fixed-parameter policy network that processes concatenated current observations and goal states, treating goals mostly as additional inputs. In contrast, our approach formulates policy learning as an adaptive generation task, where the goal image determines the parameters of the policy network itself -- transforming goals from inputs into specifications that define how current observations should be processed. This allows for more effective handling of diverse goals and complex manipulation tasks.
  • Figure 2: An overview of the proposed Hyper-GoalNet framework. (a) Adaptive Policy Generation: Unlike conventional approaches with fixed parameters, our hypernetwork dynamically generates task-specific policy parameters conditioned on goal images. This creates a parameter-adaptive target policy that processes current observations (RGB images and proprioception) through a multimodal encoder to produce actions tailored to specific goals. (b) Latent Shaping: Our approach enhances performance by explicitly structuring the latent space in two ways: a predictive network models state transitions to improve temporal dynamics, while geometric constraints ensure distances to goal states monotonically decrease during successful trajectories (detailed in Sec. \ref{subsec:gr_policy}).
  • Figure 3: Comparison of (a) unshaped R3M embeddings versus (b) our shaped latent space, showing $L_2$ distances to goal states along multiple trajectories. Our shaping creates consistent monotonic decreases in distance-to-goal, facilitating more effective parameter generation.
  • Figure 4: Key phases of diverse manipulation tasks in experimental evaluation.
  • Figure 5: Normalized distance between current and goal states computed with different latent spaces along policy rollouts.
  • ...and 5 more figures
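
The two latent-shaping terms named in Figures 2(b) and 3 can be sketched as simple losses. The exact formulations are not given in this summary, so the forms below are assumptions: a mean-squared-error term for the forward dynamics prediction, and a hinge penalty on the $L_2$ distance-to-goal for the monotonic constraint. The function names and the `margin` parameter are hypothetical.

```python
import math


def l2(a, b):
    """Euclidean distance between two latent vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dynamics_loss(pred_next, true_next):
    """Assumed form: mean squared error between predicted and actual next latent."""
    return sum((p - t) ** 2 for p, t in zip(pred_next, true_next)) / len(true_next)


def monotonic_distance_loss(latents, goal, margin=0.0):
    """Assumed form: hinge penalty whenever a step fails to move closer to the goal.

    `latents` is a time-ordered list of at least two latent vectors taken
    from one successful trajectory.
    """
    dists = [l2(z, goal) for z in latents]
    penalties = [max(0.0, d_next - d_prev + margin)
                 for d_prev, d_next in zip(dists, dists[1:])]
    return sum(penalties) / len(penalties)


# Usage: a trajectory that approaches the goal at every step incurs no penalty;
# one that moves away from the goal at some step is penalized.
goal = [0.0, 0.0]
approaching = [[3.0, 0.0], [2.0, 0.0], [1.0, 0.0]]
detouring = [[1.0, 0.0], [2.0, 0.0], [1.0, 0.0]]
loss_ok = monotonic_distance_loss(approaching, goal)
loss_bad = monotonic_distance_loss(detouring, goal)
```

Under these assumed forms, minimizing the second term pushes the encoder toward the behavior shown in Figure 3(b), where distance-to-goal decreases consistently along a trajectory.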