Hybrid Diffusion Policies with Projective Geometric Algebra for Efficient Robot Manipulation Learning

Xiatao Sun, Yuxuan Wang, Shuo Yang, Yinxing Chen, Daniel Rakita

TL;DR

This paper introduces hPGA-DP, a novel hybrid diffusion policy that leverages the Projective Geometric Algebra Transformer as a state encoder and action decoder, while employing established U-Net or Transformer-based modules for the core denoising process.
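Because P-GATr operates on 16-component PGA multivectors, the state encoder must embed 3D quantities into that space and the action decoder must read them back out. The sketch below shows this interface for points only. It assumes the basis ordering used by the public GATr code, as we understand it, and the helpers `embed_point`, `extract_point`, and `embed_state` are hypothetical. They are not the authors' implementation. A point is stored as a trivector with a homogeneous weight on `e123`, so scaling the whole multivector leaves the decoded position unchanged.

```python
from typing import List, Sequence, Tuple

# Assumed 16-component basis layout (modeled on the public GATr code; an assumption).
BLADES = ["1", "e0", "e1", "e2", "e3", "e01", "e02", "e03", "e12", "e13", "e23",
          "e012", "e013", "e023", "e123", "e0123"]
IDX = {name: i for i, name in enumerate(BLADES)}


def embed_point(p: Sequence[float]) -> List[float]:
    """Hypothetical helper: embed a Euclidean point (x, y, z) as a PGA trivector."""
    x, y, z = p
    mv = [0.0] * 16
    mv[IDX["e123"]] = 1.0  # homogeneous weight; nonzero marks a finite point
    mv[IDX["e023"]] = -x   # sign convention is an assumption
    mv[IDX["e013"]] = y
    mv[IDX["e012"]] = -z
    return mv


def extract_point(mv: Sequence[float]) -> Tuple[float, float, float]:
    """Hypothetical helper: decode a trivector back to (x, y, z), dividing by its weight."""
    w = mv[IDX["e123"]]
    if w == 0.0:
        raise ValueError("ideal point (direction at infinity) has no Euclidean position")
    return (-mv[IDX["e023"]] / w, mv[IDX["e013"]] / w, -mv[IDX["e012"]] / w)


def embed_state(points: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical helper: one multivector token per task-relevant 3D position."""
    return [embed_point(p) for p in points]


tokens = embed_state([(0.1, -0.2, 0.3), (0.5, 0.0, 0.9)])
print(len(tokens), len(tokens[0]))  # 2 16
print(extract_point(tokens[0]))     # (0.1, -0.2, 0.3)
```

Orientations, planes, and lines would need their own embeddings (e.g. bivectors and vectors); how hPGA-DP encodes each state component is not specified here.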

Abstract

Diffusion policies are a powerful paradigm for robot learning, but their training is often inefficient. A key reason is that networks must relearn fundamental spatial concepts, such as translations and rotations, from scratch for every new task. To alleviate this redundancy, we propose embedding geometric inductive biases directly into the network architecture using Projective Geometric Algebra (PGA). PGA provides a unified algebraic framework for representing geometric primitives and transformations, allowing neural networks to reason about spatial structure more effectively. In this paper, we introduce hPGA-DP, a novel hybrid diffusion policy that capitalizes on these benefits. Our architecture leverages the Projective Geometric Algebra Transformer (P-GATr) as a state encoder and action decoder, while employing established U-Net or Transformer-based modules for the core denoising process. Through extensive experiments and ablation studies in both simulated and real-world environments, we demonstrate that hPGA-DP significantly improves task performance and training efficiency. Notably, our hybrid approach achieves substantially faster convergence compared to both standard diffusion policies and architectures that rely solely on P-GATr. The project website is available at: https://apollo-lab-yale.github.io/26-ICRA-hPGA-website/.
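To make "unified framework for primitives and transformations" concrete, here is standard PGA background (not taken from the paper; the notation is ours). A rigid motion is represented by a normalized motor $M$, and every PGA object $X$, whether a point, line, or plane, transforms by the same sandwich product. Motors compose by multiplication. A layer $f$ is equivariant to rigid motions when transforming its input is the same as transforming its output.

```latex
\begin{aligned}
X' &= M X \widetilde{M}, \qquad M\widetilde{M} = 1, \\
M_2\,\bigl(M_1 X \widetilde{M}_1\bigr)\,\widetilde{M}_2 &= (M_2 M_1)\, X\, \widetilde{(M_2 M_1)}, \qquad \text{since } \widetilde{AB} = \widetilde{B}\,\widetilde{A}, \\
f\bigl(M X \widetilde{M}\bigr) &= M\, f(X)\, \widetilde{M} \quad \text{for every motor } M.
\end{aligned}
```

When the last identity holds by construction, a translated or rotated copy of a scene is handled consistently without being learned as a new input pattern. This is the kind of geometric inductive bias the abstract refers to.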

Paper Structure

This paper contains 19 sections, 6 equations, 4 figures.

Figures (4)

  • Figure 1: Overview of the hPGA-DP network architecture.
  • Figure 2: Top: simulation tasks in robosuite, with colored 3D bounding boxes indicating task-relevant objects. Bottom left: success rates of diffusion policies with different network backbones on various tasks, and the mean epoch training time (MET) of each network across all tasks. Bottom right: success rate of state-based policies with U-Net, Transformer, hPGA-U, and hPGA-T backbones over 100 training epochs on the Stack task.
  • Figure 3: Top: Success rate of hPGA-DP under different decoder loss masking thresholds $\eta$, where the solid line denotes the mean and the shaded region indicates the standard deviation. Bottom: Performance of diffusion policies with various combinations of backbone (underlined) and encoder & decoder (italicized).
  • Figure 4: Top left: the dual-arm system for real-world experiments. Top right: the top and bottom rows show the block stacking task and the drawer interaction task, respectively. Bottom: results for real-world experiments. SR: success rate, CT: cumulative training time measured in minutes.
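The figure captions report three metrics: success rate (SR), mean epoch training time (MET), and cumulative training time (CT, in minutes). The sketch below shows how such numbers could be computed from per-rollout outcomes and per-epoch wall-clock times. The function names, the assumption that epoch times are logged in seconds, and pooling across tasks by concatenating the epoch-time lists are all ours, not the paper's.

```python
from itertools import accumulate
from statistics import mean
from typing import List, Sequence


def success_rate(outcomes: Sequence[bool]) -> float:
    """SR: fraction of evaluation rollouts that succeeded."""
    if not outcomes:
        raise ValueError("need at least one rollout")
    return sum(outcomes) / len(outcomes)


def mean_epoch_time(epoch_seconds: Sequence[float]) -> float:
    """MET: average wall-clock seconds per training epoch (assumed unit).

    To pool over several tasks, concatenate their per-epoch lists first (an assumption).
    """
    return mean(epoch_seconds)


def cumulative_training_minutes(epoch_seconds: Sequence[float]) -> float:
    """CT: total wall-clock training time, converted from seconds to minutes."""
    return sum(epoch_seconds) / 60.0


def ct_curve(epoch_seconds: Sequence[float]) -> List[float]:
    """CT after each epoch, e.g. for plotting SR against training time."""
    return [s / 60.0 for s in accumulate(epoch_seconds)]


print(success_rate([True, True, False, True]))         # 0.75
print(mean_epoch_time([30.0, 30.0, 60.0]))              # 40.0
print(cumulative_training_minutes([30.0, 30.0, 60.0]))  # 2.0
```

Plotting SR against `ct_curve(...)` rather than against the epoch index accounts for backbones whose epochs take different amounts of wall-clock time.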