
PPGuide: Steering Diffusion Policies with Performance Predictive Guidance

Zixing Wang, Devesh K. Jha, Ahmed H. Qureshi, Diego Romeres

TL;DR

Performance Predictive Guidance (PPGuide) is a lightweight, classifier-based framework that steers a pre-trained diffusion policy away from failure modes at inference time. It uses attention-based multiple instance learning to automatically estimate which observation-action chunks from the policy's rollouts are relevant to success or failure.

Abstract

Diffusion policies have been shown to be very efficient at learning complex, multi-modal behaviors for robotic manipulation. However, errors in generated action sequences can compound over time, potentially leading to failure. Some approaches mitigate this by augmenting datasets with expert demonstrations or by learning predictive world models, which can be computationally expensive. We introduce Performance Predictive Guidance (PPGuide), a lightweight, classifier-based framework that steers a pre-trained diffusion policy away from failure modes at inference time. PPGuide relies on a novel self-supervised process: it uses attention-based multiple instance learning to automatically estimate which observation-action chunks from the policy's rollouts are relevant to success or failure. We then train a performance predictor on this self-labeled data. During inference, this predictor provides a real-time gradient that guides the policy toward more robust actions. We validated PPGuide across a diverse set of tasks from the Robomimic and MimicGen benchmarks, demonstrating consistent improvements in performance.
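The gradient-based steering described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: a toy linear-logistic predictor stands in for the trained performance predictor, and the names `log_success_grad`, `guided_step`, `w`, `b`, and `strength` are hypothetical. The exact guidance rule used by PPGuide is not given in this excerpt; the sketch only shows the generic idea of nudging a sampled action along the gradient of a predictor's log-probability of success.

```python
import math


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def success_prob(action, w, b):
    """Toy predictor (assumption): p(success | action) = sigmoid(w . action + b)."""
    logit = sum(wi * ai for wi, ai in zip(w, action)) + b
    return sigmoid(logit)


def log_success_grad(action, w, b):
    """Analytic gradient of log p(success | action) with respect to the action.

    For a logistic model, d/da log sigmoid(w . a + b) = (1 - p) * w.
    """
    p = success_prob(action, w, b)
    return [(1.0 - p) * wi for wi in w]


def guided_step(action, w, b, strength):
    """Shift the action along the predictor gradient, scaled by a guidance strength."""
    grad = log_success_grad(action, w, b)
    return [ai + strength * gi for ai, gi in zip(action, grad)]


# Hypothetical 2-D action and predictor weights.
action = [0.0, 0.0]
w, b = [1.0, -2.0], 0.0
guided = guided_step(action, w, b, strength=0.3)
```

In this toy setting the guided action has a higher predicted success probability than the original one. In a diffusion policy, a step of this kind would be applied to the noisy action sample during denoising, with the gradient obtained by automatic differentiation through the trained predictor rather than from a closed-form expression.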

Paper Structure (16 sections, 4 equations, 5 figures, 3 tables)


Figures (5)

  • Figure 1: Overview of our PPGuide framework. First, a diverse dataset of trajectories is collected using policy checkpoints from different training stages. Then, (1) an attention-based MIL model analyzes these trajectories to automatically label observation-action chunks as Success-Relevant (SR), Failure-Relevant (FR), or Irrelevant (IR). (2) A lightweight classifier is trained on this labeled data to predict relevance from observation-action pairs. Finally, during inference, gradients from the classifier steer the diffusion sampling process away from failure modes while promoting success-relevant behaviors.
  • Figure 2: Evaluation tasks from the Robomimic and MimicGen benchmarks. (The Stack D1 task uses two cubes, while Stack Three D1 uses three.)
  • Figure 3: This example shows how PPGuide steers the base policy to avoid misalignment during square insertion.
  • Figure 4: Averaged performance of PPGuide, with base policies from epochs 500 and 550, on the Square task across different guidance strength values. PPGuide achieves performance gains over the base DP across a range of guidance strengths.
  • Figure 5: Effect of z-score selection. We build four datasets to train the classifier using different z-scores for the Coffee D2 task. Results are averaged over experiments with guidance strengths of 0.25, 0.3, and 0.35.
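The labeling step mentioned in the captions for Figures 1 and 5 can be sketched roughly as follows. This is an illustrative guess, not the authors' procedure: it assumes the MIL model yields one raw attention score per observation-action chunk, normalizes the scores with a softmax, and marks a chunk as relevant when its attention weight exceeds a z-score threshold. The function name `label_chunks`, the parameter `z_threshold`, and the rule that a high-attention chunk inherits its trajectory's outcome (SR for success, FR for failure) are all assumptions.

```python
import math
from statistics import mean, pstdev


def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_chunks(attn_scores, trajectory_succeeded, z_threshold=1.0):
    """Label each chunk of one trajectory as SR, FR, or IR.

    Assumption: chunks whose attention weight has a z-score above the
    threshold take the trajectory's outcome; all other chunks are IR.
    """
    weights = softmax(attn_scores)
    mu, sigma = mean(weights), pstdev(weights)
    labels = []
    for weight in weights:
        # With zero spread, no chunk stands out, so every z-score is 0.
        z = (weight - mu) / sigma if sigma > 0 else 0.0
        if z > z_threshold:
            labels.append("SR" if trajectory_succeeded else "FR")
        else:
            labels.append("IR")
    return labels


# Hypothetical attention scores for a four-chunk failed rollout.
failed_labels = label_chunks([0.0, 0.0, 0.0, 3.0], trajectory_succeeded=False)
```

Raising `z_threshold` keeps fewer, more confidently attended chunks, which is one plausible reading of the trade-off examined in Figure 5.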