DAP: Diffusion-based Affordance Prediction for Multi-modality Storage

Haonan Chang, Kowndinya Boyalakuntla, Yuhan Liu, Xinyu Zhang, Liam Schramm, Abdeslam Boularias

TL;DR

This work presents a novel Diffusion-based Affordance Prediction (DAP) pipeline for the multi-modal object storage problem, offering a solution that is both computationally efficient and capable of handling real-world variability.

Abstract

The storage problem, in which objects must be placed into containers with precise orientations and positions, presents a distinct challenge that extends beyond traditional rearrangement tasks. The challenge stems primarily from the need for fine-grained 6D manipulation and from the inherent multi-modality of the solution space, where multiple viable goal configurations exist for the same storage container. We present a novel Diffusion-based Affordance Prediction (DAP) pipeline for the multi-modal object storage problem. DAP takes a two-step approach: it first identifies a placeable region on the container and then precisely computes the relative pose between the object and that region. Existing methods either struggle with multi-modality or require computation-intensive training. Our experiments on the RPDiff benchmark demonstrate DAP's superior performance and training efficiency over RPDiff, the current state of the art. Additionally, our experiments showcase DAP's data efficiency in real-world applications, an advancement over existing simulation-driven approaches. Our contribution fills a gap in robotic manipulation research by offering a solution that is both computationally efficient and capable of handling real-world variability. Code and supplementary material can be found at: https://github.com/changhaonan/DPS.git.
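As an illustration of the second step only, the sketch below recovers a rigid relative pose from point correspondences using the standard Kabsch (SVD) method. The abstract does not say which solver DAP uses, so the choice of Kabsch, the function name `rigid_transform_from_correspondences`, and the synthetic points are all assumptions made for this example.

```python
import numpy as np

def rigid_transform_from_correspondences(src, dst):
    """Least-squares rotation R and translation t with dst ~= src @ R.T + t.

    Hypothetical helper using the Kabsch / SVD method; not taken from the paper.
    """
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # Cross-covariance between the centred point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    D = np.diag([1.0] * (src.shape[1] - 1) + [d])
    R = Vt.T @ D @ U.T
    t = dst_c - R @ src_c
    return R, t

# Synthetic check: object points and their matched locations in the region.
rng = np.random.default_rng(0)
obj = rng.standard_normal((10, 3))
theta = np.deg2rad(30.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.2, -0.1, 0.5])
region = obj @ R_true.T + t_true

R, t = rigid_transform_from_correspondences(obj, region)
```

With exact correspondences, `R` and `t` match the rotation and translation used to generate the data. Noisy or partially wrong correspondences would need a robust variant, which this sketch does not attempt.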

Paper Structure (16 sections, 10 equations, 8 figures, 2 tables)

Figures (8)

  • Figure 1: Visualization of the backward diffusion process in affordance prediction. Each row shows a different sample, and each column corresponds to one diffusion step. The diffusion step $t$ starts at 99 and ends at 0. Each panel visualizes a sample at that time step. Yellow indicates that a region is placeable, and purple indicates that it is not. At the beginning, the scene starts with a random segmentation. As the backward diffusion process progresses, the affordance prediction gradually converges to the 4 placeable regions.
  • Figure 2: The Diffusion Affordance Prediction Architecture.
  • Figure 3: Illustration of the correspondence and pose computation on a 2D toy example. The green points are the target object, and the blue points are the container.
  • Figure 4: The correspondence prediction architecture, inspired by IMOP [zhang2024oneshot].
  • Figure 5: Samples from the RPDiff benchmark. We show two sample scenes: one places a book into a bookshelf and the other stacks a can inside a cabinet.
  • ...and 3 more figures
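
The backward process described in the Figure 1 caption can be sketched as a standard DDPM-style sampling loop over per-point affordance scores. This is a minimal toy under stated assumptions, not the paper's implementation: the linear noise schedule, the 1D "container" of 64 points with four placeable segments, and the `oracle_eps` function (which stands in for a trained denoising network by using the ground truth) are all hypothetical.

```python
import numpy as np

T = 100  # diffusion steps; t runs from 99 down to 0 as in Figure 1
betas = np.linspace(1e-4, 0.02, T)  # assumed linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def make_target(n_points=64):
    """Hypothetical ground truth: +1 on four placeable segments, -1 elsewhere."""
    x0 = -np.ones(n_points)
    for start in (4, 20, 36, 52):
        x0[start:start + 8] = 1.0
    return x0

def oracle_eps(x_t, t, x0):
    """Stand-in for a learned noise predictor; it cheats by using x0."""
    return (x_t - np.sqrt(alpha_bars[t]) * x0) / np.sqrt(1.0 - alpha_bars[t])

def reverse_diffusion(x0, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(x0.shape)  # t = 99: random scores, random segmentation
    for t in range(T - 1, -1, -1):
        eps_hat = oracle_eps(x, t, x0)
        # DDPM posterior mean given the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        # No noise is added at the final step.
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x > 0.0  # True = placeable (yellow), False = not placeable (purple)

target = make_target()
mask = reverse_diffusion(target)
```

Because the oracle predicts the noise exactly, the final mask recovers the four segments. A trained network would only approximate this, and different random seeds could then settle on different placeable regions.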