FlowBotHD: History-Aware Diffuser Handling Ambiguities in Articulated Objects Manipulation

Yishu Li, Wen Hui Leng, Yiming Fang, Ben Eisner, David Held

TL;DR

Visual ambiguities and occlusions hinder manipulation of articulated objects. FlowBotHD introduces a history-aware diffusion model for multi-modal 3D Articulation Flow (3DAF) predictions, augmented by a history encoder and a pragmatic manipulation policy to disambiguate across time. The approach models multiple opening directions, leverages past observations, and switches grasp points under uncertainty. Evaluations on PartNet-Mobility and real-world setups show state-of-the-art performance, with pronounced gains in severely occluded or ambiguous scenarios and demonstrated practical robotic feasibility.

Abstract

We introduce a novel approach for manipulating articulated objects which are visually ambiguous, such as doors which are symmetric or which are heavily occluded. These ambiguities can cause uncertainty over different possible articulation modes: for instance, when the articulation direction (e.g. push, pull, slide) or location (e.g. left side, right side) of a fully closed door is uncertain, or when distinguishing features like the plane of the door are occluded due to the viewing angle. To tackle these challenges, we propose a history-aware diffusion network that can model multi-modal distributions over articulation modes for articulated objects; our method further uses observation history to distinguish between modes and make stable predictions under occlusions. Experiments and analysis demonstrate that our method achieves state-of-the-art performance on articulated object manipulation and dramatically improves performance for articulated objects containing visual ambiguities. Our project website is available at https://flowbothd.github.io/.

Paper Structure

This paper contains 30 sections, 1 equation, 15 figures, 4 tables, 1 algorithm.

Figures (15)

  • Figure 1: Our method, FlowBotHD, consists of a history-aware diffuser that handles multi-modality and occlusions in articulated object manipulation. Simulation and real-world experiments show our model's capability of opening general objects (including handling ambiguities and occlusions) as well as an ambiguous door.
  • Figure 2: FlowBotHD structure: a history observation and the history flow are input into a history encoder to obtain a global history latent. This history latent is injected into the current observation encoder through fully-connected projection layers and element-wise multiplication. The current observation encoder outputs a history-aware point-wise embedding. The history-aware embedding is then input to a DiT-based denoiser to predict the 3D articulation flow (FlowBot3D; Eisner et al., 2022), i.e. the predicted future motion of the object if it were to be opened a small amount.
  • Figure 3: We demonstrate our model's performance improvement over the baseline in a severely occluded case. We open the object to different angles, make predictions, and plot the RMSE metric ($\downarrow$) against the open ratio. The baseline (top) fails at the point of severe occlusion, whereas our method (bottom) continues to make stable predictions using history. We also include flow visualizations (right) to show the quality of the predictions in both cases.
  • Figure 4: Demonstration of FlowBotHD opening a custom-made door in different configurations. The first two rows show examples in which the model's initial predictions align with the current door configuration. The last row shows an example in which the model's initial prediction fails and the model switches its grasp point, ultimately leading to a success.
  • Figure 5: Simulation visualizations. The middle plot shows a simulation trajectory, with the step number on the x-axis and the open ratio on the left y-axis. Red polygons mark the history update signal: a step marked with a red polygon stores that step's prediction as the latest history. Yellow triangles mark the switch grasp point (SGP) signal, meaning that the step requires a new grasp point. The bar plot in the background, read against the right y-axis, shows the number of trials taken to generate a prediction that satisfies the consistency check.
  • ...and 10 more figures
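
The history injection described in the Figure 2 caption (a global history latent passed through a fully-connected projection, then multiplied element-wise into the current observation features) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `inject_history`, the single projection layer, the feature sizes, and the random weights are all assumptions made for illustration only.

```python
import numpy as np


def inject_history(point_feats, history_latent, w_proj, b_proj):
    """Modulate per-point features with a global history latent.

    point_feats:    (N, C) per-point features of the current observation
    history_latent: (H,)   global latent produced by a history encoder
    w_proj, b_proj: (H, C) and (C,) weights of a hypothetical projection layer
    """
    # Fully-connected projection of the history latent to the feature width.
    scale = history_latent @ w_proj + b_proj  # shape (C,)
    # Element-wise multiplication, broadcast over all N points.
    return point_feats * scale


# Assumed sizes: N points, C feature channels, H history-latent dimensions.
rng = np.random.default_rng(0)
N, C, H = 1200, 128, 64
point_feats = rng.standard_normal((N, C))
history_latent = rng.standard_normal(H)
w_proj = rng.standard_normal((H, C)) * 0.1
b_proj = np.ones(C)

history_aware = inject_history(point_feats, history_latent, w_proj, b_proj)
```

In this sketch the bias is initialized to ones, so an all-zero history latent leaves the point features unchanged. That is a design choice of the illustration, not something stated in the paper.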