FlowBotHD: History-Aware Diffuser Handling Ambiguities in Articulated Objects Manipulation
Yishu Li, Wen Hui Leng, Yiming Fang, Ben Eisner, David Held
TL;DR
Visual ambiguities and occlusions hinder manipulation of articulated objects. FlowBotHD introduces a history-aware diffusion model that predicts multi-modal 3D Articulation Flow (3DAF), augmented by a history encoder and a pragmatic manipulation policy that disambiguates over time. The approach models multiple possible opening directions, leverages past observations, and switches grasp points under uncertainty. Evaluations on PartNet-Mobility and in real-world setups show state-of-the-art performance, with pronounced gains in severely occluded or ambiguous scenarios, and demonstrate that the method is feasible on a physical robot.
Abstract
We introduce a novel approach for manipulating articulated objects that are visually ambiguous, such as doors that are symmetric or heavily occluded. These ambiguities can cause uncertainty over different possible articulation modes: for instance, when the articulation direction (e.g. push, pull, slide) or location (e.g. left side, right side) of a fully closed door is uncertain, or when distinguishing features like the plane of the door are occluded due to the viewing angle. To tackle these challenges, we propose a history-aware diffusion network that can model multi-modal distributions over articulation modes for articulated objects; our method further uses observation history to distinguish between modes and make stable predictions under occlusions. Experiments and analysis demonstrate that our method achieves state-of-the-art performance on articulated object manipulation and dramatically improves performance for articulated objects containing visual ambiguities. Our project website is available at https://flowbothd.github.io/.
