FlowBot++: Learning Generalized Articulated Objects Manipulation via Articulation Projection
Harry Zhang, Ben Eisner, David Held
TL;DR
FlowBot++ addresses generalization in articulated-object manipulation by jointly predicting two dense per-point quantities, Articulation Flow $f_p$ and Articulation Projection $r_p$, which together enable smooth multi-step trajectories without per-step re-estimation. Its 3D perception module, FlowProjNet (based on PointNet++), produces these predictions, from which the method estimates the articulation axis $\boldsymbol{\omega}$ and origin $v$. It then interpolates revolute trajectories via Rodrigues' formula and prismatic trajectories along the predicted direction, with a Gram-Schmidt correction to align the two predictions. The authors demonstrate strong zero-shot generalization on PartNet-Mobility in simulation and show sim-to-real transfer on real objects using a Sawyer robot, outperforming FlowBot3D and other baselines. Limitations include failures when both predictions are incorrect and reliance on segmentation masks, suggesting avenues for reducing annotation requirements and improving joint-parameter estimation.
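To make the trajectory interpolation concrete, the following is a minimal NumPy sketch of how a single point's predicted flow and projection could be turned into waypoints. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the projection is assumed to point from the surface point to its closest point on the joint axis, the axis direction is assumed to be recoverable as the cross product of flow and projection, and the Gram-Schmidt step is assumed to remove the flow component along the projection.

```python
import numpy as np

def rodrigues_rotate(points, axis, origin, theta):
    """Rotate (N, 3) points by theta radians about the line through
    `origin` with direction `axis`, using Rodrigues' rotation formula."""
    k = axis / np.linalg.norm(axis)
    p = points - origin
    rotated = (p * np.cos(theta)
               + np.cross(k, p) * np.sin(theta)
               + np.outer(p @ k, k) * (1.0 - np.cos(theta)))
    return rotated + origin

def gram_schmidt_flow(flow, proj):
    """One Gram-Schmidt step: drop the component of `flow` along `proj`
    so the two vectors are orthogonal (assumed form of the correction)."""
    return flow - (flow @ proj) / (proj @ proj) * proj

def revolute_waypoints(point, flow, proj, total_angle, steps):
    """Waypoints for a revolute joint from one point's flow and projection.
    Assumes `proj` points from `point` to the joint axis."""
    f = gram_schmidt_flow(flow, proj)
    axis = np.cross(f, proj)      # assumed axis direction (right-hand rule)
    origin = point + proj         # a point on the joint axis
    angles = np.linspace(0.0, total_angle, steps + 1)[1:]
    return np.stack([rodrigues_rotate(point[None, :], axis, origin, a)[0]
                     for a in angles])

def prismatic_waypoints(point, flow, total_dist, steps):
    """Waypoints for a prismatic joint: translate along the flow direction."""
    d = flow / np.linalg.norm(flow)
    dists = np.linspace(0.0, total_dist, steps + 1)[1:]
    return point + dists[:, None] * d
```

For example, a point at (1, 0, 0) on a door hinged along the z-axis would have projection (-1, 0, 0); a slightly noisy flow such as (0.1, 1, 0) is first orthogonalized to (0, 1, 0), after which a quarter-turn trajectory ends near (0, 1, 0). Computing all waypoints from a single estimate of the axis and origin is what lets the whole trajectory be planned in one pass rather than re-estimated at every step.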
Abstract
Understanding and manipulating articulated objects, such as doors and drawers, is crucial for robots operating in human environments. We wish to develop a system that can learn to articulate novel objects with no prior interaction, after training on other articulated objects. Previous approaches for articulated object manipulation rely on either modular methods, which are brittle, or end-to-end methods, which lack generalizability. This paper presents FlowBot++, a deep 3D vision-based robotic system that predicts dense per-point motion and dense articulation parameters of articulated objects to assist in downstream manipulation tasks. FlowBot++ introduces a novel per-point representation of the articulated motion and articulation parameters, which are combined to produce a more accurate estimate than either provides on its own. Simulated experiments on the PartNet-Mobility dataset validate the performance of our system in articulating a wide range of objects, while real-world experiments on point clouds of real objects and with a Sawyer robot demonstrate the generalizability and feasibility of our system in real-world scenarios.
