FlowBot++: Learning Generalized Articulated Objects Manipulation via Articulation Projection

Harry Zhang; Ben Eisner; David Held

TL;DR

FlowBot++ addresses generalization in articulated-object manipulation by jointly predicting dense per-point Articulation Flow $f_p$ and Articulation Projection $r_p$, enabling multi-step, smooth trajectories without per-step re-estimation. Grounded in a 3D perception module FlowProjNet (based on PointNet++), the method estimates the articulation axis $\boldsymbol{\omega}$ and origin $v$ to interpolate revolute and prismatic trajectories via Rodrigues' formula, with a Gram-Schmidt correction to align predictions. The authors demonstrate strong zero-shot generalization on PartNet-Mobility in simulation and show sim-to-real transfer on real objects using a Sawyer robot, outperforming FlowBot3D and other baselines. Limitations include failures when both predictions are incorrect and reliance on segmentation masks, suggesting avenues for reducing annotation requirements and improving joint-parameter estimation.
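The trajectory interpolation mentioned above can be illustrated with a minimal NumPy sketch. It assumes the axis direction $\boldsymbol{\omega}$ and origin $v$ have already been estimated, and it uses Rodrigues' rotation formula for the revolute case and a straight-line sweep for the prismatic case. The function names, the uniform step spacing, and the arguments `phi_g`, `l_g`, and `n_steps` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def rodrigues_rotate(p, origin, axis, angle):
    """Rotate point p by `angle` radians about the line through `origin` along `axis`."""
    k = axis / np.linalg.norm(axis)          # unit axis direction
    x = p - origin                           # express p relative to a point on the axis
    rotated = (x * np.cos(angle)
               + np.cross(k, x) * np.sin(angle)
               + k * np.dot(k, x) * (1.0 - np.cos(angle)))
    return origin + rotated


def revolute_trajectory(p, origin, axis, phi_g, n_steps):
    """Waypoints for p rotating about the axis up to a total angle phi_g (assumed uniform steps)."""
    angles = np.linspace(0.0, phi_g, n_steps + 1)[1:]   # skip the starting pose
    return np.stack([rodrigues_rotate(p, origin, axis, a) for a in angles])


def prismatic_trajectory(p, direction, l_g, n_steps):
    """Waypoints for p translating along `direction` up to a total length l_g."""
    d = direction / np.linalg.norm(direction)
    steps = np.linspace(0.0, l_g, n_steps + 1)[1:]
    return p + steps[:, None] * d
```

For example, `revolute_trajectory(contact_point, v_hat, omega_hat, np.pi / 2, 20)` would return 20 waypoints on a quarter-circle arc around the estimated axis, where `contact_point`, `v_hat`, and `omega_hat` are hypothetical variable names for the grasp point and the estimated joint parameters.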

Abstract

Understanding and manipulating articulated objects, such as doors and drawers, is crucial for robots operating in human environments. We wish to develop a system that can learn to articulate novel objects with no prior interaction, after training on other articulated objects. Previous approaches to articulated object manipulation rely either on modular methods, which are brittle, or on end-to-end methods, which lack generalizability. This paper presents FlowBot++, a deep 3D vision-based robotic system that predicts dense per-point motion and dense articulation parameters of articulated objects to assist in downstream manipulation tasks. FlowBot++ introduces a novel per-point representation of the articulated motion and articulation parameters, which are combined to produce a more accurate estimate than either on its own. Simulated experiments on the PartNet-Mobility dataset validate the performance of our system in articulating a wide range of objects, while real-world experiments on point clouds of real objects and on a Sawyer robot demonstrate the generalizability and feasibility of our system in real-world scenarios.

Paper Structure (33 sections, 12 equations, 10 figures, 4 tables, 1 algorithm)

Introduction
Related Work
Background
FlowBot++: From New Representations to Smooth Trajectories
A New Representation of Articulated Objects
Manipulation via Learned Articulation Flow and Articulation Projection
Jointly Learning Articulation Flow and Articulation Projection
Experiments
Simulation Results
Real-World Experiments
Conclusions and Limitations
Appendix
Full FlowBot++ Manipulation Policy
Ablations
Controller $H$ Values (Replanning Frequency)
...and 18 more sections

Figures (10)

Figure 1: FlowBot++ in Action. The system first captures a point cloud observation of an articulated object and estimates the object's Articulation Flow and Articulation Projection to infer the articulation axis. The inferred axis is then used to output a smooth trajectory that actuates the object.
Figure 2: For each point $p$ on the object, Articulation Flow $f_p$ (Eisner et al., 2022) represents its instantaneous motion under a force in the opening direction; our new representation, Articulation Projection $r_p$, represents the displacement that projects $p$ onto the articulation axis $\vec{v}$. We train a network to predict both $f_p$ and $r_p$ and combine their predictions to get a smoother and more robust estimate. The purple points represent an interpolated prismatic trajectory of length $l_g$ and an interpolated revolute trajectory of rotation angle $\phi_g$. This corresponds to the trajectory prediction described in the section "Manipulation via Learned Articulation Flow and Articulation Projection".
Figure 3: FlowBot++ System Overview. In deployment, our system first takes as input a partial point cloud observation of an articulated object (a microwave is shown here) and uses FlowProjNet to jointly estimate the object's Articulation Flow (top) and Articulation Projection (bottom, points displaced by AP shown). The estimates are then used in the downstream manipulation pipeline, which interpolates and smoothly follows the planned trajectory. Unlike FlowBot3D (Eisner et al., 2022), we do not repeat the estimation at every step. Instead, we repeat this loop at a much lower frequency (once every $H$ steps) to improve the smoothness of the planned trajectory.
Figure 4: Gram-Schmidt Correction & Performance of Different Methods. In (a), without Gram-Schmidt, the inferred articulation axis $\hat{\boldsymbol{\omega}}$ (green) is not accurate; blue points show the displacement of points by the Articulation Projection $r_p$. Using Gram-Schmidt, we use the Articulation Flow $f_p$ (red vectors) to correct the axis direction; $\hat{\boldsymbol{\omega}}$ is now perpendicular to the Articulation Flow and aligns better with the ground-truth axis direction. In (b), we show the bar plot of our method compared with several baseline methods on training and testing prismatic/revolute objects using normalized distance ($\downarrow$). Note the performance gain after correction via Gram-Schmidt (AP Only vs Combined). Some values are not visible because they are $<0.05$.
Figure 5: Comparison of Angular Acceleration and Opening Trajectory between FlowBot3D and FlowBot++. In (a), we show scatter plots of the contact point on a revolute object over two 20-step trajectories, one using FlowBot3D (left) and one using FlowBot++ (right), colored by the signed intensity of the angular acceleration. In (b), we plot the average angular acceleration and opening fraction over 20 steps, across 300 trials involving 15 revolute objects. Both plots show that FlowBot++ produces more consistent motions and opens the objects further in the same number of steps.
...and 5 more figures
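
The Gram-Schmidt correction described in the Figure 4 caption can be sketched as a single orthogonalization step: the component of the estimated axis $\hat{\boldsymbol{\omega}}$ that lies along the Articulation Flow is removed, so the corrected axis is perpendicular to the flow. The sketch below is a minimal illustration under assumptions, not the authors' code. In particular, it takes a single flow vector (for instance, the flow at the contact point), and the function name and tolerance are hypothetical.

```python
import numpy as np


def gram_schmidt_correct(omega_hat, flow, eps=1e-8):
    """Return a unit axis obtained by removing the component of omega_hat along `flow`."""
    f = flow / np.linalg.norm(flow)                      # unit flow direction
    corrected = omega_hat - np.dot(omega_hat, f) * f     # one Gram-Schmidt step
    norm = np.linalg.norm(corrected)
    if norm < eps:
        # Assumption: treat an axis parallel to the flow as an unrecoverable estimate.
        raise ValueError("axis estimate is parallel to the flow; cannot correct")
    return corrected / norm
```

A slightly tilted axis estimate such as `[0.1, 0.0, 1.0]` with flow `[1.0, 0.0, 0.0]` would be corrected to the unit vector along `z`, which is orthogonal to the flow.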
