RISeg: Robot Interactive Object Segmentation via Body Frame-Invariant Features

Howard H. Qian, Yangxiao Lu, Kejia Ren, Gaotian Wang, Ninad Khargonkar, Yu Xiang, Kaiyu Hang

TL;DR

Robot interactions induce motion in body frames randomly attached to rigid bodies; the relative linear and rotational velocities of these frames can be used to identify objects and accumulate corrected object-level segmentation masks.

Abstract

To successfully perform manipulation tasks such as grasping in new environments, robots must be proficient in segmenting unseen objects from the background and/or other objects. Previous works perform unseen object instance segmentation (UOIS) by training deep neural networks on large-scale data to learn RGB/RGB-D feature embeddings, but cluttered environments often result in inaccurate segmentations. We build upon these methods and introduce a novel approach that corrects inaccurate segmentation, such as under-segmentation, in static image-based UOIS masks by using robot interaction and a designed body frame-invariant feature (BFIF). We demonstrate that the relative linear and rotational velocities of frames randomly attached to rigid bodies, induced by robot interactions, can be used to identify objects and accumulate corrected object-level segmentation masks. By introducing motion to regions of segmentation uncertainty, we drastically improve segmentation accuracy in an uncertainty-driven manner with minimal, non-disruptive interactions (ca. 2-3 per scene). We demonstrate the effectiveness of our proposed interactive perception pipeline in accurately segmenting cluttered scenes by achieving an average object segmentation accuracy rate of 80.7%, an increase of 28.2% when compared with other state-of-the-art UOIS methods.
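The central claim, that frames attached to the same rigid body share a feature while their individual velocities differ, can be illustrated with the standard adjoint map from rigid-body kinematics. The sketch below is not the authors' implementation: the function names (`rot_z`, `observed_body_twist`, `to_space_twist`), the twist ordering (angular part first), and the numeric values are all assumptions chosen for illustration. It computes the twist each frame would observe in its own coordinates, then maps it to a common space frame.

```python
import numpy as np


def skew(p):
    """3x3 skew-symmetric matrix such that skew(p) @ w == np.cross(p, w)."""
    x, y, z = p
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])


def rot_z(theta):
    """Rotation about the z-axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def observed_body_twist(R, p, omega_s, v_s):
    """Twist (omega, v) seen by a frame with pose (R, p) in the space frame.

    The frame rides on a rigid body whose space twist is (omega_s, v_s).
    Its origin moves with velocity v_s + omega_s x p; both parts are then
    expressed in the frame's own coordinates (hypothetical helper).
    """
    origin_velocity = v_s + np.cross(omega_s, p)
    return np.concatenate([R.T @ omega_s, R.T @ origin_velocity])


def to_space_twist(R, p, body_twist):
    """Map a body twist to the space frame with the 6x6 adjoint of (R, p)."""
    Ad = np.zeros((6, 6))
    Ad[:3, :3] = R
    Ad[3:, :3] = skew(p) @ R
    Ad[3:, 3:] = R
    return Ad @ body_twist


# Assumed motion of one rigid body, expressed in the space frame {s}.
omega_s = np.array([0.0, 0.0, 1.0])
v_s = np.array([0.1, 0.0, 0.0])

# Two arbitrarily placed frames on that body (assumed poses).
frames = [
    (rot_z(0.0), np.array([1.0, 0.0, 0.0])),
    (rot_z(np.pi / 2), np.array([0.0, 2.0, 0.0])),
]

for R, p in frames:
    body = observed_body_twist(R, p, omega_s, v_s)
    space = to_space_twist(R, p, body)
    print("body twist:", np.round(body, 3), "space twist:", np.round(space, 3))
```

Under these assumptions the two body twists differ in their linear parts, while both map to the same space twist; frames on a second body with a different motion would map to a different space twist, which is what makes the feature usable for grouping frames by object.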

Paper Structure (17 sections, 6 equations, 8 figures, 1 table, 3 algorithms)

Figures (8)

  • Figure 1: Interactively segmenting a cluttered scene with minimal, non-disruptive pushes. [Top left] Initial scene and identified robot actions. [Top right] The origins of sampled body frames with matched BFIFs due to scene interactions, where matched body frames share the same color. [Bottom left] Under-segmentation of the scene's end configuration by a static segmentation model. [Bottom right] Accurate segmentation of the scene by RISeg after interactions have been completed.
  • Figure 2: A visual representation of BFIFs. Motions of different body frames attached to the same rigid body are transformed into the same space frame twist. Sampled body frames $\{a_1\}$ and $\{a_2\}$ lie on the shaded oval object and $\{b_1\}$ and $\{b_2\}$ lie on the shaded rectangle object. Space frame $\{s\}$ is arbitrarily chosen. Body frames are shown on the initial (solid line) configurations of the rigid bodies and corresponding motions onto the displaced (dashed line) rigid body configurations are represented by linear velocity vectors $\upsilon_{\{x\}}$ (red). The closeup circle shows $\upsilon_{\{a_1\}} \neq \upsilon_{\{a_2\}}$. Transparent oval shapes show the shaded oval object imagined to be infinitely large. Linear velocities of each body frame $\upsilon_{\{x\}}$ are transformed to the space frame and are shown by spatial velocity vectors (purple). Corresponding body frames for each spatial velocity vector are denoted in the superscript of $\upsilon_{\{s\}}$.
  • Figure 3: Visualization of FindAction($\cdot$). [Top] "Certain" clusters shown in red and dark green. "Uncertain" clusters shown in purple and light green. [Bottom] "Certain" cluster centers ($C^c_m$) are shown in yellow. White dashed line segments connect "certain" cluster centers ($\overline{C^c_{i}C^c_{j}}$). "Uncertain" cluster centers ($C^u_n$) are shown in red. Action $a_t$, defined by chosen push point $P^*$ and direction $\overrightarrow{P^{*}C^c_{i^*}}$, is shown in blue. "Uncertain" cluster center $C^u_*$ is used to choose $C^c_{i^*}$ and $C^c_{j^*}$ because it has the minimum distance to $\overline{C^c_{i^*}C^c_{j^*}}$.
  • Figure 4: RISeg and MSMFormer segmentations of a cluttered tabletop scene throughout the interactive perception pipeline. The scene's initial state is shown after label "0". Scene configurations and segmentation masks after push numbers 1, 2, and 3 follow the corresponding arrows. Pushes are minimal and are always less than 2 cm.
  • Figure 5: Percentage of objects correctly segmented as measured by the Overlap F-measure $\geq 75\%$.
  • ...and 3 more figures
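
The selection step described in the Figure 3 caption, where an "uncertain" cluster center picks out the pair of "certain" cluster centers whose connecting segment lies closest to it, can be sketched as a point-to-segment distance search. This is a minimal illustration under stated assumptions, not the paper's FindAction($\cdot$): the function names and the 2D example coordinates are hypothetical, and the choice of push point $P^*$ is not covered because the caption does not specify it.

```python
from itertools import combinations

import numpy as np


def point_segment_distance(q, a, b):
    """Euclidean distance from point q to the segment with endpoints a, b."""
    q, a, b = (np.asarray(x, dtype=float) for x in (q, a, b))
    ab = b - a
    denom = float(ab @ ab)
    if denom == 0.0:
        # Degenerate segment: both endpoints coincide.
        return float(np.linalg.norm(q - a))
    # Project q onto the line through a and b, clamped to the segment.
    t = np.clip(float((q - a) @ ab) / denom, 0.0, 1.0)
    return float(np.linalg.norm(q - (a + t * ab)))


def select_clusters(certain_centers, uncertain_centers):
    """Return (distance, n, i, j) minimizing the distance between
    uncertain center n and the segment joining certain centers i and j.

    Hypothetical helper: exhaustive search over all combinations.
    """
    best = None
    for n, u in enumerate(uncertain_centers):
        for i, j in combinations(range(len(certain_centers)), 2):
            d = point_segment_distance(u, certain_centers[i], certain_centers[j])
            if best is None or d < best[0]:
                best = (d, n, i, j)
    if best is None:
        raise ValueError("need at least one uncertain and two certain centers")
    return best


# Assumed pixel coordinates of cluster centers, for illustration only.
certain = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
uncertain = [(5.0, 1.0), (8.0, 8.0)]

distance, n, i, j = select_clusters(certain, uncertain)
print(f"uncertain center {n} selects certain centers {i} and {j} (d={distance:.2f})")
```

The exhaustive search is quadratic in the number of "certain" centers, which seems acceptable for tabletop scenes with a handful of clusters; a real pipeline would also need a rule for deriving the push point and for deciding which of the two selected centers the push is directed toward.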