RFTrans: Leveraging Refractive Flow of Transparent Objects for Surface Normal Estimation and Manipulation

Tutian Tang, Jiyu Liu, Jieyi Zhang, Haoyuan Fu, Wenqiang Xu, Cewu Lu

TL;DR

RFTrans addresses the challenge of manipulating transparent objects by introducing refractive flow as a physically grounded intermediate representation to recover surface normals from RGB-D data. The method cascades RFNet (refractive flow, mask, boundaries) and F2Net (flow-to-normal) before a global depth optimization and an analytic grasp planner (ISF) to enable manipulation, trained on a large synthetic RFUniverse dataset. Empirical results show superior surface normal estimation and depth completion on both synthetic and real benchmarks, with direct sim-to-real transfer demonstrated by an 83% grasp-success rate in real-world experiments. The work highlights refractive flow as a robust bridge between synthetic training and real-world manipulation of thin-shell transparent objects, while noting limitations with extreme geometries and object overlap.

Abstract

Transparent objects are widely used in our daily lives, making it important to teach robots to interact with them. However, this is difficult because reflection and refraction can cause depth cameras to return inaccurate geometry measurements. To solve this problem, this paper introduces RFTrans, an RGB-D-based method for surface normal estimation and manipulation of transparent objects. By leveraging refractive flow as an intermediate representation, the proposed method circumvents the drawbacks of directly predicting the geometry (e.g. surface normal) from images and helps bridge the sim-to-real gap. It integrates RFNet, which predicts refractive flow, object mask, and boundaries, followed by F2Net, which estimates the surface normal from the refractive flow. To make manipulation possible, a global optimization module takes in the predictions, refines the raw depth, and constructs the point cloud with normals. An off-the-shelf analytical grasp planning algorithm then generates the grasp poses. We build a synthetic dataset with physically plausible ray-tracing rendering techniques to train the networks. Results show that the proposed method, trained on the synthetic dataset, consistently outperforms the baseline method on both synthetic and real-world benchmarks by a large margin. Finally, a real-world robot grasping task achieves an 83% success rate, demonstrating that refractive flow can help enable direct sim-to-real transfer. The code, data, and supplementary materials are available at https://rftrans.robotflow.ai.

Paper Structure (23 sections, 6 figures, 6 tables)

This paper contains 23 sections, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Top: Transparency can cause inaccurate or missing depth in widely used RGB-D cameras. We use refractive flow to recover the surface normal and ultimately obtain the point cloud for robot manipulation. Bottom: (Left) A common wine glass. (Middle) The refractive flow visualized as a color map, where color encodes direction and magnitude and white indicates no refraction at the pixel. (Right) Refractive flow at sampled image points, drawn as arrows that start from foreground pixels on the glass and end at their corresponding pixels on the background.
  • Figure 2: Given an RGB-D image, RFNet first predicts the mask, the boundary, and the refractive flow of transparent objects. Next, F2Net predicts the surface normal from the refractive flow. The global optimization then generates the singulated point cloud with normals. Finally, we apply the off-the-shelf manipulation algorithm, ISF, to generate grasp poses. The black points represent the fingers of the Franka Emika Panda robot.
  • Figure 3: Left: Point $O$ is the optical center of the pinhole camera. Point $A$ lies on the non-transparent background, e.g. a table. Refraction takes place at point $B$: $\overrightarrow{AB}$ is the incident ray and $\overrightarrow{BO}$ is the refracted ray. Point $D$ is the image of $A$ on the image plane. Without the transparent object, an imaginary ray would travel directly from $A$ to $O$, intersecting the image plane at point $C$. The axis-aligned offsets between $C$ and $D$ on the image plane, $(\Delta x, \Delta y)$, are the refractive flow at point $D$. Due to transparency, when the Type II error happens, RGB-D cameras usually report $\mathrm{proj}|\overrightarrow{OA}|$ as the depth value of point $D$, while the actual depth should be $\mathrm{proj}|\overrightarrow{OB}|$, where $\mathrm{proj}|\cdot|$ denotes the projected length on the principal axis. Right: The data acquisition system used to capture refractive flow.
  • Figure 4: The refractive effect and the corresponding refractive flow in the real world and our simulation environment.
  • Figure 5: (a): The same transparent objects are placed in front of different, complex backgrounds. (b): The refractive flow predicted by RFNet. (c): The surface normal derived from the refractive flow. (d): The surface normal predicted directly from RGB images, i.e., by ClearGrasp. Note that RFTrans only predicts the surface normal of transparent objects, while ClearGrasp also predicts the background's normal.
  • ...and 1 more figure
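
The geometry in the Figure 3 caption and the color coding in the Figure 1 caption can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes an ideal pinhole camera with points expressed in the camera frame (z along the principal axis), and it takes the flow to point from $D$ to $C$, matching the arrows described for Figure 1. The function names, the intrinsics (fx, fy, cx, cy), the sample coordinates, and the HSV mapping (hue for direction, saturation for magnitude) are all hypothetical choices made for illustration.

```python
import colorsys
import math


def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point (x, y, z) to pixel (u, v)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (fx * x / z + cx, fy * y / z + cy)


def refractive_flow(a, b, fx, fy, cx, cy):
    """Flow (dx, dy) at pixel D for background point A and refraction point B.

    D is where the refracted ray BO crosses the image plane (projection of B).
    C is where the imaginary straight ray AO crosses it (projection of A).
    Assumed sign convention: the flow points from D to C.
    """
    d = project(b, fx, fy, cx, cy)
    c = project(a, fx, fy, cx, cy)
    return (c[0] - d[0], c[1] - d[1])


def flow_to_rgb(dx, dy, max_magnitude):
    """Map a flow vector to RGB: hue = direction, saturation = magnitude.

    Zero flow has zero saturation, so it renders as white.
    """
    if max_magnitude <= 0:
        raise ValueError("max_magnitude must be positive")
    magnitude = math.hypot(dx, dy)
    hue = (math.atan2(dy, dx) / (2 * math.pi)) % 1.0
    saturation = min(magnitude / max_magnitude, 1.0)
    return colorsys.hsv_to_rgb(hue, saturation, 1.0)


# Hypothetical intrinsics and scene points (meters, camera frame).
FX, FY, CX, CY = 500.0, 500.0, 320.0, 240.0
A = (0.10, 0.0, 1.0)  # background point, e.g. on the table
B = (0.02, 0.0, 0.5)  # point on the glass where refraction occurs

flow = refractive_flow(A, B, FX, FY, CX, CY)
reported_depth = A[2]  # proj|OA|: what the sensor tends to report at D
actual_depth = B[2]    # proj|OB|: the depth of the glass surface at D
color = flow_to_rgb(flow[0], flow[1], max_magnitude=30.0)
```

In this toy configuration the pixel at $D$ is displaced 30 pixels along x from $C$, and the depth a sensor would typically report (1.0 m) is twice the depth of the glass surface (0.5 m), which is the kind of Type II error the caption describes.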