Robotic Sim-to-Real Transfer for Long-Horizon Pick-and-Place Tasks in the Robotic Sim2Real Competition

Ming Yang, Hongyu Cao, Lixuan Zhao, Chenrui Zhang, Yaran Chen

TL;DR

The paper tackles the sim-to-real gap in long-horizon robotic pick-and-place by handling perception and actuation discrepancies in two separate modules: the Sequential Motion-Blur Mitigation Strategy (SMMS), a motion-blur-resilient perception pipeline that combines ArUco detection with a lightweight CNN classifier and data handling strategies, and feedback linearization based on a Design Function (DF) for the omnidirectional chassis, which mitigates nonlinearities and improper grasp poses. In both simulation and real-world tests, the system achieves sub-centimeter servo accuracy and high perception reliability, and it took first place in the mineral-searching task of the 2024 Robotic Sim2Real Challenge. Key contributions include SMMS, the nonlinearity-robust DF, a modular system architecture, and a full-system evaluation showing 100% real-world grasp/stack success with consistent behavior between simulation and reality. The results indicate that consistent long-horizon performance is achievable without altering the underlying algorithms, with implications for scalable sim-to-real deployment in complex autonomous manipulation tasks.

Abstract

This paper presents a fully autonomous robotic system that performs sim-to-real transfer in complex long-horizon tasks involving navigation, recognition, grasping, and stacking in an environment with multiple obstacles. The key feature of the system is its ability to overcome typical sensing and actuation discrepancies during sim-to-real transfer and to achieve consistent performance without any algorithmic modifications. To accomplish this, a lightweight noise-resistant visual perception system and a nonlinearity-robust servo system are adopted. We conduct a series of tests in both simulated and real-world environments. The visual perception system runs at 11 ms per frame owing to its lightweight design, and the servo system achieves sub-centimeter accuracy with the proposed controller. Both exhibit high consistency during sim-to-real transfer. As a result, our robotic system took first place in the mineral searching task of the Robotic Sim2Real Challenge hosted at ICRA 2024.

Paper Structure

This paper contains 12 sections, 10 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: The robot begins at a predefined starting position (see ① in (a)) and proceeds to search for and grasp "minerals"—cubes marked with unique digits—randomly scattered across three distinct zones (②). It then stacks the minerals as high as possible on a platform beneath the exchange station, which is shaped like a door frame with three markers on top indicating the stack order (③). The task concludes with the robot stopping at a designated parking point (④). Successful picking and placing rely on high-precision visual perception (b) and accurate servo control (c).
  • Figure 2: Images of markers captured in the simulator (a) and in the real world (b) while the robot is rotating. Motion blur is difficult to simulate and poses challenges to robotic perception.
  • Figure 3: The CNN classifier comprises three convolution-dropout-pooling modules, followed by three fully connected layers. Its lightweight architecture (62.1k parameters in total) enables real-time inference.
  • Figure 4: Pipeline of our visual perception system with SMMS. We first use ArUco to extract the contours of markers. We then rectify each contour with a perspective transformation and classify it with the CNN classifier, followed by a Perspective-n-Point (PnP) solver to estimate its pose. Throughout the pipeline, SMMS first enhances the image contrast before feeding the feature map into the detector. It then augments the training dataset and rejects unreliable classifications. Subsequently, it applies extrapolation and filtering to produce the final pose estimates.
  • Figure 5: Incorrect grasp results caused by an inappropriate grasp pose. When the mineral is grasped diagonally in practice (a), the gripper tends to get stuck on the corner and fail. In the simulator (b), however, it "penetrates" the mineral and succeeds.
  • ...and 2 more figures
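
The last SMMS stages named in the Figure 4 caption (rejecting unreliable classifications, then extrapolating and filtering the pose estimates) might be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the `PoseTracker` class, the confidence threshold, the constant-velocity extrapolation, and the exponential low-pass filter are all hypothetical choices, since the captions do not specify which extrapolation or filtering method the paper uses.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec = Tuple[float, ...]  # hypothetical pose vector, e.g. (x, y) in metres


@dataclass
class PoseTracker:
    """Hypothetical post-processing stage: reject, extrapolate, filter."""

    min_confidence: float = 0.9  # assumed rejection threshold for the classifier
    alpha: float = 0.5           # assumed smoothing factor of the low-pass filter
    pose: Optional[Vec] = None   # last output pose
    velocity: Vec = ()           # finite-difference velocity estimate

    def update(self, measurement: Optional[Vec], confidence: float,
               dt: float) -> Optional[Vec]:
        """Process one frame; dt is the time since the last frame (dt > 0)."""
        accepted = measurement is not None and confidence >= self.min_confidence

        if self.pose is None:
            # Nothing to extrapolate from yet: wait for a reliable detection.
            if accepted:
                self.pose = tuple(measurement)
                self.velocity = tuple(0.0 for _ in measurement)
            return self.pose

        # Constant-velocity prediction from the previous output.
        predicted = tuple(p + v * dt for p, v in zip(self.pose, self.velocity))

        if not accepted:
            # Rejected or missing detection: fall back to extrapolation.
            self.pose = predicted
            return self.pose

        # Blend the measurement with the prediction (exponential smoothing).
        filtered = tuple(self.alpha * m + (1.0 - self.alpha) * p
                         for m, p in zip(measurement, predicted))
        self.velocity = tuple((f - p) / dt for f, p in zip(filtered, self.pose))
        self.pose = filtered
        return self.pose


if __name__ == "__main__":
    tracker = PoseTracker()
    frames = [((1.0, 2.0), 0.95), ((2.0, 2.0), 0.95), ((9.0, 9.0), 0.20)]
    for measurement, confidence in frames:
        print(tracker.update(measurement, confidence, dt=0.1))
```

In this sketch a blurred frame with a low classifier confidence does not pull the estimate toward a wrong marker; the tracker coasts on its last velocity instead. Angular components such as yaw would additionally need wrap-around handling, which is omitted here.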