Pseudo-Labeling and Contextual Curriculum Learning for Online Grasp Learning in Robotic Bin Picking

Huy Le, Philipp Schillinger, Miroslav Gabriel, Alexander Qualmann, Ngo Anh Vien

TL;DR

The paper tackles the challenge of online grasp learning in bin-picking under sparse reward feedback, where traditional offline methods fail to adapt to novel scenes. It introduces SSL-ConvSAC, a framework that fuses semi-supervised learning with a ConvSAC-based policy to leverage unlabeled pixel-level data alongside the single rewarded pixel. To cope with extreme labeled/unlabeled data imbalance, it proposes a contextual curriculum approach that adapts pseudo-label thresholds and weighting at the pixel level, drawing on FixMatch, FlexMatch, and FreeMatch ideas. Empirical results show faster learning and higher success in both large-scale offline evaluation and real-robot experiments, achieving approximately 90% grasp success and 93% bin completion in bin-picking tasks. The work demonstrates practical gains in online adaptation for robotic manipulation and highlights avenues for future work in closed-loop pseudo-labeling and flexible-object handling.

Abstract

The prevailing grasp prediction methods predominantly rely on offline learning, overlooking the dynamic grasp learning that occurs during real-time adaptation to novel picking scenarios. These scenarios may involve previously unseen objects, variations in camera perspectives, and bin configurations, among other factors. In this paper, we introduce a novel approach, SSL-ConvSAC, that combines semi-supervised learning and reinforcement learning for online grasp learning. By treating pixels with reward feedback as labeled data and others as unlabeled, it efficiently exploits unlabeled data to enhance learning. In addition, we address the imbalance between labeled and unlabeled data by proposing a contextual curriculum-based method. We ablate the proposed approach on real-world evaluation data and demonstrate promise for improving online grasp learning on bin picking tasks using a physical 7-DoF Franka Emika robot arm with a suction gripper. Video: https://youtu.be/OAro5pg8I9U
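The labeled/unlabeled split described in the abstract can be illustrated with a minimal sketch. It assumes the network outputs a per-pixel grasp-success probability map and uses binary cross-entropy as a stand-in supervised loss; the function names `labeled_pixel_loss` and `labeled_fraction` are hypothetical, and the paper's actual ConvSAC objective is not reproduced here.

```python
import math


def labeled_pixel_loss(prob_map, grasp_pixel, reward):
    """Binary cross-entropy at the single pixel that received reward feedback.

    prob_map    -- nested list of predicted grasp-success probabilities (assumed)
    grasp_pixel -- (row, col) of the executed grasp
    reward      -- 1.0 for a successful grasp, 0.0 for a failed one
    """
    row, col = grasp_pixel
    # Clamp the prediction so that log() stays finite.
    p = min(max(prob_map[row][col], 1e-7), 1.0 - 1e-7)
    return -(reward * math.log(p) + (1.0 - reward) * math.log(1.0 - p))


def labeled_fraction(prob_map):
    """Fraction of pixels that carry a ground-truth label after one grasp."""
    n_pixels = sum(len(row) for row in prob_map)
    return 1.0 / n_pixels


# Toy 2x2 "image": only the grasped pixel (1, 1) contributes to the loss.
toy_map = [[0.5, 0.5], [0.5, 0.9]]
loss = labeled_pixel_loss(toy_map, (1, 1), reward=1.0)
fraction = labeled_fraction(toy_map)
```

Even on this toy map only a quarter of the pixels are labeled; on a full-resolution image the labeled fraction is far smaller, which is the imbalance that the pseudo-labeling and curriculum components are meant to address.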

Paper Structure (32 sections, 9 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: SSL-ConvSAC: A combined SSL and ConvSAC grasp learning approach to address the sparse reward feedback problem in online grasp learning. During online learning, only one pixel point gets feedback, hence the loss is sparsely backpropagated. In contrast, SSL-ConvSAC will learn from both ground-truth and pseudo-labeled reward feedback.
  • Figure 2: SSL-ConvSAC Pipeline: Both strongly and weakly augmented images are fed to a ConvSAC network. The pixel-wise predictions for the weakly augmented input provide pseudo-labels wherever their confidence is above a threshold. These pixel-wise pseudo-labeled rewards are then used to compute a consistency regularization term that updates both the Critic and the Actor, which receive the strongly augmented image as input.
  • Figure 3: (Left) Online training plots on the physical robot: trailing grasp success rate over the latest 15 bins vs. number of online sample steps. The asynchronous training has a ratio of 10:1 steps per grasp attempt. (Center) Comparisons of contextual models. (Right) Comparisons of soft-weighting models.
  • Figure 4: Robot setup (left), online objects (middle), offline objects (right)
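
The pseudo-labeling step described in the Figure 2 caption can be sketched roughly as follows. This is a FixMatch-style illustration under several assumptions, not the authors' implementation: predictions are treated as per-pixel success probabilities, the weak and strong views are assumed to be pixel-aligned, the threshold is a single fixed value (the paper adapts thresholds and weights per pixel), and the names `pseudo_label_mask` and `consistency_loss` are hypothetical.

```python
import math


def pseudo_label_mask(weak_probs, threshold=0.95):
    """Derive hard pseudo-labels and a confidence mask from weak-view predictions."""
    labels, mask = [], []
    for row in weak_probs:
        label_row, mask_row = [], []
        for p in row:
            # Confidence of the more likely class (success vs. failure).
            confidence = max(p, 1.0 - p)
            label_row.append(1.0 if p >= 0.5 else 0.0)
            mask_row.append(confidence >= threshold)
        labels.append(label_row)
        mask.append(mask_row)
    return labels, mask


def consistency_loss(strong_probs, labels, mask):
    """Mean binary cross-entropy between strong-view predictions and the
    pseudo-labels, averaged over confident pixels only."""
    total, count = 0.0, 0
    for prob_row, label_row, mask_row in zip(strong_probs, labels, mask):
        for p, y, keep in zip(prob_row, label_row, mask_row):
            if not keep:
                continue  # low-confidence pixels contribute nothing
            p = min(max(p, 1e-7), 1.0 - 1e-7)
            total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
            count += 1
    return total / count if count else 0.0


# Toy 2x2 predictions for the weak and strong views of the same scene.
weak = [[0.99, 0.60], [0.02, 0.50]]
strong = [[0.90, 0.50], [0.10, 0.50]]
labels, mask = pseudo_label_mask(weak, threshold=0.95)
loss = consistency_loss(strong, labels, mask)
```

Here only the two pixels with confident weak-view predictions (0.99 and 0.02) pass the threshold, so the consistency term is computed on those two and ignores the rest.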