Green Screen Augmentation Enables Scene Generalisation in Robotic Manipulation

Eugene Teoh, Sumit Patidar, Xiao Ma, Stephen James

TL;DR

Green Screen Augmentation (GreenAug) tackles scene generalisation in vision-based robotic manipulation by collecting data in front of a green screen and replacing the background via chroma keying. It introduces three variants: GreenAug-Rand, which applies random textures; GreenAug-Gen, which inpaints the background with generative models; and GreenAug-Mask, which learns a masking network to filter out the background. Across eight tasks, with over 850 training demonstrations and 8.2k evaluation episodes, GreenAug outperforms no augmentation, standard computer vision augmentations, and prior generative augmentation methods. The study advocates collecting future real-world demonstrations with green screens so that policies transfer to visually distinct scenes. Limitations include masking imperfections and limited handling of large changes in object geometry; future work points to more advanced chroma keying and extensions to 3D observation-based methods.

Abstract

Generalising vision-based manipulation policies to novel environments remains a challenging area with limited exploration. Current practices involve collecting data in one location, training imitation learning or reinforcement learning policies with this data, and deploying the policy in the same location. However, this approach lacks scalability as it necessitates data collection in multiple locations for each task. This paper proposes a novel approach where data is collected in a location predominantly featuring green screens. We introduce Green-screen Augmentation (GreenAug), employing a chroma key algorithm to overlay background textures onto a green screen. Through extensive real-world empirical studies with over 850 training demonstrations and 8.2k evaluation episodes, we demonstrate that GreenAug surpasses no augmentation, standard computer vision augmentation, and prior generative augmentation methods in performance. While no algorithmic novelties are claimed, our paper advocates for a fundamental shift in data collection practices. We propose that real-world demonstrations in future research should utilise green screens, followed by the application of GreenAug. We believe GreenAug unlocks policy generalisation to visually distinct novel locations, addressing the current scene generalisation limitations in robot learning.
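
The chroma-key step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `margin` threshold, and the "green channel dominates red and blue" rule are assumptions chosen for brevity, and a practical chroma keyer would more likely threshold in a hue-based colour space and clean up the mask edges. The sketch only shows the idea of masking green pixels and overlaying a background texture on them.

```python
import numpy as np


def green_screen_mask(image: np.ndarray, margin: int = 40) -> np.ndarray:
    """Return a boolean (H, W) mask that is True where a pixel looks green.

    Assumption: a pixel counts as green screen when its green channel exceeds
    both red and blue by at least `margin` (a hypothetical threshold).
    """
    rgb = image.astype(np.int16)  # avoid uint8 wrap-around when subtracting
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return (g - np.maximum(r, b)) >= margin


def replace_background(image: np.ndarray, texture: np.ndarray,
                       margin: int = 40) -> np.ndarray:
    """Overlay `texture` wherever the green-screen mask is True."""
    mask = green_screen_mask(image, margin)
    # Broadcast the (H, W) mask over the colour channels.
    return np.where(mask[..., None], texture, image)


# Toy frame: a pure green backdrop with a 2x2 red "object" in the centre.
frame = np.zeros((4, 4, 3), dtype=np.uint8)
frame[..., 1] = 255
frame[1:3, 1:3] = (200, 30, 30)

# Random texture standing in for a background image (hypothetical data).
rng = np.random.default_rng(0)
texture = rng.integers(0, 256, size=frame.shape, dtype=np.uint8)

augmented = replace_background(frame, texture)
```

In this toy frame the twelve green pixels are replaced by the texture while the four object pixels are left untouched. Sampling a different texture for each training frame or episode would correspond loosely to the random-texture variant described in the paper.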

Paper Structure

This paper contains 18 sections, 18 figures, and 19 tables.

Figures (18)

  • Figure 1: GreenAug provides a simple visual augmentation to robot policies by first collecting data with a green screen, then augmenting it with different textures. The resulting policy can be transferred to unseen visually distinct novel locations (scenes).
  • Figure 2: The GreenAug process begins with acquiring a green screen mask using chroma keying. GreenAug-Rand applies random textures, GreenAug-Gen uses generative models to inpaint the background, and GreenAug-Mask learns a masking network to filter out the background.
  • Figure 3: Physical steps for green-screen setup. Either scene items can be moved into the green screen, or the green screen can be brought to the scene.
  • Figure 4: Visualisations of train and test scenes.
  • Figure 5: Visualisations of raw and preprocessed frames (left shoulder and lower wrist camera views) of generative augmentation, GreenAug-Gen and GreenAug-Mask (during inference). Both generative methods struggle with producing good contextual wrist camera inpainting. In generative augmentation, the gripper is inpainted as part of the background, while GreenAug-Mask shows masking artefacts in novel scenes.
  • ...and 13 more figures