Fast Training Data Acquisition for Object Detection and Segmentation using Black Screen Luminance Keying

Thomas Pöllabauer, Volker Knauthe, André Boller, Arjan Kuijper, Dieter Fellner

TL;DR

The paper tackles the training-data bottleneck in DNN-based object detection and segmentation by removing the need for manual labeling and 3D assets. It introduces luminance keying: target objects are filmed in short videos against a highly light-absorbing black screen, masks are generated automatically, and the masked objects are pasted onto varying backgrounds in a cut-and-paste pipeline to produce training images. The authors train YOLOX on COCO-formatted data derived from YCB-V objects and compare against green-screen chroma keying and rendering-based datasets, showing that the luminance-keyed data (LUMA) yields competitive or superior performance, particularly on real test data. They release code and black-screen recordings to support reproducibility and rapid adoption in small-scale applications.

Abstract

Deep Neural Networks (DNNs) require large amounts of annotated training data for good performance. Often this data is generated using manual labeling (error-prone and time-consuming) or rendering (requiring geometry and material information). Both approaches make it difficult or uneconomical to apply DNNs to many small-scale applications. A fast and straightforward approach to acquiring the necessary training data would allow the adoption of deep learning in even the smallest of applications. Chroma keying is the process of replacing a color (usually blue or green) with another background. Instead of chroma keying, we propose luminance keying for fast and straightforward training image acquisition. We deploy a black screen with high light absorption (99.99%) to record roughly 1-minute-long videos of our target objects, circumventing typical problems of chroma keying, such as color bleeding or color overlap between background color and object color. Next, we automatically mask our objects using simple brightness thresholding, removing the need for manual annotation. Finally, we automatically place the objects on random backgrounds and train a 2D object detector. We extensively evaluate performance on the widely used YCB-V object set and compare favourably to conventional techniques such as rendering, without needing 3D meshes, materials, or any other information about our target objects, and in a fraction of the time needed for other approaches. Our work demonstrates highly accurate training data acquisition that allows training of state-of-the-art networks to start within minutes.
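
The three steps named in the abstract (brightness thresholding, pasting onto a new background, and deriving a detection label) can be illustrated with a minimal NumPy sketch. This is not the authors' released code: the function names, the Rec. 601 luma weights, and the threshold of 0.1 are assumptions chosen for illustration, and the frame is a synthetic toy image rather than a real recording.

```python
import numpy as np

def luminance_mask(image_rgb: np.ndarray, threshold: float = 0.1) -> np.ndarray:
    """Boolean foreground mask: True where a pixel is brighter than `threshold`.

    Luma uses Rec. 601 weights; both the weights and the threshold value are
    assumptions, since the abstract only says "simple brightness thresholding".
    """
    rgb = image_rgb.astype(np.float32) / 255.0
    luma = rgb @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return luma > threshold

def paste_on_background(image_rgb, mask, background_rgb):
    """Copy the masked object pixels onto a same-sized background image."""
    out = background_rgb.copy()
    out[mask] = image_rgb[mask]
    return out

def mask_to_coco_bbox(mask):
    """Return a COCO-style [x, y, width, height] box, or None for an empty mask."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    x0, y0 = xs.min(), ys.min()
    return [int(x0), int(y0), int(xs.max() - x0 + 1), int(ys.max() - y0 + 1)]

# Toy "black screen" frame: a 3x3 reddish object on a pure black background.
frame = np.zeros((6, 8, 3), dtype=np.uint8)
frame[2:5, 3:6] = (200, 40, 40)

# Stand-in for a random background image (hypothetical; any RGB image works).
rng = np.random.default_rng(0)
background = rng.integers(0, 256, size=frame.shape, dtype=np.uint8)

mask = luminance_mask(frame)
composite = paste_on_background(frame, mask, background)
bbox = mask_to_coco_bbox(mask)
print(bbox)  # [3, 2, 3, 3]
```

Because the mask comes directly from the recording, the same array serves as the segmentation label and as the source of the bounding box, which is why no manual annotation step appears in the sketch.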

Paper Structure (17 sections, 4 figures, 3 tables)

Figures (4)

  • Figure 1: A qualitative sample of all 21 YCB-V objects, recorded with a handheld smartphone against our proposed black background. The objects are well silhouetted against the background, so they can be segmented easily and many typical problems associated with chroma keying are circumvented.
  • Figure 2: Samples from a subset of our evaluated training datasets. REAL are real images, RBG are real images with replaced backgrounds, PBR are physically based renderings, PBG are physically based renderings with background replacement, PBR-rTex are PBRs with randomized textures, and CHROMA and LUMA (Ours) stand for different capturing methods.
  • Figure 3: Some of the problems with chroma keying. Color bleeding makes part of the object appear greenish, which leads to imperfect masking (top left). Luminance keying (top right), in contrast, gives much improved masking. Other problems with chroma keying are the high reflectivity of conventional backgrounds, which leads to a "halo" effect at the edges (bottom left), and the cutting out of object parts whose color is close to the background color (bottom right).
  • Figure 4: Another problem of chroma keying. Although the lighting did not change, a slight change in camera settings between two video clips led to very different tones of green, making thresholding much harder than with luminance keying.
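
The color-overlap problem described for Figure 3 (object parts close to the background color being cut out) can be reproduced on a toy image. The sketch below is an illustration under stated assumptions, not the keying used in the paper: the green-dominance rule, its factor of 1.3, the luma threshold of 0.1, and all pixel colors are hypothetical.

```python
import numpy as np

def chroma_foreground(img: np.ndarray, dominance: float = 1.3) -> np.ndarray:
    """Toy chroma key: a pixel counts as background if green clearly dominates."""
    r, g, b = (img[..., i].astype(np.float32) for i in range(3))
    is_green = (g > dominance * r) & (g > dominance * b)
    return ~is_green

def luma_foreground(img: np.ndarray, threshold: float = 0.1) -> np.ndarray:
    """Toy luminance key: a pixel counts as foreground if it is bright enough."""
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)  # Rec. 601 luma
    luma = (img.astype(np.float32) / 255.0) @ weights
    return luma > threshold

def scene(background_color) -> np.ndarray:
    """4x6 image holding an object that is half green and half red (8 pixels)."""
    img = np.empty((4, 6, 3), dtype=np.uint8)
    img[:] = background_color
    img[1:3, 1:3] = (30, 160, 40)   # green part of the object
    img[1:3, 3:5] = (200, 30, 30)   # red part of the object
    return img

green_screen = scene((0, 177, 64))  # assumed green-screen color
black_screen = scene((2, 2, 2))     # near-black, highly absorbing screen

chroma_pixels = int(chroma_foreground(green_screen).sum())
luma_pixels = int(luma_foreground(black_screen).sum())
print(chroma_pixels, luma_pixels)  # 4 8
```

In this toy case the chroma key keeps only the red half of the object because the green half is indistinguishable from the screen, while the luminance key recovers all eight object pixels. A very dark object would pose the analogous difficulty for luminance keying.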