Self-supervised cross-modality learning for uncertainty-aware object detection and recognition in applications which lack pre-labelled training data

Irum Mehboob, Li Sun, Alireza Rastegarpanah, Rustam Stolkin

TL;DR

This paper shows how an uncertainty-aware deep neural network can be trained to detect, recognise and localise objects in 2D RGB images in applications lacking annotated training datasets, using a Gaussian Process (GP) to encode and teach robust uncertainty estimation.

Abstract

This paper shows how an uncertainty-aware deep neural network can be trained to detect, recognise and localise objects in 2D RGB images in applications lacking annotated training datasets. We propose a self-supervising teacher-student pipeline, in which a relatively simple teacher classifier, trained with only a few labelled 2D thumbnails, automatically processes a larger body of unlabelled RGB-D data to teach a student network based on a modified YOLOv3 architecture. Firstly, 3D object detection with back projection is used to automatically extract and teach 2D detection and localisation information to the student network. Secondly, a weakly supervised 2D thumbnail classifier, with minimal training on a small number of hand-labelled images, is used to teach object category recognition. Thirdly, we use a Gaussian Process (GP) to encode and teach robust uncertainty estimation, so that the student can output a confidence score with each categorisation. The resulting student significantly outperforms the same YOLO architecture trained directly on the same amount of labelled data. Our GP-based approach yields robust and meaningful uncertainty estimates for complex industrial object classifications. The end-to-end network is also capable of the real-time processing needed for robotics applications. Our method can be applied to many important industrial tasks where labelled datasets are typically unavailable. In this paper, we demonstrate detection, localisation, and object category recognition of nuclear mixed-waste materials in highly cluttered and unstructured scenes. This is critical for robotic sorting and handling of legacy nuclear waste, which poses complex environmental remediation challenges in many nuclearised nations.
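The first step of the pipeline, turning a 3D detection into a 2D localisation label, can be illustrated with a minimal sketch. It assumes a pinhole camera model and an axis-aligned 3D box expressed in the camera frame; the function name `box3d_to_bbox2d`, the intrinsics `fx, fy, cx, cy`, and the clamping to image bounds are illustrative assumptions, not details taken from the paper.

```python
from itertools import product


def box3d_to_bbox2d(center, size, fx, fy, cx, cy, width, height):
    """Project an axis-aligned 3D box (camera frame, metres) to a 2D pixel box.

    Hypothetical helper: assumes a pinhole camera with focal lengths (fx, fy)
    and principal point (cx, cy), and returns (u_min, v_min, u_max, v_max).
    """
    x0, y0, z0 = center
    sx, sy, sz = size
    us, vs = [], []
    # Enumerate the 8 corners of the box and project each one.
    for dx, dy, dz in product((-0.5, 0.5), repeat=3):
        x, y, z = x0 + dx * sx, y0 + dy * sy, z0 + dz * sz
        if z <= 0:
            raise ValueError("box corner lies behind the camera")
        us.append(fx * x / z + cx)
        vs.append(fy * y / z + cy)
    # The 2D label is the tightest pixel rectangle around the projected
    # corners, clamped to the image.
    u_min = max(0.0, min(us))
    v_min = max(0.0, min(vs))
    u_max = min(float(width - 1), max(us))
    v_max = min(float(height - 1), max(vs))
    return (u_min, v_min, u_max, v_max)


# Example: a 1 m cube centred 2 m in front of a 640x480 camera.
bbox = box3d_to_bbox2d((0.0, 0.0, 2.0), (1.0, 1.0, 1.0),
                       fx=600.0, fy=600.0, cx=320.0, cy=240.0,
                       width=640, height=480)
```

Boxes produced this way could serve as automatically generated 2D training targets for the student detector, without any hand-drawn annotations.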

Paper Structure

This paper contains 21 sections, 8 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: Detections on example image from our nuclear waste test dataset using the standard version of YOLOv3. Note how the conventional YOLOv3 can assign overly high confidence numbers to incorrect classifications.
  • Figure 2: The outline of the proposed method for rapidly boot-strapping a learning system, in a semi-supervised manner, requiring relatively sparse data. This is accomplished by combining Gaussian Processes and YOLOv3 in a Knowledge Distillation paradigm.
  • Figure 3: Deep kernel learning architecture with Stochastic variational inference procedure.
  • Figure 4: Schematic of the knowledge distillation pipeline for categorisation. (a) The transfer of knowledge from the teacher backbone, shown in Figure 3, to the student backbone, which uses the YOLOv3 architecture. (b) Illustration of the YOLOv3 output structure. Bounding box coordinates are generated by a 3D detector and define the spatial location and size of each detected object within the 3D space. The objectness score indicates the confidence that the bounding box contains an object. The final part of the output comprises probabilistic class scores, which give a probability distribution over possible classes and thereby incorporate uncertainty in the classification.
  • Figure 5: Some examples from the nuclear waste test dataset after training YOLOv3 through knowledge distillation. As seen in Figure 1, conventional YOLOv3 can assign overly high confidence to incorrect classifications, and it fails to detect small objects and objects of coarse scale. In contrast, these images show the results of YOLOv3 after training through our "teacher-student" knowledge distillation paradigm. YOLOv3 now detects most of the objects and classifies them correctly, while also assigning sensible and meaningful confidence scores to each classification.
  • ...and 2 more figures
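
The teacher-to-student transfer of probabilistic class scores described in the Figure 4 caption can be sketched as a soft-label loss. This is a generic knowledge distillation formulation (KL divergence between the teacher's class distribution and the student's softmax output), given here as an assumption; the paper's actual loss is not stated in this overview, and the names `softmax` and `soft_label_loss` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_loss(teacher_probs, student_logits, eps=1e-12):
    """KL(teacher || student) for one detection.

    teacher_probs: class probabilities from the teacher (assumed here to be
        the GP classifier's predictive distribution).
    student_logits: raw class scores from the student's class head.
    """
    student_probs = softmax(student_logits)
    return sum(
        p * (math.log(p + eps) - math.log(q + eps))
        for p, q in zip(teacher_probs, student_probs)
        if p > 0.0
    )


# An uncertain teacher (0.7 / 0.2 / 0.1) versus a student that has not yet
# learned anything (uniform output).
teacher = [0.7, 0.2, 0.1]
loss_untrained = soft_label_loss(teacher, [0.0, 0.0, 0.0])
# A student whose logits reproduce the teacher distribution.
loss_matched = soft_label_loss(teacher, [math.log(p) for p in teacher])
```

Because the target is a full distribution rather than a one-hot label, a student trained this way is pushed to reproduce the teacher's uncertainty as well as its most likely class.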