Domain Adaptive Detection of MAVs: A Benchmark and Noise Suppression Network

Yin Zhang, Jinhong Deng, Peidong Liu, Wen Li, Shiyu Zhao

TL;DR

A novel benchmark is established that consists of three domain adaptation tasks (simulation-to-real, cross-scene, and cross-camera adaptation), and a noise suppression network is proposed to overcome error accumulation.

Abstract

Visual detection of Micro Air Vehicles (MAVs) has attracted increasing attention in recent years due to its important applications in various tasks. Existing methods for MAV detection assume that the training set and the testing set have the same distribution. As a result, when deployed in new domains, the detectors suffer significant performance degradation due to domain discrepancy. In this paper, we study the problem of cross-domain MAV detection. The contributions of this paper are threefold. 1) We propose a Multi-MAV-Multi-Domain (M3D) dataset consisting of both simulation and realistic images. Compared to existing datasets, it is more comprehensive in that it covers rich scenes, diverse MAV types, and various viewing angles. A new benchmark for cross-domain MAV detection is built on this dataset. 2) We propose a Noise Suppression Network (NSN) based on the framework of pseudo-labeling and a large-to-small training procedure. To reduce pseudo-label noise, two novel modules are designed in this network. The first is a prior-based curriculum learning module that allocates adaptive thresholds to pseudo labels of different difficulties. The second is a masked copy-paste augmentation module that pastes truly-labeled MAVs onto unlabeled target images and thus decreases pseudo-label noise. 3) Extensive experimental results verify the superior performance of the proposed method compared to state-of-the-art methods. In particular, it achieves mAP of 46.9% (+5.8%), 50.5% (+3.7%), and 61.5% (+11.3%) on the tasks of simulation-to-real adaptation, cross-scene adaptation, and cross-camera adaptation, respectively.
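
The copy-paste idea in contribution 2 can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `paste_object`, the fixed paste position, and the reading of "masked" as a binary mask that selects the MAV pixels of a labeled source crop are all assumptions made for illustration. The sketch only shows why a pasted object comes with a true label: its bounding box is known exactly from where it was pasted.

```python
import numpy as np


def paste_object(target, patch, mask, top, left):
    """Paste the masked pixels of `patch` onto a copy of `target`.

    Hypothetical helper, not taken from the paper. `mask` is a boolean
    array of shape (h, w) that is True where a patch pixel belongs to
    the object. Returns the augmented image and the pasted bounding box
    as (x1, y1, x2, y2), which serves as an exact label.
    """
    h, w = patch.shape[:2]
    fits = (top >= 0 and left >= 0
            and top + h <= target.shape[0]
            and left + w <= target.shape[1])
    if not fits:
        raise ValueError("patch does not fit inside the target image")

    out = target.copy()
    region = out[top:top + h, left:left + w]  # view into `out`
    region[mask] = patch[mask]                # overwrite object pixels only
    return out, (left, top, left + w, top + h)


# Toy data: a black 8x8 target image and a white 2x3 "MAV" crop.
target = np.zeros((8, 8, 3), dtype=np.uint8)
patch = np.full((2, 3, 3), 255, dtype=np.uint8)
mask = np.array([[True, False, True],
                 [True, True, True]])

augmented, box = paste_object(target, patch, mask, top=4, left=2)
```

In this toy call the box is `(2, 4, 5, 6)`, and the one pixel where the mask is `False` keeps the original background value. How paste locations are chosen and how the pasted labels are combined with pseudo labels are not described in the abstract, so they are left out here.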

Paper Structure (37 sections, 9 equations, 11 figures, 13 tables, 2 algorithms)

Figures (11)

  • Figure 1: Samples from the proposed Multi-MAV-Multi-Domain (M3D) dataset. From top to bottom, the rows show examples from the M3D-Sim subset and the M3D-Real subset, respectively. The backgrounds cover many scene types: mountains, buildings, villages, rivers, deserts, farmlands, parks, and roads. MAVs are usually small objects in the captured images. Enlarged views of the MAVs, which span diverse types, are shown at the lower right of each picture.
  • Figure 2: The MAVs and environments in M3D-Sim subset.
  • Figure 3: The MAVs and environments in M3D-Real subset.
  • Figure 4: The framework of each training stage in the noise suppression network. 1) Predictions on the unlabeled target data are used as pseudo labels and corrected by the prior-guided curriculum learning module (see Section \ref{sec:pcl}); 2) the unlabeled target data is augmented by the masked copy-paste augmentation module, which generates true labels (see Section \ref{sec:mca}); 3) the detection model is trained on labeled source data and augmented target data for $E$ epochs.
  • Figure 5: Schematic diagram of the target area $A_\mathrm{t}$ and the local background area $A_\mathrm{b}$. The yellow area and the blue area represent $A_\mathrm{t}$ and $A_\mathrm{b}$, respectively.
  • ...and 6 more figures