Task-wise Sampling Convolutions for Arbitrary-Oriented Object Detection in Aerial Images

Zhanchao Huang, Wei Li, Xiang-Gen Xia, Hao Wang, Ran Tao

TL;DR

An AOOD method called task-wise sampling convolutions (TS-Conv) is proposed, which adaptively samples task-wise features from respective sensitive regions and maps these features together in alignment to guide a dynamic label assignment for better predictions.

Abstract

Arbitrary-oriented object detection (AOOD) has been widely applied to locate and classify objects with diverse orientations in remote sensing images. However, the inconsistent features for the localization and classification tasks in AOOD models may lead to ambiguity and low-quality object predictions, which constrains the detection performance. In this article, an AOOD method called task-wise sampling convolutions (TS-Conv) is proposed. TS-Conv adaptively samples task-wise features from respective sensitive regions and maps these features together in alignment to guide a dynamic label assignment for better predictions. Specifically, sampling positions of the localization convolution in TS-Conv are supervised by the oriented bounding box (OBB) prediction associated with spatial coordinates, while sampling positions and convolutional kernel of the classification convolution are designed to be adaptively adjusted according to different orientations for improving the orientation robustness of features. Furthermore, a dynamic task-consistent-aware label assignment (DTLA) strategy is developed to select optimal candidate positions and assign labels dynamically according to ranked task-aware scores obtained from TS-Conv. Extensive experiments on several public datasets covering multiple scenes, multimodal images, and multiple categories of objects demonstrate the effectiveness, scalability, and superior performance of the proposed TS-Conv.
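The ranking step of a dynamic label assignment like the one described above can be sketched as follows. This is a minimal illustration, not the paper's DTLA: the score formula (classification score times localization IoU, with exponents `alpha` and `beta`), the function names, and the `top_k` parameter are all assumptions introduced here, since the abstract does not give the exact definition of the task-aware score.

```python
import numpy as np

def task_aware_scores(cls_scores, ious, alpha=1.0, beta=1.0):
    """Hypothetical consistency score (an assumption, not the paper's formula).

    The product is high only when a candidate position is good for both
    classification and localization.
    """
    return (cls_scores ** alpha) * (ious ** beta)

def assign_positives(cls_scores, ious, top_k=3):
    """Mark the top_k candidate positions per object as positives.

    cls_scores, ious: arrays of shape (num_objects, num_candidates).
    Returns a boolean mask of the same shape.
    """
    scores = task_aware_scores(cls_scores, ious)
    k = min(top_k, scores.shape[1])
    # Indices of the k highest-scoring candidates for each object.
    top_idx = np.argsort(-scores, axis=1)[:, :k]
    mask = np.zeros(scores.shape, dtype=bool)
    np.put_along_axis(mask, top_idx, True, axis=1)
    return mask

# One object with four candidate positions.
cls_scores = np.array([[0.9, 0.2, 0.6, 0.8]])
ious = np.array([[0.3, 0.9, 0.7, 0.8]])
mask = assign_positives(cls_scores, ious, top_k=2)
```

In this toy input, the candidate with the best classification score alone (0.9) and the one with the best IoU alone (0.9) are both passed over in favor of the two candidates on which the two tasks agree, which is the behavior a task-consistent assignment is meant to encourage.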

Paper Structure

This paper contains 19 sections, 20 equations, 20 figures, 13 tables, and 1 algorithm.

Figures (20)

  • Figure 1: (a) The IFS problems of AOOD in different subtasks and orientations. (b) The IFS problem is exacerbated by the diverse orientation and dense distribution of objects.
  • Figure 2: The TS-Conv framework comprises (a) the CNN model and training data, (b) the proposed TS-Conv consisting of LS-Conv and CS-Conv, (c) the designed DTLA strategy. The baseline CNN model used by TS-Conv and GGHL is Darknet53+FPN. GGHL in the following (when the label assignment strategy is not emphasized) refers to the GGHL-based Darknet53+FPN model.
  • Figure 3: The principle of the proposed task-wise sampling convolutions (TS-Conv). (a) The OBB representation of GGHL [huang2022general]. (b) The CNN structure of the proposed TS-Conv. (c) The sampling positions of the convolution for sampling localization features (LS-Conv). (d) The sampling positions of the convolution for sampling classification features (CS-Conv).
  • Figure 4: The designed dynamic circular kernel (DCK). (a) Adaptive fusion of circular and square convolutional kernels. (b) Adaptive fusion of eight-rotation features.
  • Figure 5: The label assignment strategy for AOOD: (a) GGHL [huang2022general] and (b) DTLA. The designed DTLA consists of (b-1) the dynamic positive candidate position assignment based on task-consistent-aware scores and (b-2) the soft-weighted negative candidate position assignment.
  • ...and 15 more figures
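
The idea in Figure 3(c), that the sampling positions of a convolution follow the predicted OBB, can be illustrated with a small sketch. It assumes a center/size/angle box parameterization `(cx, cy, w, h, theta)` and a uniform `k x k` grid; both are assumptions made here for illustration and are not necessarily the OBB representation or sampling layout used by GGHL or TS-Conv. The function name is hypothetical.

```python
import numpy as np

def obb_sampling_positions(cx, cy, w, h, theta, k=3):
    """Return k*k sampling points spread over an oriented box.

    Assumed parameterization (not taken from the paper): box center
    (cx, cy), size (w, h), rotation theta in radians.
    Output shape: (k*k, 2), each row an (x, y) position.
    """
    lin = np.linspace(-0.5, 0.5, k)
    # Regular grid in the box's own (unrotated) frame.
    gx, gy = np.meshgrid(lin * w, lin * h)
    local = np.stack([gx.ravel(), gy.ravel()], axis=1)
    # Rotate the grid by theta, then shift it to the box center.
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    return local @ rot.T + np.array([cx, cy])

axis_aligned = obb_sampling_positions(10.0, 20.0, 4.0, 2.0, theta=0.0)
rotated = obb_sampling_positions(10.0, 20.0, 4.0, 2.0, theta=np.pi / 2)
```

The center sample stays at the box center for any angle, while the outer samples move with the box orientation, so the kernel reads features from the object rather than from a fixed axis-aligned neighborhood.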