Enhancing Fine-grained Object Detection in Aerial Images via Orthogonal Mapping

Haoran Zhu, Yifan Zhou, Chang Xu, Ruixiang Zhang, Wen Yang

TL;DR

The paper addresses fine-grained object detection (FGOD) in aerial images, where fine-grained subcategories are easily confused. It proposes Orthogonal Mapping (OM), which replaces the conventional last-layer mapping of the classification branch with a fixed orthogonal class-prototype basis: the pooled kernel is orthogonalized with the Gram-Schmidt process to obtain $\tilde{K} \in \mathbb{R}^{C \times N}$, and class scores are computed as the cosine similarity $Y = {\langle \tilde{K}, X\rangle}^T$ between inputs and class prototypes. OM plugs into both the one-stage FCOS and the two-stage PETDet detectors, requires no additional learnable parameters, and yields consistent gains on FAIR1M-v1.0, MAR20, and ShipRSImageNet (e.g., +4.08% mAP on ShipRSImageNet for FCOS). Visual analyses show reduced inter-class confusion and orthogonal separation in feature space, supporting the discriminative effect of the approach. The method offers a practical, low-overhead path to improved FGOD in aerial imagery and can be adopted in existing detection pipelines.
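The mechanism summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' released implementation: the function names `gram_schmidt_rows` and `orthogonal_mapping` are hypothetical, the kernel is drawn at random here because the summary does not specify how the pooled kernel is produced, and no logit scaling is applied. It only shows the two steps named in the text: Gram-Schmidt orthogonalization of a $C \times N$ kernel into fixed class prototypes, followed by cosine similarity between input features and those prototypes.

```python
import numpy as np


def gram_schmidt_rows(kernel: np.ndarray) -> np.ndarray:
    """Orthonormalize the rows of a (C, N) matrix; assumes C <= N and full row rank."""
    basis = []
    for row in kernel.astype(np.float64):
        # Remove the components that lie along the already accepted basis vectors.
        for b in basis:
            row = row - np.dot(row, b) * b
        norm = np.linalg.norm(row)
        if norm < 1e-12:
            raise ValueError("kernel rows are linearly dependent")
        basis.append(row / norm)
    return np.stack(basis)


def orthogonal_mapping(features: np.ndarray, prototypes: np.ndarray) -> np.ndarray:
    """Cosine similarity between (B, N) features and (C, N) unit-norm prototypes."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit_features = features / np.clip(norms, 1e-12, None)
    # Prototypes are already unit length, so the dot product is the cosine.
    return unit_features @ prototypes.T


# Assumption: a random kernel stands in for the pooled kernel of the paper.
rng = np.random.default_rng(0)
num_classes, feat_dim = 5, 16
kernel = rng.standard_normal((num_classes, feat_dim))

prototypes = gram_schmidt_rows(kernel)       # fixed, not learned
features = rng.standard_normal((4, feat_dim))
scores = orthogonal_mapping(features, prototypes)  # shape (4, 5)
```

Because the prototypes are mutually orthogonal, a feature that aligns perfectly with one class prototype scores 1 for that class and 0 for every other class, which is the separation effect the summary attributes to OM.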

Abstract

Fine-Grained Object Detection (FGOD) is a critical task in high-resolution aerial image analysis. This letter introduces Orthogonal Mapping (OM), a simple yet effective method aimed at addressing the challenge of semantic confusion inherent in FGOD. OM introduces orthogonal constraints in the feature space by decoupling features from the last layer of the classification branch with a class-wise orthogonal vector basis. This effectively mitigates semantic confusion and enhances classification accuracy. Moreover, OM can be seamlessly integrated into mainstream object detectors. Extensive experiments conducted on three FGOD datasets (FAIR1M, ShipRSImageNet, and MAR20) demonstrate the effectiveness and superiority of the proposed approach. Notably, with just one line of code, OM achieves a 4.08% improvement in mean Average Precision (mAP) over FCOS on the ShipRSImageNet dataset. Codes are released at https://github.com/ZhuHaoranEIS/Orthogonal-FGOD.

Paper Structure

This paper contains 11 sections, 2 equations, 3 figures, 3 tables, and 1 algorithm.

Figures (3)

  • Figure 1: A toy example of the proposed Orthogonal Mapping (OM). Concretely, OM situates distinct fine-grained subcategories in an orthogonal space. As illustrated in the figure, Boeing 737, Boeing 747, and Boeing 787 are positioned in the $S_{xoz}$, $S_{yoz}$, and $S_{xoy}$ spaces, respectively.
  • Figure 2: Confusion matrices of detection results (%) obtained from FCOS w/o OM (top) and FCOS w/ OM (bottom). The horizontal and vertical axes represent the ground-truth labels and the predicted labels, respectively. (a) Airplane. (b) Ship. (c) Vehicle.
  • Figure 3: The three-dimensional distribution of three main easily confused classes in FAIR1M-v1.0 obtained from FCOS w/o OM (top) and FCOS w/ OM (bottom). (a) Airplane. (b) Ship. (c) Vehicle.