NWPU-MOC: A Benchmark for Fine-grained Multi-category Object Counting in Aerial Images

Junyu Gao, Liangliang Zhao, Xuelong Li

TL;DR

This work tackles multi-category object counting in aerial imagery by introducing NWPU-MOC, a large-scale dataset with 14 annotated categories across 3,416 RGB-NIR scenes. It proposes MCC, a density-map based framework that fuses RGB and NIR via a dual-attention fusion layer and regresses a multi-channel density map, with a novel spatial contrast loss to model inter-channel category relationships. The approach demonstrates state-of-the-art performance on MOC tasks, improves counting in dense and occluded scenes, and provides new evaluation metrics to address long-tailed category distributions. The dataset, code, and models are publicly available, enabling further research into fine-grained, multi-spectral counting in aerial imagery.

Abstract

Object counting is a hot topic in computer vision, which aims to estimate the number of objects in a given image. However, most methods count objects of only a single category per image, so they cannot be applied to scenes that require counting objects of multiple categories simultaneously, especially aerial scenes. To this end, this paper introduces a Multi-category Object Counting (MOC) task to estimate the numbers of different objects (cars, buildings, ships, etc.) in an aerial image. Considering the absence of a dataset for this task, a large-scale dataset (NWPU-MOC) is collected, consisting of 3,416 scenes with a resolution of 1024 $\times$ 1024 pixels, annotated with 14 fine-grained object categories. In addition, each scene contains RGB and Near Infrared (NIR) images, of which the NIR spectrum can provide richer characterization information than the RGB spectrum alone. Based on NWPU-MOC, the paper presents a multi-spectrum, multi-category object counting framework, which employs a dual-attention module to fuse the RGB and NIR features and subsequently regresses multi-channel density maps, one channel per object category. Furthermore, to model the dependency between the density-map channels of different object categories, a spatial contrast loss is designed as a penalty for overlapping predictions at the same spatial position. Experimental results demonstrate that the proposed method achieves state-of-the-art performance compared with some mainstream counting algorithms. The dataset, code and models are publicly available at https://github.com/lyongo/NWPU-MOC.
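
The multi-channel density-map idea can be illustrated with a small sketch. It assumes one density channel per category, a per-category count equal to the sum of that channel, and a simple pairwise-product overlap penalty. The function names `channel_counts` and `overlap_penalty`, the toy 2 x 2 maps, and the penalty form itself are assumptions made for illustration; the abstract does not give the exact formula of the spatial contrast loss.

```python
from itertools import combinations

# density[c][y][x]: predicted density of category c at pixel (y, x).
# All names and values here are hypothetical, for illustration only.

def channel_counts(density):
    """Estimated count per category: the sum of each channel's density map."""
    return [sum(sum(row) for row in channel) for channel in density]

def overlap_penalty(density):
    """Illustrative penalty (an assumption, not the paper's formula):
    the sum, over all category pairs, of the elementwise product of
    their maps. It is zero when no two categories are active at the
    same pixel and grows as predictions overlap."""
    total = 0.0
    for map_a, map_b in combinations(density, 2):
        for row_a, row_b in zip(map_a, map_b):
            for value_a, value_b in zip(row_a, row_b):
                total += value_a * value_b
    return total

# Two categories whose predictions occupy different pixels.
disjoint = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]

# Two categories that both respond at pixel (0, 0).
overlapping = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[2.0, 0.0], [0.0, 0.0]],
]

print(channel_counts(disjoint))      # per-category counts
print(overlap_penalty(disjoint))     # no shared pixels
print(overlap_penalty(overlapping))  # shared pixel is penalized
```

In this toy setting the penalty distinguishes the two cases even though the per-category counts are identical, which is the kind of inter-channel dependency a loss over overlapping predictions is meant to capture.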

Paper Structure (33 sections, 27 equations, 13 figures, 6 tables)

Figures (13)

  • Figure 1: Sample images from some object counting datasets. (a) is a sample from a crowd-counting dataset (NWPU-Crowd) [nwpucrowd]. (b) is a sample from a crowd-counting dataset in the UAV view (DroneCrowd) [wen2021detection]. (c)-(f) are samples from a remote-sensing object counting dataset (RSOC) [gao2020counting]. (g) is a sample from the NWPU-MOC dataset constructed by this paper. Different from other counting datasets, NWPU-MOC provides annotations for multiple object categories within a single image.
  • Figure 2: Tree diagram about the full 14 categories (MOC-14) and the grouping of MOC-6 in the NWPU-MOC dataset (Farmlands and Pools are considered negative samples in MOC-6).
  • Figure 3: Examples of annotations in the NWPU-MOC dataset. Different categories of objects are labeled with center points of different colors (the colors of the object categories follow Fig. \ref{fig:1}). As shown in the bottom-left corner of the figure, trees that are difficult for humans to recognize were blurred during annotation, so their counts are ignored by the counting algorithm.
  • Figure 4: The first column shows the RGB images, the second column shows the pseudo-color images of the NIR band, and the third column shows the images after annotation.
  • Figure 5: Histogram of the distribution of the labeled points for each category of objects in the NWPU-MOC dataset.
  • ...and 8 more figures