Agriculture-Vision Challenge 2024 -- The Runner-Up Solution for Agricultural Pattern Recognition via Class Balancing and Model Ensemble

Wang Liu, Zhiyu Wang, Puhong Duan, Xudong Kang, Shutao Li

TL;DR

The paper tackles severe class imbalance in the Agriculture-Vision dataset to improve pixel-level semantic segmentation of RGB-NIR aerial imagery. It introduces a three-pronged approach: mosaic augmentation with rare-class sampling, an adaptive class-weighted loss (ACWLoss), and a probability post-process to boost rare-class predictions, complemented by test-time augmentation and ensemble fusion across multiple architectures. Empirical results show progressive gains from each component, with the final ensemble achieving $mIoU = 0.547$ on the test set and securing the runner-up position. This work demonstrates a practical strategy for balancing the influence of classes in agricultural remote sensing, offering a transferable framework for similarly imbalanced segmentation tasks.
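To make the rebalancing idea concrete, here is a minimal sketch of two of the ingredients named above. It is not the paper's ACWLoss or its actual post-process, whose details are not given in this summary. It assumes plain inverse-frequency class weights and a fixed multiplicative boost for rare classes; the function names, the class names, the pixel counts, and the `factor` value are all hypothetical.

```python
from typing import Dict, List


def inverse_frequency_weights(pixel_counts: Dict[str, int]) -> Dict[str, float]:
    """Weight each class by inverse pixel frequency, normalised to mean 1.

    Assumption: every count is positive. This is a generic reweighting
    scheme, not the paper's adaptive class weight formula.
    """
    total = sum(pixel_counts.values())
    raw = {name: total / count for name, count in pixel_counts.items()}
    mean_raw = sum(raw.values()) / len(raw)
    return {name: w / mean_raw for name, w in raw.items()}


def boost_rare_classes(probs: List[float], rare_ids: List[int], factor: float = 1.5) -> int:
    """Scale rare-class probabilities by `factor`, then return the argmax class id.

    Assumption: a constant boost factor; the paper's post-process may differ.
    """
    scaled = [p * factor if i in rare_ids else p for i, p in enumerate(probs)]
    return max(range(len(scaled)), key=scaled.__getitem__)


# Hypothetical pixel counts for three classes of decreasing frequency.
weights = inverse_frequency_weights({"common": 9000, "medium": 900, "rare": 100})

# One pixel's class probabilities; class 1 is treated as rare.
plain_label = boost_rare_classes([0.5, 0.4, 0.1], rare_ids=[1], factor=1.0)
boosted_label = boost_rare_classes([0.5, 0.4, 0.1], rare_ids=[1], factor=1.5)
```

With these inputs the rare class receives the largest loss weight, and the boost flips the pixel's label from class 0 to class 1, which is the intended effect of raising rare-class predictions before the final argmax.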

Abstract

The Agriculture-Vision Challenge at CVPR 2024 aims to leverage semantic segmentation models to produce pixel-level semantic segmentation labels within regions of interest for multi-modality satellite images. It is one of the best-known and most competitive challenges for researchers worldwide working to bridge computer vision and agriculture. However, the Agriculture-Vision dataset suffers from a serious class imbalance problem, which hinders semantic segmentation performance. To solve this problem, we first propose a mosaic data augmentation with a rare class sampling strategy to enrich long-tail class samples. Second, we employ an adaptive class weight scheme to suppress the contribution of the common classes while increasing that of the rare classes. Third, we propose a probability post-process to increase the predicted values of the rare classes. Our methodology achieved a mean Intersection over Union (mIoU) score of 0.547 on the test set, securing second place in this challenge.
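For readers unfamiliar with the metric, the following is a minimal sketch of the standard single-label mIoU over flattened label lists. The challenge's official metric may differ in detail (for example in how overlapping labels are handled), and the function name and toy labels here are illustrative assumptions.

```python
from typing import List, Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Average per-class IoU over classes present in pred or target."""
    ious: List[float] = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both, to avoid 0/0
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with four pixels and two classes.
score = mean_iou(pred=[0, 0, 1, 1], target=[0, 1, 1, 1], num_classes=2)
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12. Because every class counts equally regardless of its pixel share, poor performance on rare classes lowers the score as much as poor performance on common ones, which is why class balancing matters for this metric.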

Paper Structure (9 sections, 2 figures, 2 tables)

Figures (2)

  • Figure 1: The number of pixels in each class.
  • Figure 2: The data flow of our final solution.