Iterative Optimization Annotation Pipeline and ALSS-YOLO-Seg for Efficient Banana Plantation Segmentation in UAV Imagery

Ang He, Ximei Wu, Xing Xu, Jing Chen, Xiaobin Guo, Sheng Xu

TL;DR

The paper tackles accurate banana plantation segmentation in UAV imagery, where large annotated datasets are costly to obtain. It introduces an iterative optimization annotation pipeline that uses the zero-shot capabilities of SAM2 to generate high-quality segmentation masks from minimal manual prompts, substantially reducing labeling effort. It also presents ALSS-YOLO-Seg, a lightweight segmentation model that incorporates Adaptive Lightweight Channel Splitting and Shuffling (ALSS) and Multi-Scale Channel Attention (MSCA) modules to achieve strong segmentation performance under resource constraints. Experimental results on ADE20K, Javeri, and a custom Banana Plantation dataset show robust zero-shot generalization, reduced annotation effort, and near state-of-the-art accuracy with only about 1.83M parameters, which supports its practical viability for UAV-based agricultural monitoring.

Abstract

Precise segmentation of Unmanned Aerial Vehicle (UAV)-captured images plays a vital role in tasks such as crop yield estimation and plant health assessment in banana plantations. By identifying and classifying planted areas, crop area can be calculated, which is indispensable for accurate yield predictions. However, segmenting banana plantation scenes requires a substantial amount of annotated data, and manual labeling of these images is both time-consuming and labor-intensive, limiting the development of large-scale datasets. Furthermore, varying target sizes, complex ground backgrounds, limited computational resources, and the need to correctly identify crop categories make segmentation even more difficult. To address these issues, we propose a comprehensive solution. First, we design an iterative optimization annotation pipeline that leverages SAM2's zero-shot capabilities to generate high-quality segmentation annotations, significantly reducing the cost and time of data annotation. Second, we develop ALSS-YOLO-Seg, an efficient lightweight segmentation model optimized for UAV imagery. The model's backbone includes an Adaptive Lightweight Channel Splitting and Shuffling (ALSS) module that improves information exchange between channels and optimizes feature extraction, aiding accurate crop identification. Additionally, a Multi-Scale Channel Attention (MSCA) module combines multi-scale feature extraction with channel attention to handle varying target sizes and complex ground backgrounds.
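To make the idea of channel splitting and shuffling concrete, the sketch below shows the generic ShuffleNet-style operation on a plain list of channel indices. It is not the authors' ALSS module, whose internals are not given in this summary. The function names, the `ratio` parameter, and the `groups` value are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_channels(channels: Sequence[T], ratio: float = 0.5) -> Tuple[List[T], List[T]]:
    """Split a channel list into two branches.

    `ratio` is a hypothetical knob for the share of channels sent to the
    first branch; it is an assumption, not a value taken from the paper.
    """
    if len(channels) < 2:
        raise ValueError("need at least two channels to split")
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must be strictly between 0 and 1")
    # Clamp the cut so that both branches keep at least one channel.
    cut = max(1, min(len(channels) - 1, round(len(channels) * ratio)))
    return list(channels[:cut]), list(channels[cut:])


def shuffle_channels(channels: Sequence[T], groups: int) -> List[T]:
    """Generic channel shuffle: view the channels as a (groups, n // groups)
    grid, transpose it, and flatten, so neighbouring outputs come from
    different groups."""
    n = len(channels)
    if groups <= 0 or n % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    return [channels[g * per_group + i] for i in range(per_group) for g in range(groups)]


# Usage: split eight channels into two branches, then interleave them.
left, right = split_channels(list(range(8)))
mixed = shuffle_channels(left + right, groups=2)
print(mixed)  # [0, 4, 1, 5, 2, 6, 3, 7]
```

After the shuffle, each adjacent pair of channels contains one channel from each branch, which is the sense in which shuffling lets information cross between channel groups in later layers.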

Paper Structure

This paper contains 17 sections, 26 equations, 15 figures, 9 tables.

Figures (15)

  • Figure 1: Banana plantation images captured with the DJI Phantom 3M, illustrating various scenarios: (A) a scene with tree and house occlusion, (B) a complex scene with abundant weeds, and (C, D) variations in target size due to different UAV flight altitudes, where (C) corresponds to a flight altitude of 12 meters and (D) to a flight altitude of 5 meters.
  • Figure 2: Comparison of segmentation results using the SAM2-b model with different prompting strategies: (A) Segmentation results with bounding box prompts; (B) Segmentation results with positive point prompts; (C) Segmentation results using the segment-anything mode without any prompts. The results demonstrate that the SAM2-b model achieves more accurate segmentation when guided by prompts, while the segment-anything mode alone fails to produce satisfactory segmentation results.
  • Figure 3: The proposed iterative optimization annotation pipeline for segmentation mask generation.
  • Figure 4: The architecture of the ALSS-YOLO-Seg segmenter. CBS denotes Convolution, Batch Normalization, and SiLU activation function. The symbol “k” represents the kernel size, “s” denotes the stride, and “p” indicates the padding.
  • Figure 5: MSCA module structure diagram.
  • ...and 10 more figures
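
The “k”, “s”, and “p” labels in the Figure 4 caption determine how each convolution changes the spatial size of a feature map. As a minimal sketch, the function below applies the standard convolution output-size formula, floor((n + 2p - k) / s) + 1, assuming a dilation of 1. The function name and the 640-pixel input are hypothetical and not taken from the paper.

```python
def conv_output_size(size: int, k: int, s: int = 1, p: int = 0) -> int:
    """Spatial output size of a convolution along one dimension.

    size: input height or width in pixels
    k:    kernel size
    s:    stride
    p:    padding added to each side
    Assumes dilation 1, which is an assumption of this sketch.
    """
    if k <= 0 or s <= 0 or p < 0:
        raise ValueError("k and s must be positive, p must be non-negative")
    if size + 2 * p < k:
        raise ValueError("kernel is larger than the padded input")
    return (size + 2 * p - k) // s + 1


# Hypothetical 640-pixel input, not a value from the paper.
print(conv_output_size(640, k=3, s=2, p=1))  # 320: stride 2 halves the size
print(conv_output_size(640, k=3, s=1, p=1))  # 640: size is preserved
```

With k=3 and p=1, a stride of 1 keeps the feature map size unchanged while a stride of 2 halves it, which is the usual way a block of this kind is used for downsampling.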