CountMamba: Exploring Multi-directional Selective State-Space Models for Plant Counting

Hulingxiao He, Yaqi Zhang, Jinglin Xu, Yuxin Peng

TL;DR

The paper addresses automatic plant counting in high-resolution overhead imagery, where plants can be distributed along arbitrary directions and processing long patch sequences is challenging. It introduces CountMamba, a counting framework built on a Multi-directional State-Space Group (MSSG) with four directional blocks (HSSB/VSSB/DSSB/ASSB) and a Global-Local Adaptive Fusion (GLAF) module that fuses global directional features with features from a local CNN branch, followed by a Counter and Normalizer that produce a counting map. The approach achieves competitive results on maize tassel, wheat ear, and sorghum head counting, with state-of-the-art MAE/RMSE on several benchmarks, indicating robustness to directional distribution and high-resolution imagery. CountMamba provides a direction-aware backbone for plant counting, with room for further improvement in scalability and fine-grained feature extraction.

Abstract

Plant counting is essential in every stage of agriculture, including seed breeding, germination, cultivation, fertilization, pollination, yield estimation, and harvesting. Inspired by the fact that humans count objects in high-resolution images by sequential scanning, we explore the potential of handling plant counting tasks via state space models (SSMs) for generating counting results. In this paper, we propose a new counting approach named CountMamba that constructs multiple counting experts to scan from various directions simultaneously. Specifically, we design a Multi-directional State-Space Group that processes the image patch sequences in multiple orders, aiming to simulate different counting experts. We also design Global-Local Adaptive Fusion to adaptively aggregate global features extracted from multiple directions and local features extracted from the CNN branch in a sample-wise manner. Extensive experiments demonstrate that the proposed CountMamba performs competitively on various plant counting tasks, including maize tassel, wheat ear, and sorghum head counting.
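
To make the idea of processing patch sequences "in multiple orders" concrete, the following is a minimal sketch of how four scan orders over a patch grid might be generated. The function name `scan_orders` and the exact traversal conventions (which corner each scan starts from, and the direction within each diagonal) are assumptions for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
def scan_orders(height, width):
    """Return four orderings of the patches in a height x width grid.

    Each ordering is a list of flat patch indices (row * width + col).
    The traversal conventions are illustrative assumptions, not taken
    from the paper.
    """
    cells = [(r, c) for r in range(height) for c in range(width)]

    def flat(cell):
        r, c = cell
        return r * width + c

    def ordered(key):
        return [flat(cell) for cell in sorted(cells, key=key)]

    return {
        # Row by row, left to right.
        "horizontal": ordered(lambda rc: (rc[0], rc[1])),
        # Column by column, top to bottom.
        "vertical": ordered(lambda rc: (rc[1], rc[0])),
        # Along lines of constant (col - row), i.e. top-left to bottom-right.
        "diagonal": ordered(lambda rc: (rc[1] - rc[0], rc[0])),
        # Along lines of constant (row + col), i.e. top-right to bottom-left.
        "anti_diagonal": ordered(lambda rc: (rc[0] + rc[1], rc[0])),
    }


if __name__ == "__main__":
    for name, order in scan_orders(2, 3).items():
        print(name, order)
```

Each ordering is a permutation of the same patch indices, so a sequence model run on any of them sees the whole image and only the neighbourhood relation along the sequence changes. That is one way to read the "different counting experts" described above.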

Paper Structure

This paper contains 19 sections, 14 equations, 3 figures, 6 tables.

Figures (3)

  • Figure 1: An overview of the proposed CountMamba. It contains a Multi-directional State-Space Group (MSSG) comprised of stacked Horizontal State-Space Blocks (HSSBs), Vertical State-Space Blocks (VSSBs), Diagonal State-Space Blocks (DSSBs), and Anti-diagonal State-Space Blocks (ASSBs) in parallel, followed by Global-Local Adaptive Fusion, Counter and Normalizer to achieve plant counting.
  • Figure 2: Illustration of the structure of HSSM, VSSM, DSSM, ASSM, and CNN branch.
  • Figure 3: Qualitative results on the MTC, WED, and SHC datasets. "Manual" denotes the ground-truth count and "Inferred" the predicted count. Red points are manual annotations.
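
The manual and inferred counts shown in Figure 3 are the quantities that MAE and RMSE compare. As a minimal sketch using the standard definitions of the two metrics (the helper name `count_errors` is hypothetical and the numbers in the usage example are made up, not results from the paper):

```python
import math


def count_errors(inferred, manual):
    """Return (MAE, RMSE) between predicted and ground-truth image counts."""
    if len(inferred) != len(manual) or not manual:
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [p - g for p, g in zip(inferred, manual)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse


if __name__ == "__main__":
    # Illustrative counts for two images (not data from the paper).
    mae, rmse = count_errors(inferred=[3, 5], manual=[1, 5])
    print(f"MAE={mae:.3f} RMSE={rmse:.3f}")
```

RMSE squares each per-image error before averaging, so it penalizes a few large miscounts more heavily than MAE does. This is why the two are usually reported together.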