DAS-SK: An Adaptive Model Integrating Dual Atrous Separable and Selective Kernel CNN for Agriculture Semantic Segmentation

Mei Ling Chee, Thangarajah Akilan, Aparna Ravindra Phalke, Kanchan Keisham

TL;DR

DAS-SK tackles accurate semantic segmentation of high-resolution agricultural imagery under real-time, edge-deployable constraints. It introduces a dual-backbone DeepLabV3-based architecture that fuses MobileNetV3-Large as the primary encoder with an auxiliary EfficientNet-B3, augmented by the DAS-SKConv module, which combines dual atrous separable and standard atrous convolutions with selective-kernel attention. An enhanced ASPP module, incorporating six DAS-SKConv branches with diverse dilations and a strip-pooling path, enriches multi-scale context before a hierarchical decoder reconstructs high-resolution segmentations. Across LandCover.ai, VDD, and PhenoBench, DAS-SK delivers state-of-the-art efficiency and competitive accuracy, achieving high mIoU with substantially fewer parameters and GFLOPs than transformer-based models, which makes deployment on UAVs and edge devices practical. The work demonstrates strong generalization to diverse agricultural and remote-sensing scenarios and points to future directions in self-supervised and domain-adaptive segmentation under limited labeling.
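The strip-pooling path mentioned above can be illustrated with a minimal sketch. It assumes the commonly used form of strip pooling, in which a feature map is averaged along full rows and full columns and the two strips are broadcast back and summed; the learned 1D convolutions and gating normally applied around these strips are omitted, and this summary does not state whether DAS-SK follows exactly this form. The function name `strip_pool` is hypothetical.

```python
def strip_pool(feature):
    """Simplified strip pooling on one channel.

    feature: H x W list of lists of floats.
    Returns an H x W map where each cell holds the mean of its row
    plus the mean of its column (long-range context along both axes).
    Assumption: learned 1D convs and sigmoid gating are left out.
    """
    height, width = len(feature), len(feature[0])
    # Horizontal strips: one average per row (H x 1).
    row_means = [sum(row) / width for row in feature]
    # Vertical strips: one average per column (1 x W).
    col_means = [
        sum(feature[i][j] for i in range(height)) / height
        for j in range(width)
    ]
    # Broadcast both strips back to H x W and sum them.
    return [
        [row_means[i] + col_means[j] for j in range(width)]
        for i in range(height)
    ]


context = strip_pool([[1.0, 2.0], [3.0, 4.0]])
```

Because each output cell aggregates an entire row and column, a branch like this can pick up elongated structures such as crop rows or field boundaries that square pooling windows tend to dilute.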

Abstract

Semantic segmentation in high-resolution agricultural imagery demands models that balance accuracy and computational efficiency so they can be deployed in practical systems. In this work, we propose DAS-SK, a novel lightweight architecture that retrofits selective kernel convolution (SK-Conv) into the dual atrous separable convolution (DAS-Conv) module to strengthen multi-scale feature learning. The model further enhances the atrous spatial pyramid pooling (ASPP) module, enabling the capture of fine-grained local structures alongside global contextual information. Built upon a modified DeepLabV3 framework with two complementary backbones, MobileNetV3-Large and EfficientNet-B3, the DAS-SK model mitigates limitations associated with large dataset requirements, limited spectral generalization, and the high computational cost that typically restricts deployment on UAVs and other edge devices. Comprehensive experiments across three benchmarks (LandCover.ai, VDD, and PhenoBench) demonstrate that DAS-SK consistently achieves state-of-the-art performance while being more efficient than CNN-, transformer-, and hybrid-based competitors. Notably, DAS-SK requires up to 21x fewer parameters and 19x fewer GFLOPs than top-performing transformer models. These findings establish DAS-SK as a robust, efficient, and scalable solution for real-time agricultural robotics and high-resolution remote sensing, with strong potential for broader deployment in other vision domains.
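To make the efficiency argument concrete, the sketch below works through the standard arithmetic behind atrous separable convolutions: dilation widens the area a kernel covers without adding weights, and splitting a convolution into depthwise and pointwise steps cuts the weight count. The channel width of 256 and the dilation rate of 6 are illustrative assumptions, not values taken from the paper, and the function names are hypothetical.

```python
def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Side length of the area covered by a dilated (atrous) kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def standard_conv_params(c_in: int, c_out: int, kernel_size: int) -> int:
    """Weights in a standard convolution (bias ignored)."""
    return c_in * c_out * kernel_size * kernel_size


def separable_conv_params(c_in: int, c_out: int, kernel_size: int) -> int:
    """Weights in a depthwise conv followed by a 1x1 pointwise conv (bias ignored)."""
    depthwise = c_in * kernel_size * kernel_size
    pointwise = c_in * c_out
    return depthwise + pointwise


# Illustrative numbers only: 256 channels in and out, 3x3 kernel, dilation 6.
covered = effective_kernel(3, 6)                 # 13 x 13 area from 3 x 3 weights
standard = standard_conv_params(256, 256, 3)     # 589824 weights
separable = separable_conv_params(256, 256, 3)   # 67840 weights
```

Under these assumed sizes, the separable variant uses roughly 8.7 times fewer weights than the standard one while covering the same dilated area, which is the general mechanism lightweight ASPP variants rely on.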

Paper Structure (22 sections, 37 equations, 13 figures, 12 tables)

Figures (13)

  • Figure 1: The overall architecture. The backbones extract multi-scale features from the input, which are then fused and refined via the enhanced ASPP with DAS-SKConv (cf. Fig. 2). The decoder then progressively upsamples and refines the fused features with skip connections to produce accurate segmentation.
  • Figure 2: The proposed DAS-SKConv module. The DAS block combines atrous separable and standard atrous convolutions to capture both fine and broad spatial features. The SK attention mechanism adaptively weights multi-branch features through channel-wise attention, producing context-aware representations.
  • Figure 3: The enhanced ASPP module. High-dimensional backbone features are processed via parallel branches, including a $1\times1$ Conv, six DAS-SKConv with varying dilation rates, and a strip pooling branch.
  • Figure 4: Training progress of the proposed model on the LandCover.ai, VDD, and PhenoBench benchmark datasets, using the configurations given in the paper's implementation table.
  • Figure 5: Four prediction samples on LandCover.ai's test set.
  • ...and 8 more figures
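
The selective-kernel attention described in the Figure 2 caption can be sketched as follows. This is a simplified illustration of the general SK idea (sum the branches, pool to a per-channel descriptor, softmax across branches, reweight), not the paper's implementation: the per-branch scalar `gate_weights` stand in for the learned fully connected layers and are hypothetical, as is the function name `sk_fuse`.

```python
import math


def sk_fuse(branches, gate_weights):
    """Fuse multi-branch features with channel-wise softmax attention.

    branches: list of B feature maps, each [C][N] (channels x flattened pixels).
    gate_weights: [B][C] scalars, a hypothetical stand-in for learned FC layers.
    Returns (fused [C][N], attention [B][C]).
    """
    num_branches = len(branches)
    num_channels = len(branches[0])
    num_pixels = len(branches[0][0])

    # 1) Fuse: sum all branches, then global-average-pool each channel.
    descriptor = []
    for c in range(num_channels):
        total = sum(
            branches[b][c][n]
            for b in range(num_branches)
            for n in range(num_pixels)
        )
        descriptor.append(total / num_pixels)

    # 2) Select: softmax across branches, computed separately per channel.
    attention = [[0.0] * num_channels for _ in range(num_branches)]
    for c in range(num_channels):
        logits = [gate_weights[b][c] * descriptor[c] for b in range(num_branches)]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(value - peak) for value in logits]
        denom = sum(exps)
        for b in range(num_branches):
            attention[b][c] = exps[b] / denom

    # 3) Reweight: attention-weighted sum of the branch features.
    fused = [
        [
            sum(attention[b][c] * branches[b][c][n] for b in range(num_branches))
            for n in range(num_pixels)
        ]
        for c in range(num_channels)
    ]
    return fused, attention


# Two branches, one channel, two pixels; zero gates give uniform attention.
fine = [[1.0, 3.0]]
broad = [[5.0, 7.0]]
fused, attention = sk_fuse([fine, broad], [[0.0], [0.0]])
```

With zero gates the two branches are simply averaged; non-zero gates shift each channel toward the branch whose receptive field suits the content, which is the adaptive behavior the caption refers to.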