Efficient Leaf Disease Classification and Segmentation using Midpoint Normalization Technique and Attention Mechanism
Enam Ahmed Taufik, Antara Firoz Parsa, Seraj Al Mahmud Mostafa
TL;DR
The paper addresses reliable leaf disease detection and localization under limited labeled data and computational constraints. It introduces a two-stage pipeline that combines Mid Point Normalization (MPN) preprocessing, which maps pixel values into the range $[-1,1]$ via $\mathrm{MPN} = \tanh(\mathrm{image}_{\mathrm{resized}} / 127.5 - 1.0)$, with attention-based channel recalibration using Squeeze-and-Excitation (SE) blocks. The authors present SE-ConvNet for classification and integrate SE blocks into U-Net to improve segmentation, reporting 93% classification accuracy and, for segmentation, an IoU of 58.54% and a Dice score of 72.44% on a Betel leaf dataset. The results indicate that tailored preprocessing plus lightweight, attention-enhanced architectures can surpass baseline methods while remaining computationally efficient, which supports deployment in resource-constrained agricultural settings and adaptation to other domains.
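The two stages summarized above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the 8x8 input, the 4-channel stand-in feature map, the reduction ratio of 2, and the random weights are all assumptions made for the example. The SE step follows the standard squeeze (global average pool), excitation (two dense layers with ReLU and sigmoid), and channel-wise rescaling pattern. Note that for 8-bit inputs in $[0, 255]$ the argument of $\tanh$ lies in $[-1, 1]$, so the MPN output falls within roughly $\pm 0.76$, a sub-interval of $[-1,1]$.

```python
import numpy as np

def midpoint_normalize(image_resized: np.ndarray) -> np.ndarray:
    """MPN = tanh(image_resized / 127.5 - 1.0), assuming 8-bit pixel values."""
    x = image_resized.astype(np.float32)
    return np.tanh(x / 127.5 - 1.0)

def se_recalibrate(features, w1, b1, w2, b2):
    """Squeeze-and-Excitation on one (H, W, C) feature map.

    w1 has shape (C, C // r) and w2 has shape (C // r, C); in a real
    network these would be learned, here they are hypothetical inputs.
    """
    squeezed = features.mean(axis=(0, 1))               # squeeze: (C,)
    hidden = np.maximum(squeezed @ w1 + b1, 0.0)        # excitation, ReLU
    gates = 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))   # sigmoid gates in (0, 1)
    return features * gates, gates                      # rescale each channel

rng = np.random.default_rng(0)

# Stage 1: normalize a random stand-in for a resized RGB leaf image.
image = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
mpn = midpoint_normalize(image)

# Stage 2: recalibrate a stand-in feature map with C = 4 channels, r = 2.
channels, reduction = 4, 2
features = rng.normal(size=(8, 8, channels)).astype(np.float32)
w1 = rng.normal(size=(channels, channels // reduction))
b1 = np.zeros(channels // reduction)
w2 = rng.normal(size=(channels // reduction, channels))
b2 = np.zeros(channels)
recalibrated, gates = se_recalibrate(features, w1, b1, w2, b2)
```

In a full model the SE gates would be computed inside the network after a convolutional stage, so that informative channels are amplified and less useful ones are suppressed before the next layer.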
Abstract
Improving plant disease detection from leaf imagery remains a persistent challenge due to scarce labeled data and complex contextual factors. We introduce a two-stage methodology: Mid Point Normalization (MPN) for image preprocessing, coupled with attention mechanisms that dynamically recalibrate feature representations. Our classification pipeline, which combines MPN with Squeeze-and-Excitation (SE) blocks, achieves 93% accuracy while maintaining balanced class-wise performance. The perfect F1 score attained for our target class illustrates the value of attention for adaptive feature refinement. For segmentation, we integrate the same attention blocks into a U-Net architecture with MPN-enhanced inputs, obtaining a 72.44% Dice score and a 58.54% IoU and substantially outperforming baseline implementations. Beyond accuracy, our approach yields computationally efficient, lightweight architectures suited to real-world computer vision applications.
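For readers unfamiliar with the two segmentation metrics reported here, the following is a minimal sketch of how IoU and Dice are commonly computed for a pair of binary masks. The function name, the `eps` smoothing term, and the toy masks are assumptions made for illustration; this excerpt does not state how the authors aggregate scores across images.

```python
import numpy as np

def iou_and_dice(pred_mask, true_mask, eps=1e-7):
    """Return (IoU, Dice) for two binary masks of the same shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    intersection = np.logical_and(pred, true).sum()
    union = np.logical_or(pred, true).sum()
    total = pred.sum() + true.sum()
    # eps avoids division by zero when both masks are empty.
    iou = (intersection + eps) / (union + eps)
    dice = (2 * intersection + eps) / (total + eps)
    return float(iou), float(dice)

# Toy example: two 2x2 squares that overlap in one column.
pred = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])
true = np.array([[0, 1, 1, 0],
                 [0, 1, 1, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])
iou, dice = iou_and_dice(pred, true)  # intersection 2, union 6, total 8
```

For a single mask pair the two metrics are tied by $\mathrm{Dice} = 2\,\mathrm{IoU} / (1 + \mathrm{IoU})$; that identity need not hold once scores are averaged over many images, which is one reason both are usually reported.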
