Class-specific Data Augmentation for Plant Stress Classification
Nasla Saleem, Aditya Balu, Talukder Zaki Jubery, Arti Singh, Asheesh K. Singh, Soumik Sarkar, Baskar Ganapathysubramanian
TL;DR
This work tackles soybean leaf stress classification on a dataset with confounding classes by introducing class-specific data augmentation guided by a genetic algorithm (GA). By optimizing per-class augmentation policies and fine-tuning only the linear layer of a pretrained CNN, the method achieves a mean-per-class accuracy (MPCA) of $97.61\%$, up from a baseline of $95.09\%$, with substantial gains for difficult classes (e.g., bacterial blight and bacterial pustule). The GA searches a $9 \times 15$ policy space to maximize MPCA, scoring each candidate with a short 5-epoch fine-tuning run, which keeps the search for effective augmentation strategies efficient. The results show improved per-class performance, fewer misclassifications, and insights into augmentation preferences across biotic, abiotic, and healthy classes, suggesting practical value for disease management and crop yield optimization. The approach offers a scalable, computationally efficient route to class-aware data augmentation that may apply beyond soybean stresses to broader agricultural phenotyping tasks.
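To make the search loop concrete, the following is a minimal sketch of a genetic algorithm over per-class augmentation policies. It is not the authors' implementation. It assumes the $9 \times 15$ space is a binary matrix with one row per class and one column per candidate augmentation, and it uses tournament selection, row-wise uniform crossover, bit-flip mutation, and single-elite carry-over, none of which are specified in the summary above. The function `toy_fitness` is a hypothetical stand-in: in the described setting, the fitness of a policy would be the validation MPCA obtained after fine-tuning the linear layer for 5 epochs with that policy.

```python
import random

# Assumed reading of the 9x15 policy space: 9 classes x 15 candidate augmentations.
N_CLASSES, N_AUGS = 9, 15


def random_policy(rng):
    """Binary matrix: policy[c][a] == 1 means augmentation a is applied to class c."""
    return [[rng.randint(0, 1) for _ in range(N_AUGS)] for _ in range(N_CLASSES)]


def crossover(a, b, rng):
    """Uniform crossover at the class (row) level; returns a new policy."""
    return [list(ra if rng.random() < 0.5 else rb) for ra, rb in zip(a, b)]


def mutate(policy, rng, rate=0.02):
    """Flip each bit independently with probability `rate`; returns a new policy."""
    return [[1 - g if rng.random() < rate else g for g in row] for row in policy]


def tournament(pop, scores, rng, k=3):
    """Return the best of k randomly drawn individuals."""
    picks = rng.sample(range(len(pop)), k)
    return pop[max(picks, key=lambda i: scores[i])]


def evolve(fitness, pop_size=20, generations=10, seed=0):
    """Run the GA and return the best scored policy and its fitness."""
    rng = random.Random(seed)
    pop = [random_policy(rng) for _ in range(pop_size)]
    best, best_score = None, float("-inf")
    for _ in range(generations):
        scores = [fitness(p) for p in pop]
        for p, s in zip(pop, scores):
            if s > best_score:
                best, best_score = p, s
        # Carry the current best individual over unchanged (elitism).
        children = [pop[max(range(pop_size), key=lambda i: scores[i])]]
        while len(children) < pop_size:
            parent_a = tournament(pop, scores, rng)
            parent_b = tournament(pop, scores, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        pop = children
    return best, best_score


def toy_fitness(policy):
    """Hypothetical placeholder for 'fine-tune 5 epochs, return validation MPCA'.

    It only measures agreement with an arbitrary target pattern so that the
    sketch runs without a model or dataset.
    """
    target = [[(c + a) % 2 for a in range(N_AUGS)] for c in range(N_CLASSES)]
    matches = sum(
        g == t for row, trow in zip(policy, target) for g, t in zip(row, trow)
    )
    return matches / (N_CLASSES * N_AUGS)
```

Calling `evolve(toy_fitness)` returns a 9-row, 15-column policy and its score. Because each fitness evaluation in the real setting is a short fine-tuning run rather than full training, the cost of the search scales with population size times generations times the cost of one 5-epoch linear-layer update.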
Abstract
Data augmentation is a powerful tool for improving deep learning-based image classifiers for plant stress identification and classification. However, selecting an effective set of augmentations from a large pool of candidates remains a key challenge, particularly in imbalanced and confounding datasets. We propose an approach for automated class-specific data augmentation using a genetic algorithm. We demonstrate the utility of our approach on soybean [Glycine max (L.) Merr.] stress classification where symptoms are observed on leaves, a particularly challenging problem due to confounding classes in the dataset. Our approach performs strongly, achieving a mean-per-class accuracy of 97.61% and an overall accuracy of 98% on the soybean leaf stress dataset. It significantly improves accuracy on the two most challenging classes, from 83.01% to 88.89% and from 85.71% to 94.05%. A key observation of this study is that high-performing augmentation strategies can be identified in a computationally efficient manner. We fine-tune only the linear layer of the baseline model with different augmentations, which avoids the computational burden of training a classifier from scratch for each augmentation policy while still achieving high accuracy. This research advances automated data augmentation strategies for plant stress classification, particularly for confounding datasets. Our findings contribute to the growing body of research on tailored augmentation techniques and their potential impact on disease management strategies, crop yields, and global food security. The proposed approach has the potential to enhance the accuracy and efficiency of deep learning-based tools for managing plant stresses in agriculture.
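Because the abstract reports both a mean-per-class accuracy and an overall accuracy, a short sketch of how the two metrics differ may be helpful. The function names and the toy labels below are illustrative assumptions, not taken from the paper; mean-per-class accuracy is computed here as the unweighted average of each class's own accuracy, which is why it is more sensitive to minority classes than overall accuracy.

```python
from collections import defaultdict


def overall_accuracy(y_true, y_pred):
    """Fraction of all samples that are classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_per_class_accuracy(y_true, y_pred):
    """Unweighted mean of per-class accuracies, plus the per-class values."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    per_class = {c: correct[c] / total[c] for c in total}
    return sum(per_class.values()) / len(per_class), per_class


# Toy, imbalanced example: class 0 has four samples, class 1 has two.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 1, 0]
mpca, per_class = mean_per_class_accuracy(y_true, y_pred)
acc = overall_accuracy(y_true, y_pred)
```

In this toy case the majority class is classified perfectly and the minority class half correctly, so the overall accuracy is 5/6 while the mean-per-class accuracy is 0.75. A gap of this kind is why a per-class metric is a natural optimization target on imbalanced datasets.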
