Class-specific Data Augmentation for Plant Stress Classification

Nasla Saleem, Aditya Balu, Talukder Zaki Jubery, Arti Singh, Asheesh K. Singh, Soumik Sarkar, Baskar Ganapathysubramanian

TL;DR

This work tackles soybean leaf stress classification in a dataset with confounding classes by introducing class-specific data augmentation guided by a genetic algorithm (GA). By optimizing per-class augmentation policies and fine-tuning only the linear layer of a pretrained CNN, the method achieves a mean-per-class accuracy (MPCA) of $97.61\%$, up from a baseline of $95.09\%$, with substantial gains for difficult classes (e.g., bacterial blight and bacterial pustule). The GA searches a $9 \times 15$ policy space to maximize MPCA, evaluating each candidate with a short 5-epoch fine-tuning run, which keeps the search for effective augmentation strategies computationally cheap. The results show improved per-class performance, fewer misclassifications, and insights into augmentation preferences across biotic, abiotic, and healthy classes, suggesting practical impact for disease management and crop yield optimization. The approach offers a scalable, computationally efficient route to class-aware data augmentation that is applicable beyond soybean stresses to broader agricultural phenotyping tasks.

Abstract

Data augmentation is a powerful tool for improving deep learning-based image classifiers for plant stress identification and classification. However, selecting an effective set of augmentations from a large pool of candidates remains a key challenge, particularly in imbalanced and confounding datasets. We propose an approach for automated class-specific data augmentation using a genetic algorithm. We demonstrate the utility of our approach on soybean [Glycine max (L.) Merr.] stress classification where symptoms are observed on leaves, a particularly challenging problem due to confounding classes in the dataset. Our approach achieves a mean-per-class accuracy of 97.61% and an overall accuracy of 98% on the soybean leaf stress dataset. It significantly improves the accuracy of the two most challenging classes, from 83.01% to 88.89% and from 85.71% to 94.05%. A key observation we make in this study is that high-performing augmentation strategies can be identified in a computationally efficient manner. We fine-tune only the linear layer of the baseline model with different augmentations, which avoids the computational burden of training a classifier from scratch for each augmentation policy while still achieving strong performance. This research represents an advancement in automated data augmentation strategies for plant stress classification, particularly in the context of confounding datasets. Our findings contribute to the growing body of research on tailored augmentation techniques and their potential impact on disease management strategies, crop yields, and global food security. The proposed approach has the potential to enhance the accuracy and efficiency of deep learning-based tools for managing plant stresses in agriculture.
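The abstract reports both a mean-per-class accuracy and an overall accuracy, and the two can diverge on an imbalanced dataset. The following is a minimal sketch of how the two metrics are commonly computed; the function names and the toy labels are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def mean_per_class_accuracy(y_true, y_pred):
    """Average of the per-class accuracies (each class weighted equally)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    # Only classes that appear in y_true contribute to the mean.
    return sum(correct[c] / total[c] for c in total) / len(total)


def overall_accuracy(y_true, y_pred):
    """Fraction of all samples classified correctly (each sample weighted equally)."""
    return sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy, made-up labels: class 0 is common, class 1 is rare and half wrong.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 1, 0]
print(mean_per_class_accuracy(y_true, y_pred))  # (1.0 + 0.5) / 2 = 0.75
print(overall_accuracy(y_true, y_pred))         # 5 / 6
```

In this toy case the rare class drags the mean-per-class accuracy well below the overall accuracy, which is why a per-class metric is the more informative optimization target when classes are imbalanced.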

Paper Structure (14 sections, 2 equations, 9 figures, 2 tables)


Figures (9)

  • Figure 1: Class-specific effects of augmentations: "horizontal flip" distorts a brain cell image, "vertical flip" transforms a "6" into a "9" in MNIST, and "cutout" masks disease in a soybean leaf. These instances reveal that tailored strategies are essential, as not all augmentations benefit all classes.
  • Figure 2: For different stress classes in the soybean stress (biotic and abiotic) dataset, we present an image from each category (left) and the corresponding image transformed using the three most likely augmentations (middle) and the three least likely augmentations (right) for that stress class, as determined by our class-specific automated data augmentation method.
  • Figure 3: Image examples of the nine classes (healthy leaflet and eight different soybean stresses) in the dataset.
  • Figure 4: Examples of images misclassified by the baseline model.
  • Figure 5: Illustration of a single generation in the GA framework. The baseline classifier is fine-tuned with each candidate from the GA population, which represents the probabilities of augmentations for each class. These selected candidates undergo mutation and crossover operations, generating the next generation of augmentation probabilities for improved performance.
  • ...and 4 more figures
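
The single GA generation described in the Figure 5 caption can be sketched roughly as follows. This is a hedged illustration, not the authors' implementation: each candidate is taken to be a 9 x 15 matrix of per-class augmentation probabilities, as the caption suggests, while the selection scheme (keep the top candidates), uniform crossover, Gaussian mutation, and all parameter values are assumptions. `toy_fitness` is a hypothetical stand-in for the real fitness, which per the text would be the MPCA obtained after a 5-epoch fine-tuning of the linear layer.

```python
import random

NUM_CLASSES, NUM_AUGS = 9, 15  # the 9 x 15 policy space mentioned in the text


def random_policy(rng):
    """One candidate: a probability for each (class, augmentation) pair."""
    return [[rng.random() for _ in range(NUM_AUGS)] for _ in range(NUM_CLASSES)]


def crossover(a, b, rng):
    """Uniform crossover (assumed): each entry is copied from one of the two parents."""
    return [[x if rng.random() < 0.5 else y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def mutate(policy, rng, rate=0.1, scale=0.1):
    """Gaussian mutation (assumed): perturb some entries, clipped to [0, 1]."""
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, scale)))
             if rng.random() < rate else p
             for p in row] for row in policy]


def next_generation(population, fitness_fn, rng, num_parents=4):
    """Rank candidates, keep the best as parents, refill with mutated offspring."""
    ranked = sorted(population, key=fitness_fn, reverse=True)
    parents = ranked[:num_parents]
    children = list(parents)  # elitism: the best candidates survive unchanged
    while len(children) < len(population):
        a, b = rng.sample(parents, 2)
        children.append(mutate(crossover(a, b, rng), rng))
    return children


def toy_fitness(policy):
    """Hypothetical placeholder; the real score would be MPCA after fine-tuning."""
    return -sum(abs(p - 0.5) for row in policy for p in row)


rng = random.Random(0)
population = [random_policy(rng) for _ in range(8)]
population = next_generation(population, toy_fitness, rng)
```

Because every fitness evaluation in the real setting involves a fine-tuning run, the population size and the number of generations directly determine the compute budget, which is where fine-tuning only the linear layer pays off.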