Evaluating Data Augmentation Techniques for Coffee Leaf Disease Classification

Adrian Gheorghiu, Iulian-Marius Tăiatu, Dumitru-Clementin Cercel, Iuliana Marin, Florin Pop

TL;DR

This work addresses robust classification of Robusta coffee leaf diseases using the RoCoLe dataset, which is small and imbalanced. It combines pix2pix-based segmentation, CycleGAN-driven offline augmentation, and online augmentations with Transformer-based classifiers to boost performance beyond CNN baselines. The findings show that augmentation and Transformer models improve accuracy and robustness, while synthetic data partially supplements real data but does not fully replicate its distribution. The approach advances plant disease detection with practical implications for robust, data-efficient disease diagnosis in crops, and suggests avenues for more advanced generative and segmentation techniques.
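Online augmentation differs from the offline CycleGAN images, which are generated once before training. It draws a fresh random transform every time a sample is loaded. The minimal sketch below assumes a horizontal flip plus 0 to 3 quarter-turn rotations, because this summary does not list the transforms the authors actually used. The helper names (`hflip`, `rot90`, `online_augment`, `epoch_batches`) are hypothetical. Images are plain nested lists so that the sketch runs without any imaging library.

```python
import random
from typing import List, Tuple

# Hypothetical representation: an image is a list of pixel rows.
Image = List[List[int]]


def hflip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rot90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def online_augment(img: Image, rng: random.Random) -> Image:
    """Apply a newly sampled random transform on every call.

    Assumed transform set (flip + right-angle rotation); the paper's
    actual online augmentations are not specified in this summary.
    """
    out = img
    if rng.random() < 0.5:
        out = hflip(out)
    for _ in range(rng.randrange(4)):  # 0 to 3 quarter turns
        out = rot90(out)
    return out


def epoch_batches(dataset: List[Tuple[Image, int]], batch_size: int, rng: random.Random):
    """Shuffle once per epoch, then augment each sample as it is drawn."""
    order = list(range(len(dataset)))
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield [(online_augment(dataset[i][0], rng), dataset[i][1]) for i in idx]
```

The randomness is resampled on every call, so each epoch sees a different variant of the same leaf. This is how online augmentation stretches a small dataset such as RoCoLe without storing extra files.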

Abstract

The detection and classification of diseases in Robusta coffee leaves are essential to keep plants healthy and crop yields high. However, this task requires extensive botanical knowledge and is time-consuming. Consequently, automating this task and similar ones has been an extensively researched subject in image classification. For leaf disease classification, most approaches have used the more popular PlantVillage dataset while overlooking other datasets, such as the Robusta Coffee Leaf (RoCoLe) dataset. Because the RoCoLe dataset is small and imbalanced, pre-trained models must be fine-tuned and multiple augmentation techniques applied. This paper uses the RoCoLe dataset and deep learning approaches to classify coffee leaf diseases from images. It employs the pix2pix model for segmentation and a cycle-consistent generative adversarial network (CycleGAN) for augmentation. Our study demonstrates the effectiveness of Transformer-based models, online augmentations, and CycleGAN augmentation in improving leaf disease classification. While synthetic data has limitations, it complements real data and enhances model performance. These findings contribute to developing robust techniques for plant disease detection and classification.
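For reference, the objective below comes from the original CycleGAN formulation. In it, G: X → Y and F: Y → X are the two generators, and D_X and D_Y are the corresponding discriminators. Taking X as healthy leaves and Y as diseased leaves is an assumption, chosen to match the healthy-to-diseased generation shown in Figure 4. This summary does not state which loss variants or which weight λ the authors used, so treat this as the generic formulation rather than the paper's own equations.

```latex
\begin{aligned}
\mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y) &= \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\big[\log D_Y(y)\big] + \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log\big(1 - D_Y(G(x))\big)\big] \\
\mathcal{L}_{\mathrm{cyc}}(G, F) &= \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\lVert F(G(x)) - x \rVert_1\big] + \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\big[\lVert G(F(y)) - y \rVert_1\big] \\
\mathcal{L}(G, F, D_X, D_Y) &= \mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y) + \mathcal{L}_{\mathrm{GAN}}(F, D_X, Y, X) + \lambda\, \mathcal{L}_{\mathrm{cyc}}(G, F)
\end{aligned}
```

The generators minimize the full objective, and the discriminators maximize it. The cycle-consistency term forces F(G(x)) ≈ x. This is what allows a healthy-to-diseased mapping to be learned from unpaired leaf images.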

Paper Structure (26 sections, 3 equations, 8 figures, 5 tables)

Figures (8)

  • Figure 1: Statistics of the number of samples in the relabeled dataset.
  • Figure 2: Examples of rescaled images from each class and the associated masks.
  • Figure 3: Examples of pix2pix predicted masks compared to the ground truth masks.
  • Figure 4: Examples of diseased leaf images generated from healthy leaf images.
  • Figure 5: 2D t-SNE representations of each class's images, both synthetic and real.
  • ...and 3 more figures
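Figure 3 compares pix2pix-predicted masks with ground-truth masks. The minimal sketch below computes two common overlap scores for such a comparison: intersection over union (IoU) and the Dice coefficient. This summary does not say which metric, if any, the paper reports. The 0/1 nested-list mask encoding and the helper names are assumptions, chosen so the sketch runs without dependencies.

```python
from typing import List, Tuple

# Assumed encoding: 1 = leaf pixel, 0 = background.
Mask = List[List[int]]


def _counts(pred: Mask, truth: Mask) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) pixel counts."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    tp = fp = fn = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    return tp, fp, fn


def iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union; two empty masks count as a perfect match."""
    tp, fp, fn = _counts(pred, truth)
    union = tp + fp + fn
    return 1.0 if union == 0 else tp / union


def dice(pred: Mask, truth: Mask) -> float:
    """Dice coefficient 2|A∩B| / (|A| + |B|); two empty masks score 1.0."""
    tp, fp, fn = _counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom
```

Dice counts the overlap twice, so for the same pair of masks it is never lower than IoU. Both scores equal 1.0 for a perfect match.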