DDD: Discriminative Difficulty Distance for plant disease diagnosis

Yuji Arima; Satoshi Kagiwada; Hitoshi Iyatomi

TL;DR

This study investigates whether distances between datasets, measured in the low-dimensional representations produced by image encoders trained on different datasets, are suitable as a Discriminative Difficulty Distance (DDD), a novel metric designed to quantify the domain gap between training and test datasets while assessing the classification difficulty of the test data.

Abstract

Recent studies on plant disease diagnosis using machine learning (ML) have raised concerns that diagnostic performance is overestimated because of inappropriate data partitioning, in which training and test datasets are derived from the same source (domain). Plant disease diagnosis is a challenging classification task, characterized by its fine-grained nature, vague symptoms, and the extensive variability of image features within each domain. In this study, we propose the concept of Discriminative Difficulty Distance (DDD), a novel metric designed to quantify the domain gap between training and test datasets while assessing the classification difficulty of test data. DDD provides a valuable tool for identifying insufficient diversity in training data, thus supporting the development of more diverse and robust datasets. We investigated multiple image encoders trained on different datasets and examined whether the distances between datasets, measured using low-dimensional representations generated by the encoders, are suitable as a DDD metric. The study used 244,063 plant disease images spanning four crops and 34 disease classes, collected from 27 domains. We demonstrated that even if the test images are from different crops or diseases than those used to train the encoder, incorporating them allows the construction of a dataset distance measure that strongly correlates with the difficulty of diagnosis indicated by an independently developed disease classifier. Compared to the base encoder, pre-trained only on ImageNet21K, the correlation was higher by 0.106 to 0.485, reaching a maximum of 0.909.
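The pipeline summarized in the abstract (encode images, average the representations per class, measure distances between class averages, and correlate the result with a classifier's confusion matrix) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Euclidean distance, the similarity form `exp(-alpha * distance)`, the use of Pearson correlation, and the function names `class_means`, `similarity_matrix`, and `pearson_r` are all hypothetical choices made for this example. The abstract does not specify the paper's own definitions of $L_{ij}$ and $S_{ij}$.

```python
import numpy as np


def class_means(embeddings, labels, num_classes):
    """Average the encoder embeddings of each class; returns (num_classes, dim)."""
    return np.stack(
        [embeddings[labels == c].mean(axis=0) for c in range(num_classes)]
    )


def similarity_matrix(train_means, test_means, alpha=1.0):
    """Similarity between train class i and test class j.

    ASSUMPTION: Euclidean distance mapped through exp(-alpha * distance).
    The paper's actual distance and similarity definitions may differ.
    """
    diff = train_means[:, None, :] - test_means[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)  # shape (n_train_cls, n_test_cls)
    return np.exp(-alpha * dist)


def pearson_r(a, b):
    """Pearson correlation between two matrices of the same shape (flattened)."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_classes, dim = 3, 8
    # Synthetic stand-ins for encoder outputs on training and test domains.
    train_emb = rng.normal(size=(60, dim))
    test_emb = rng.normal(size=(45, dim))
    train_lab = np.arange(60) % num_classes
    test_lab = np.arange(45) % num_classes

    S = similarity_matrix(
        class_means(train_emb, train_lab, num_classes),
        class_means(test_emb, test_lab, num_classes),
        alpha=0.5,
    )
    # Synthetic stand-in for a row-normalized confusion matrix of a classifier.
    P = rng.dirichlet(np.ones(num_classes), size=num_classes)
    print("similarity matrix shape:", S.shape)
    print("correlation with confusion matrix:", pearson_r(S, P))
```

With real data, the embeddings would come from the encoder under study and `P` from an independently trained disease classifier; a higher correlation would suggest that the encoder's distances track diagnostic difficulty more closely.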

Paper Structure (25 sections, 4 equations, 2 figures, 3 tables)

Introduction
Related Works
Metric learning
Contrastive Learning
Distance between datasets
Discriminative Difficulty Distance (DDD)
Significance
Implementation Policy
Implementation and evaluation of a reasonable distance as DDD in the plant disease diagnosis task
(step 1) Acquisition of a low-dimensional representation for each dataset
(step 2) Calculation of the average class vectors of test data
(step 3) Calculation of the diagnostic distance $L_{ij}$ and the diagnostic similarity $S_{ij}$
(step 4) Validation of diagnostic similarity $S$
Notes
Experiments
...and 10 more sections

Figures (2)

Figure 1: Comparison of the confusion matrix ($P$: left-most column) for each crop diagnosis by $M_C$ and the diagnostic similarity ($S_{ij}$: remaining columns) between the two datasets generated by each $M_E$. Darker colors indicate higher values. The results in the dashed boxed area are for reference only, because part of the training data is shared between $M_C$ and $M_E$.
Figure 2: The dependence of correlation $R$ on hyperparameter $\alpha$.
