A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis

Xiang Liu; Zhaoxiang Liu; Huan Hu; Zezhou Chen; Kohou Wang; Kai Wang; Shiguo Lian

TL;DR

The paper addresses the need for accurate, knowledge-rich crop disease diagnosis using multimodal AI. It introduces the Crop Disease Domain Multimodal (CDDM) dataset, comprising 137k images and 1M QA pairs spanning diagnosis and knowledge, and a LoRA-based finetuning strategy that updates the visual encoder, adapter, and language model to adapt LVLMs like Qwen-VL-Chat to agriculture. Experiments show that models finetuned on CDDM outperform baselines on both diagnosis accuracy and knowledge QA, highlighting the value of domain-specific instruction-following data. By releasing the dataset and code, the work provides a practical resource to accelerate development of farmers' decision-support tools and advances in agricultural multimodal AI. It bridges cutting-edge vision-language models with domain-specific agricultural needs.

Abstract

While conversational generative AI has shown considerable potential in enhancing decision-making for agricultural professionals, its exploration has predominantly been anchored in text-based interactions. The evolution of multimodal conversational AI, leveraging vast amounts of image-text data from diverse sources, marks a significant stride forward. However, the application of such advanced vision-language models in the agricultural domain, particularly for crop disease diagnosis, remains underexplored. In this work, we present the crop disease domain multimodal (CDDM) dataset, a pioneering resource designed to advance the field of agricultural research through the application of multimodal learning techniques. The dataset comprises 137,000 images of various crop diseases, accompanied by 1 million question-answer pairs that span a broad spectrum of agricultural knowledge, from disease identification to management practices. By integrating visual and textual data, CDDM facilitates the development of sophisticated question-answering systems capable of providing precise, useful advice to farmers and agricultural professionals. We demonstrate the utility of the dataset by finetuning state-of-the-art multimodal models, showcasing significant improvements in crop disease diagnosis. Specifically, we employed a novel finetuning strategy that utilizes low-rank adaptation (LoRA) to finetune the visual encoder, adapter and language model simultaneously. Our contributions include not only the dataset but also a finetuning strategy and a benchmark to stimulate further research in agricultural technology, aiming to bridge the gap between advanced AI techniques and practical agricultural applications. The dataset is available at https://github.com/UnicomAI/UnicomBenchmark/tree/main/CDDMBench.
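
The abstract names low-rank adaptation (LoRA) applied to the visual encoder, adapter and language model at the same time. As a rough illustration of the general LoRA idea only, and not the authors' implementation, the sketch below keeps a weight matrix W0 frozen and adds a trainable low-rank update scaled by alpha / r. The class name `LoRALinear`, the toy 2x4 weight, and the values of `r` and `alpha` are all assumptions made for this example; a real setup would wrap the actual layers of each component with a tensor library.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen weight W0 (d_out x d_in) plus a low-rank update (alpha / r) * B @ A.

    Hypothetical toy class for illustration; only A and B would be trained.
    """

    def __init__(self, w0, r, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(w0), len(w0[0])
        self.w0 = w0                      # frozen pretrained weight
        self.scale = alpha / r            # standard LoRA scaling factor
        # A (r x d_in) starts small and random; B (d_out x r) starts at zero,
        # so the adapted layer initially matches the frozen layer exactly.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.b = [[0.0] * r for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.w0, x)
        low_rank = matvec(self.b, matvec(self.a, x))
        return [h + self.scale * d for h, d in zip(base, low_rank)]

    def trainable_params(self):
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])


# Assumed stand-ins for one layer in each of the three components.
toy_w0 = [[1.0, 0.0, 2.0, 0.0], [0.0, 1.0, 0.0, 3.0]]
components = {
    name: LoRALinear(toy_w0, r=1, alpha=2.0)
    for name in ("visual_encoder", "adapter", "language_model")
}
total_trainable = sum(m.trainable_params() for m in components.values())
```

In this toy case each wrapped layer trains 6 values instead of the 8 in the full matrix; the saving grows quickly with layer size, which is what makes adapting all three components together affordable.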
