Vision Meets Language: A RAG-Augmented YOLOv8 Framework for Coffee Disease Diagnosis and Farmer Assistance

Semanto Mondal

TL;DR

This work addresses accurate, context-rich coffee leaf disease diagnosis, with the aim of reducing pesticide use, by integrating vision, language, and retrieval. It introduces a RAG-augmented YOLOv8 framework in which leaf-disease detections are passed to a retrieval-augmented language model that produces grounded explanations and remediation suggestions. A BRACOL-based dataset, enhanced through expert reannotation, supports detection (mAP@0.5 = 0.681) and context-aware guidance, delivered to farmers through a Streamlit interface. The approach offers a scalable, adaptable platform that can be extended to other crops and domains, enabling knowledge-grounded decision support in precision agriculture.

Abstract

As social beings, we have an intimate bond with the environment. Many aspects of human life, such as lifestyle, health, and food, depend on the environment and agriculture, and it is our responsibility to support both. However, traditional farming practices often result in inefficient resource use and environmental challenges. To address these issues, precision agriculture has emerged as a promising approach that leverages advanced technologies to optimise agricultural processes. In this work, a hybrid approach is proposed that combines three fields of modern AI: object detection, large language models (LLMs), and Retrieval-Augmented Generation (RAG). In this framework, vision and language models work together to identify potential diseases in coffee leaves. The study introduces an AI-based precision agriculture system that uses YOLOv8 for crop disease detection and combines natural language processing (NLP) with RAG to provide context-aware diagnoses. The system aims to tackle major issues with LLMs, especially hallucinations, and allows for adaptive treatment plans and real-time disease detection. It provides farmers with an easy-to-use interface: by submitting an image of an affected coffee leaf, they receive the detected diseases together with suggested remediation methods, which aim to lower pesticide use, preserve livelihoods, and encourage environmentally friendly practices. With an emphasis on scalability, dependability, and user-friendliness, the project intends to improve RAG-integrated object detection systems for wider agricultural applications in the future.
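
The flow described in the abstract (detect, retrieve, then prompt a language model) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Detection` class, the `KNOWLEDGE_BASE` entries, the class labels, the 0.5 confidence threshold, and the label-keyed lookup are all assumptions made for illustration. A real system would obtain detections from a trained YOLOv8 model and retrieve passages from a document store; both are stubbed out here so the example stays self-contained.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Detection:
    """One detector output (hypothetical stand-in for a YOLOv8 result)."""
    label: str
    confidence: float


# Hypothetical knowledge base keyed by disease label. The passages are
# placeholders, not agronomic guidance taken from the paper.
KNOWLEDGE_BASE: Dict[str, List[str]] = {
    "rust": ["Placeholder passage about coffee leaf rust management."],
    "miner": ["Placeholder passage about leaf miner management."],
}


def confident_labels(detections: List[Detection], min_conf: float = 0.5) -> List[str]:
    """Keep labels whose best detection meets the (assumed) threshold,
    ordered from most to least confident."""
    best: Dict[str, float] = {}
    for det in detections:
        if det.confidence >= min_conf:
            best[det.label] = max(best.get(det.label, 0.0), det.confidence)
    return sorted(best, key=lambda label: best[label], reverse=True)


def retrieve(labels: List[str], kb: Dict[str, List[str]]) -> List[str]:
    """Toy retrieval: look up passages by label instead of by embedding search."""
    passages: List[str] = []
    for label in labels:
        passages.extend(kb.get(label, []))
    return passages


def build_prompt(labels: List[str], passages: List[str], question: str) -> str:
    """Assemble a prompt that restricts the model to the retrieved context,
    which is the mechanism RAG relies on to limit hallucination."""
    context = "\n".join(f"- {p}" for p in passages) or "- (no passages retrieved)"
    detected = ", ".join(labels) or "none"
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n"
        f"Detected conditions: {detected}\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )
```

As a usage example, passing `[Detection("rust", 0.9), Detection("miner", 0.3)]` through `confident_labels`, `retrieve`, and `build_prompt` yields a prompt that mentions only rust, because the low-confidence detection is filtered out before retrieval. The resulting string would then be sent to whichever LLM the deployment uses.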

Paper Structure

This paper contains 15 sections, 4 figures, 1 table.

Figures (4)

  • Figure 1: High-Level Overview of the RAG-Based Chatbot
  • Figure 2: Workflow of YOLOv8 + RAG + LLM-Based Coffee Leaf Disease Assistant
  • Figure 3: Comparison of annotated labels and YOLOv8 prediction results for a sample input image.
  • Figure 4: End-to-end flow from image upload to disease remedy and conversational assistance.