ShizishanGPT: An Agricultural Large Language Model Integrating Tools and Resources

Shuting Yang, Zehui Liu, Wolfgang Mayer

TL;DR

This work tackles the gap in domain-specific knowledge for agricultural LLMs by introducing ShizishanGPT, a modular QA system built on Retrieval-Augmented Generation (RAG) and an agent framework. The architecture combines a generic GPT-4 component, a search module, AgriKG knowledge graphs, a retrieval module, and external domain tools to support tasks such as maize phenotype prediction and promoter enrichment analysis. Evaluations on a 100-question agricultural dataset show that ShizishanGPT achieves superior accuracy and consistency, as measured by BLEU, ROUGE, and GLEU, and outperforms several baselines, including ChatGPT-4, with high reliability in manual scoring. The results demonstrate the value of integrating knowledge graphs, retrieval, and external tools for robust agricultural decision support; code and data are publicly available for reproducibility.

Abstract

Recent developments in large language models (LLMs) have significantly improved the ability of intelligent dialogue systems to handle complex inquiries. However, current LLMs still exhibit limitations in specialized domain knowledge, particularly in technical fields such as agriculture. To address this problem, we propose ShizishanGPT, an intelligent question answering system for agriculture based on the Retrieval-Augmented Generation (RAG) framework and an agent architecture. ShizishanGPT consists of five key modules: a generic GPT-4-based module for answering general questions; a search engine module that compensates for the fact that an LLM's own knowledge cannot be updated in a timely manner; an agricultural knowledge graph module that provides domain facts; a retrieval module that uses RAG to supplement domain knowledge; and an agricultural agent module that invokes specialized models for tasks such as crop phenotype prediction and gene expression analysis. We evaluated ShizishanGPT on a dataset of 100 agricultural questions designed specifically for this study. The experimental results show that the system significantly outperforms general-purpose LLMs, providing more accurate and detailed answers thanks to its modular design and its integration of different domain knowledge sources. Our source code, dataset, and model weights are publicly available at https://github.com/Zaiwen/CropGPT.
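
To make the five-module design more concrete, the following is a minimal Python sketch of how a question could be dispatched to one of the modules. It is an illustration only and is not taken from the released code: the module names, the keyword lists, and the stub handlers are all hypothetical, and the keyword rule is a simple stand-in for the agent-based tool selection that the abstract describes.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Module:
    """One answer source; all names and keywords here are illustrative assumptions."""
    name: str
    keywords: Tuple[str, ...]
    handler: Callable[[str], str]


def make_stub(name: str) -> Callable[[str], str]:
    """Return a placeholder handler; a real module would call a model, KG, or tool."""
    def handler(question: str) -> str:
        return f"[{name}] would answer: {question}"
    return handler


# Specialized modules, checked in order. The keyword lists are hypothetical.
MODULES: List[Module] = [
    Module("agent_tools", ("predict", "phenotype", "gene expression"), make_stub("agent_tools")),
    Module("knowledge_graph", ("variety", "disease", "pest"), make_stub("knowledge_graph")),
    Module("search_engine", ("latest", "today", "news"), make_stub("search_engine")),
    Module("rag_retrieval", ("literature", "paper", "study"), make_stub("rag_retrieval")),
]

# Fallback for general questions that match no specialized module.
GENERAL = Module("general_llm", (), make_stub("general_llm"))


def route(question: str) -> Module:
    """Pick the first module whose keyword appears in the question, else the fallback."""
    q = question.lower()
    for module in MODULES:
        if any(keyword in q for keyword in module.keywords):
            return module
    return GENERAL


def answer(question: str) -> str:
    """Route the question and return the selected module's (stub) answer."""
    return route(question).handler(question)


if __name__ == "__main__":
    for q in ("Predict the phenotype of this maize line",
              "What is the latest maize price?",
              "Hello, how are you?"):
        print(route(q).name, "->", answer(q))
```

In this sketch the specialized modules are tried first and the general module acts as the fallback, which mirrors the division of labor in the abstract; a fuller implementation would presumably let the LLM itself decide which module or tool to invoke.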

Paper Structure

This paper contains 26 sections, 6 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Example of ChatGPT versus ShizishanGPT in predicting maize gene promoter enrichment values
  • Figure 2: Detailed Architecture Diagram of the Question Answering Pipeline
  • Figure 3: Comparative Analysis of Language Model Scores on BLEU, ROUGE, GLEU, and Composite Metrics