Knowledge Synthesis of Photosynthesis Research Using a Large Language Model

Seungri Yoon; Woosang Jeon; Sanghyeok Choi; Taehyeong Kim; Tae In Ahn

TL;DR

The paper addresses the cognitive burden and information overload in photosynthesis research by introducing PRAG, a GPT-4o-based photosynthesis research assistant that uses retrieval-augmented generation and prompt optimization. PRAG uses a vector database and an automated feedback loop for prompt optimization, and a knowledge graph to structure its responses, enabling hypothesis generation and knowledge synthesis with paper-level transparency. It reports an average improvement of 8.7% across five scientific-writing metrics and a 25.4% increase in source transparency over a baseline, with scientific depth and domain coverage comparable to those of published photosynthesis research papers. The approach is demonstrated through evaluation and visualization, with code publicly available for broader plant science applications and future multimodal enhancements.

Abstract

The development of biological data analysis tools and large language models (LLMs) has opened up new possibilities for utilizing AI in plant science research, with the potential to contribute significantly to knowledge integration and research gap identification. Nonetheless, current LLMs struggle to handle complex biological data and theoretical models in photosynthesis research and often fail to provide accurate scientific contexts. Therefore, this study proposed a photosynthesis research assistant (PRAG) based on OpenAI's GPT-4o with retrieval-augmented generation (RAG) techniques and prompt optimization. Vector databases and an automated feedback loop were used in the prompt optimization process to enhance the accuracy and relevance of the responses to photosynthesis-related queries. PRAG showed an average improvement of 8.7% across five metrics related to scientific writing, with a 25.4% increase in source transparency. Additionally, its scientific depth and domain coverage were comparable to those of photosynthesis research papers. A knowledge graph was used to structure PRAG's responses with papers within and outside the database, which allowed PRAG to match key entities with 63% and 39.5% of the database and test papers, respectively. PRAG can be applied for photosynthesis research and broader plant science domains, paving the way for more in-depth data analysis and predictive capabilities.
