Perovskite-LLM: Knowledge-Enhanced Large Language Models for Perovskite Solar Cell Research

Xiang Liu, Penglei Sun, Shuyan Chen, Longhan Zhang, Peijie Dong, Huajie You, Yongqi Zhang, Chang Yan, Xiaowen Chu, Tong-yi Zhang

TL;DR

The paper tackles the rapid expansion of perovskite solar cell literature by building a domain-specific knowledge graph (Perovskite-KG) from 1,517 papers and creating two high-quality instruction-tuning datasets via a multi-agent pipeline. It then trains two specialized LLMs, Perovskite-Chat-LLM and Perovskite-Reasoning-LLM, demonstrating superior domain knowledge retrieval and scientific reasoning compared with strong baselines, and further improves answers through retrieval-augmented generation with LightRAG. The work delivers practical tools for literature review, experimental design, and complex problem solving in PSC research, illustrating how knowledge graphs and domain-focused LLMs can accelerate materials discovery. Overall, this knowledge-enhanced framework represents a scalable approach to organizing domain knowledge and enabling precise, citation-backed reasoning in materials science.

Abstract

The rapid advancement of perovskite solar cells (PSCs) has led to exponential growth in research publications, creating an urgent need for efficient knowledge management and reasoning systems in this domain. We present a comprehensive knowledge-enhanced system for PSCs that integrates three key components. First, we develop Perovskite-KG, a domain-specific knowledge graph constructed from 1,517 research papers, containing 23,789 entities and 22,272 relationships. Second, we create two complementary datasets: Perovskite-Chat, comprising 55,101 high-quality question-answer pairs generated through a novel multi-agent framework, and Perovskite-Reasoning, containing 2,217 carefully curated materials science problems. Third, we introduce two specialized large language models: Perovskite-Chat-LLM for domain-specific knowledge assistance and Perovskite-Reasoning-LLM for scientific reasoning tasks. Experimental results demonstrate that our system significantly outperforms existing models in both domain-specific knowledge retrieval and scientific reasoning tasks, providing researchers with effective tools for literature review, experimental design, and complex problem-solving in PSC research.

Paper Structure

This paper contains 26 sections, 7 equations, 10 figures, and 13 tables.

Figures (10)

  • Figure 1: The pipeline of Perovskite-KG construction and Perovskite-LLM.
  • Figure 2: The distribution of question categories in the instruction tuning dataset.
  • Figure 3: Comparison of responses between Perovskite-LLM and ChatGPT: Perovskite-LLM provides detailed operational steps with specific parameters, while ChatGPT only offers general conceptual guidance.
  • Figure 4: A case study of Perovskite-Chat-LLM's ability to provide detailed and accurate information with references.
  • Figure 5: Distribution of prompt and response lengths across different categories in our dataset. The y-axis represents density (e-3), and the x-axis shows the word count in logarithmic scale. Each category's distribution is independently normalized.
  • ...and 5 more figures