Accelerating High-Efficiency Organic Photovoltaic Discovery via Pretrained Graph Neural Networks and Generative Reinforcement Learning

Jiangjie Qiu, Hou Hei Lam, Xiuyuan Hu, Wentao Li, Siwei Fu, Fankun Zeng, Hao Zhang, Xiaonan Wang

TL;DR

This work tackles the challenge of discovering high-efficiency organic photovoltaics by integrating large-scale graph neural network pretraining with a GPT-2–based reinforcement learning framework to design donor–acceptor (D–A) pairs with high power conversion efficiency (PCE). Key innovations include pretraining on masking/reconstruction and HOMO/LUMO prediction tasks to obtain robust embeddings, and a cross-attention–augmented predictor that guides a generation loop optimizing candidate D–A pairs via a physics-informed reward. The approach yields predicted PCE values approaching $21\%$ for generated designs and provides initial fragment-level insights into motifs associated with high performance. The authors also plan to release the largest open-source OPV dataset to accelerate community-wide discovery. If validated experimentally, this pipeline could substantially shorten the discovery cycle for high-performance OPV materials and enable more targeted experimental exploration.

Abstract

Organic photovoltaic (OPV) materials offer a promising avenue toward cost-effective solar energy utilization. However, optimizing donor-acceptor (D-A) combinations to achieve high power conversion efficiency (PCE) remains a significant challenge. In this work, we propose a framework that integrates large-scale pretraining of graph neural networks (GNNs) with a GPT-2 (Generative Pretrained Transformer 2)-based reinforcement learning (RL) strategy to design OPV molecules with potentially high PCE. This approach produces candidate molecules with predicted efficiencies approaching 21%, although further experimental validation is required. Moreover, we conducted a preliminary fragment-level analysis to identify structural motifs recognized by the RL model that may contribute to enhanced PCE, thus providing design guidelines for the broader research community. To facilitate continued discovery, we are building the largest open-source OPV dataset to date, expected to include nearly 3,000 donor-acceptor pairs. Finally, we discuss plans to collaborate with experimental teams on synthesizing and characterizing AI-designed molecules, which will provide new data to refine and improve our predictive and generative models.

Paper Structure

This paper contains 14 sections, 1 equation, 5 figures.

Figures (5)

  • Figure 1: The Structure of the Pretrain Model and Predictor Model.
  • Figure 2: (a) The architecture of the GPT. (b) The Structure of the RL model.
  • Figure 3: The distribution of device parameters
  • Figure 4: Top-1 PCE Score Trend for Generated Donor-Acceptor Pairs
  • Figure 5: Some of the Top Acceptors designed for PBDB-TF