Artificial Intelligence Driven Workflow for Accelerating Design of Novel Photosensitizers

Hongyi Wang, Xiuli Zheng, Weimin Liu, Zitian Tang, Sheng Gong

TL;DR

This work addresses the slow pace of photosensitizer discovery by introducing AAPSI, an AI-driven closed-loop workflow that combines scaffold-based molecule generation, graph-transformer-based property prediction, and multi-objective Bayesian optimization (MOBO). Leveraging a solvent-aware database of over 100k photosensitizer (PS)-solvent pairs, the authors generate thousands of candidates and experimentally validate top hits, notably HB4Ph with $\phi_\Delta$=0.85 and $\lambda_{max}$=645 nm, placing it at the Pareto frontier for PDT-relevant properties. Key contributions include a large PS-solvent database, a scaffold-guided generative model (MoLeR), a predictive model (SolutionNet) with uncertainty quantification, and MOBO-guided generation that yields synthetically accessible, high-performance candidates. The results demonstrate that AI-guided design can rapidly identify PDT-optimized photosensitizers and provide a practical pathway toward closed-loop discovery in materials science, with a public database and synthesized molecules illustrating real-world impact.

Abstract

The discovery of high-performance photosensitizers has long been hindered by the time-consuming and resource-intensive nature of traditional trial-and-error approaches. Here, we present AI-Accelerated PhotoSensitizer Innovation (AAPSI), a closed-loop workflow that integrates expert knowledge, scaffold-based molecule generation, and Bayesian optimization to accelerate the design of novel photosensitizers. The scaffold-driven generation in AAPSI ensures structural novelty and synthetic feasibility, while the iterative AI-experiment loop speeds up discovery. AAPSI leverages a curated database of 102,534 photosensitizer-solvent pairs and generates 6,148 synthetically accessible candidates. These candidates are screened via graph transformers trained to predict singlet oxygen quantum yield ($\phi_\Delta$) and absorption maxima ($\lambda_{max}$), followed by experimental validation. This work generates several novel candidates for photodynamic therapy (PDT), among which the hypocrellin-based candidate HB4Ph lies at the Pareto frontier of high singlet oxygen quantum yield and long-wavelength absorption maximum among current photosensitizers ($\phi_\Delta$=0.85, $\lambda_{max}$=650 nm).
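The notion of a Pareto frontier over $\phi_\Delta$ and $\lambda_{max}$ can be made concrete with a minimal sketch, assuming both properties are to be maximized. The function name `pareto_front` and all entries labeled `placeholder_*` are hypothetical and serve only as illustration; the HB4Ph values are the ones quoted in the TL;DR.

```python
from typing import Dict, List, Tuple


def pareto_front(points: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the names of non-dominated points, maximizing both objectives.

    A point is dominated if another point is at least as good in both
    objectives and strictly better in at least one.
    """
    front = []
    for name, (phi, lam) in points.items():
        dominated = any(
            (p2 >= phi and l2 >= lam) and (p2 > phi or l2 > lam)
            for other, (p2, l2) in points.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)


# (phi_delta, lambda_max in nm). Only the HB4Ph entry comes from the text;
# the placeholder entries are invented for illustration, not measured data.
candidates = {
    "HB4Ph": (0.85, 645.0),
    "placeholder_A": (0.60, 630.0),
    "placeholder_B": (0.90, 480.0),
    "placeholder_C": (0.40, 700.0),
    "placeholder_D": (0.35, 640.0),
}

front = pareto_front(candidates)
print(front)
```

In this toy set, a candidate stays on the frontier as long as no other candidate matches or beats it in both properties at once, which is the sense in which HB4Ph is described as Pareto-optimal.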

Paper Structure

This paper contains 41 sections, 8 equations, 16 figures, 2 tables.

Figures (16)

  • Figure 1: This figure illustrates the AAPSI workflow. a: We collect a photosensitizer database that includes published and unpublished data. b: Key elements of this workflow are derived from the database. A set of scaffold structures is selected and modified from the molecules in the database to incorporate expert knowledge and ensure the validity of the generated structures. We finetune a MoLeR model for molecule generation from a pretrained checkpoint and train a SolutionNet model for property prediction from scratch. c: Unified with MOBO, we use the scaffold pool, the MoLeR model, and the SolutionNet model to iteratively generate molecules. After the generation, the Pareto frontier of high $\phi_\Delta$ and long-wavelength $\lambda_{max}$ is identified. d: We manually screen the generated molecules and select some synthetically accessible ones as candidate molecules. e: Three of the candidate molecules are selected for further investigation, including TD-DFT calculations, synthesis, and characterization. Real-world data is subsequently added to the database. f: Among the synthesized molecules, HB4Ph emerged as the first experimentally validated, AI-designed photosensitizer demonstrating high PDT potential, with a $\phi_\Delta$ of 0.85 and a $\lambda_{max}$ at 645 nm. Compared with clinically used and under-trial drugs, HB4Ph is at the Pareto frontier of the two properties. g: A demonstration of PDT, the subsequent application of the photosensitizers.
  • Figure 2: a: An overview of the generation process. The Bayesian optimizer samples a scaffold and a hidden bias representation. The generative model encodes the scaffold into another hidden representation, adds the bias representation to it, and decodes the sum into a new molecule. The predictive model screens the new molecule and predicts the target properties, which are used in the posterior probability calculation of the next cycle. b: Statistics of dataset all_ps on molecular weight. c: Range and distribution of labels of other datasets. d: Demonstration of generated results from direct generation and Bayesian multi-objective optimization, with the Pareto frontier of $\phi_\Delta$ and $\lambda_{max}$ highlighted.
  • Figure 3: a: Molecular structures and PDT-related physical properties of HB, PNBD, HBS2N, and HB4Ph (in the middle of subfigure f). Characterization of the four molecules is shown in this figure, including b: absorption spectra; c: emission spectra; d: ROS detection; e: singlet oxygen detection. f: Comparison of HB4Ph with clinically used and under-trial photosensitizers for PDT. HB4Ph emerges at the Pareto frontier of $\lambda_{max}$ and $\phi_\Delta$.
  • Figure 4: The scaffolds from which molecule generation starts. Molecule 1 is HB, the natural product; molecule 2 is HB4, described in section \ref{syn_hb4}.
  • Figure 5: a: Overall structure of SolutionNet. SolutionNet consists of two GT blocks and an FFNN block. The molecular graphs of the photosensitizer ($\mathcal{G}_{ps}$) and the solvent ($\mathcal{G}_{solv}$) are taken as input, and the target property is predicted as output. Atom (node) features in $\mathcal{G}$ are encoded as $x_i$ for node $i$, while bond (edge) features between nodes $j$ and $i$ are denoted as $e_{ji}$. Positional embeddings, capturing spatial or topological information of atom $i$, are represented as $p_i$. This framework integrates structural and relational features across both graphs to model properties of photosensitizer-solvent pairs. b: Illustration of the structure of the generative model in this work. The framework is adopted from MoLeR \cite{maziarz2021moler}, along with the checkpoint of the encoder-decoder. During the Bayesian optimization, the scaffold and the bias embedding are sampled by the optimizer before being fed to the decoder.
  • ...and 11 more figures
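
The generation cycle described in the Figure 2a caption (sample a scaffold and a bias, encode the scaffold, add the bias, decode, predict) can be sketched as follows. This is a toy illustration under loud assumptions: `encode`, `decode`, and `predict` are hypothetical stand-ins and not the MoLeR or SolutionNet APIs, the scaffold names are used only as labels, and plain random sampling replaces the Bayesian optimizer, so no posterior is actually updated.

```python
import random
from typing import List, Tuple

Vector = List[float]
DIM = 4  # assumed latent size, chosen arbitrarily for the sketch


def encode(scaffold: str) -> Vector:
    """Hypothetical encoder: a deterministic pseudo-embedding per scaffold label."""
    rng = random.Random(scaffold)
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def decode(latent: Vector) -> str:
    """Hypothetical decoder: returns a text label instead of a real molecule."""
    return "mol(" + ",".join(f"{v:+.2f}" for v in latent) + ")"


def predict(latent: Vector) -> Tuple[float, float]:
    """Toy property model returning (phi_delta, lambda_max in nm).

    The workflow in the caption predicts from the decoded molecule; this
    sketch reads the latent vector directly to stay self-contained.
    """
    phi = min(1.0, max(0.0, 0.5 + 0.25 * latent[0]))
    lam = 550.0 + 50.0 * latent[1]
    return phi, lam


def generation_loop(scaffolds: List[str], n_cycles: int, seed: int = 0):
    """Run n_cycles of sample -> encode -> add bias -> decode -> predict."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_cycles):
        # Random search stands in for the Bayesian optimizer's proposal.
        scaffold = rng.choice(scaffolds)
        bias = [rng.gauss(0.0, 0.5) for _ in range(DIM)]
        # Scaffold representation plus bias gives the latent to decode.
        latent = [h + b for h, b in zip(encode(scaffold), bias)]
        molecule = decode(latent)
        phi, lam = predict(latent)
        # A real optimizer would use these predictions to update its posterior.
        history.append((scaffold, molecule, phi, lam))
    return history


history = generation_loop(["HB", "HB4"], n_cycles=5)
for scaffold, molecule, phi, lam in history:
    print(scaffold, molecule, round(phi, 2), round(lam, 1))
```

Replacing the random proposal with an acquisition-driven one is what would turn this loop into the multi-objective Bayesian optimization the captions describe; the surrounding encode, add, decode, and predict steps would keep the same shape.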