PepThink-R1: LLM for Interpretable Cyclic Peptide Optimization with CoT SFT and Reinforcement Learning

Ruheng Wang, Hang Zhang, Trieu Nguyen, Shasha Feng, Hao-Wei Pang, Xiang Yu, Li Xiao, Peter Zhiping Zhang

TL;DR

PepThink-R1 addresses the challenge of designing cyclic peptides with multiple pharmacological properties by integrating large language models with explicit chain-of-thought supervised fine-tuning and reinforcement learning. The method centers on monomer-level reasoning during sequence generation, enabling interpretable edits and property-controlled optimization through a pharmacology-aware reward. A synthetic data pipeline builds reasoning-augmented peptide pairs, CoT prompts structure the reasoning, and GRPO-based RL optimizes for LogD, MRT, and SIF while maintaining chemical validity and diversity. Results show PepThink-R1 outperforms random mutation, standard SFT, and general LLMs in multi-property goals and interpretability, with case studies against PepINVENT illustrating broader exploration and stronger property gains. The work highlights a promising direction for transparent, LLM-guided peptide optimization, while noting limitations in QSAR-based evaluation and the need for real-world validation and expanded reasoning depth.

Abstract

Designing therapeutic peptides with tailored properties is hindered by the vastness of sequence space, limited experimental data, and poor interpretability of current generative models. To address these challenges, we introduce PepThink-R1, a generative framework that integrates large language models (LLMs) with chain-of-thought (CoT) supervised fine-tuning and reinforcement learning (RL). Unlike prior approaches, PepThink-R1 explicitly reasons about monomer-level modifications during sequence generation, enabling interpretable design choices while optimizing for multiple pharmacological properties. Guided by a tailored reward function balancing chemical validity and property improvements, the model autonomously explores diverse sequence variants. We demonstrate that PepThink-R1 generates cyclic peptides with significantly enhanced lipophilicity, stability, and exposure, outperforming existing general LLMs (e.g., GPT-5) and a domain-specific baseline in both optimization success and interpretability. To our knowledge, this is the first LLM-based peptide design framework that combines explicit reasoning with RL-driven property control, marking a step toward reliable and transparent peptide optimization for therapeutic discovery.
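
As a rough illustration of how a reward that balances chemical validity against property improvements could feed a GRPO-style update, the sketch below scores a group of candidate peptides and converts the rewards into group-relative advantages (reward minus group mean, divided by group standard deviation). The reward shape, the weights, the invalid-sequence penalty, and the function names `toy_reward` and `group_relative_advantages` are all assumptions made for illustration; they are not taken from the paper.

```python
from statistics import mean, pstdev


def toy_reward(is_valid, delta_logd, delta_mrt, delta_sif,
               weights=(1.0, 1.0, 1.0), invalid_penalty=-1.0):
    """Hypothetical reward: penalize invalid peptides, otherwise return a
    weighted sum of predicted property changes (assumed form, not the paper's)."""
    if not is_valid:
        return invalid_penalty
    w_logd, w_mrt, w_sif = weights
    return w_logd * delta_logd + w_mrt * delta_mrt + w_sif * delta_sif


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three candidates sampled for the same seed peptide (made-up numbers).
rewards = [
    toy_reward(True, 0.5, 0.1, 0.0),    # valid, improves LogD and MRT
    toy_reward(False, 0.0, 0.0, 0.0),   # chemically invalid
    toy_reward(True, -0.2, 0.0, 0.3),   # valid, mixed changes
]
advantages = group_relative_advantages(rewards)
```

Because the advantages are computed relative to the group, a candidate is only reinforced when it beats its siblings sampled for the same seed peptide, which is why an invalid sequence ends up with the most negative advantage here.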

Paper Structure

This paper contains 40 sections, 10 equations, 7 figures, 7 tables.

Figures (7)

  • Figure 1: Overview of our approach. Peptide pairs were constructed by applying single-position mutations to the raw data, with property values predicted by a QSAR model. These pairs were used for CoT SFT of a pretrained LLM, which was then further improved by RL and used for peptide optimization.
  • Figure 2: Comparison of LogD of peptides generated by random mutation, the CoT-SFT model, and PepThink-R1. The upper panel shows the transition heatmap of LogD buckets from the original peptides to the generated ones. The bottom panel shows the distribution of LogD values from the three generation methods. Compared to random mutation and the CoT-SFT model, PepThink-R1 is enriched in higher LogD values, reflecting successful optimization. In the heatmap titles, 'SFT generated' denotes results from the CoT-SFT model; 'RL generated' denotes results from PepThink-R1.
  • Figure 3: Chemical structures and property values of seed peptides and of peptides generated by PepThink-R1 and PepINVENT. Two representative cases are shown, each with three structures: the original peptide, the peptide generated by PepThink-R1, and the peptide generated by PepINVENT. Structural differences are highlighted in gray, blue, and green, respectively, and illustrate how PepThink-R1 designs differ from both the original and the PepINVENT results across the two cases.
  • Figure 4: Structures and SMILES strings of the proposed monomer and its potential source monomers in the monomer database. The proposed monomer (b) is composed of two parts that both exist in the monomer database, highlighted in cyan and red, respectively, in both the structure and the SMILES string: the cyan part exists in monomer (a) 7O9, and the red part exists in monomer (c) FN7.
  • Figure 5: Comparison of MRT of peptides generated by random mutation, our SFT model, and PepThink-R1.
  • ...and 2 more figures
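
The pair-construction step described in the Figure 1 caption can be sketched roughly as follows: enumerate every single-position mutant of a seed peptide and record the predicted property change for each source/target pair. This is a minimal sketch under stated assumptions. The monomer names, the `toy_predict` stand-in (which is not a QSAR model), and the dictionary layout are hypothetical and are not taken from the paper.

```python
def single_position_mutants(peptide, monomer_library):
    """Yield (position, mutant) for every sequence that differs from
    `peptide` (a list of monomer names) at exactly one position."""
    for i, original in enumerate(peptide):
        for monomer in monomer_library:
            if monomer != original:
                yield i, peptide[:i] + [monomer] + peptide[i + 1:]


def build_pairs(peptide, monomer_library, predict):
    """Pair the seed with each mutant and store the predicted property change."""
    base = predict(peptide)
    return [
        {"source": peptide, "target": mutant, "position": pos,
         "delta": predict(mutant) - base}
        for pos, mutant in single_position_mutants(peptide, monomer_library)
    ]


def toy_predict(peptide):
    """Placeholder property predictor (assumption): counts 'Leu' monomers.
    A real pipeline would call a trained QSAR model here."""
    return float(sum(1 for m in peptide if m == "Leu"))


library = ["Ala", "Leu", "Ser"]   # hypothetical monomer library
seed = ["Ala", "Ser"]             # hypothetical two-monomer seed
pairs = build_pairs(seed, library, toy_predict)
```

Each resulting record names the mutated position explicitly, which is the kind of information a monomer-level reasoning trace could be written around.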