Crossing New Frontiers: Knowledge-Augmented Large Language Model Prompting for Zero-Shot Text-Based De Novo Molecule Design

Sakhinana Sagar Srinivas; Venkataramana Runkana

TL;DR

This work introduces FrontierX: LLM-MG, a knowledge-augmented prompting framework for zero-shot text-to-molecule generation that combines outputs from off-the-shelf LLMs with domain-tuned small LMs. Top-$R$ SMILES and explanatory rationales produced by LLMs are integrated via a two-layer hierarchical multi-head attention mechanism to form cross-modal embeddings that drive a Transformer decoder to generate accurate SMILES strings. The approach achieves state-of-the-art performance on the text2mol task (ChEBI-20) and demonstrates the value of explicit explanations and cross-modal fusion, with ablations confirming the contribution of each component. The methodology enables efficient, scalable molecule design without fine-tuning large models, highlighting a practical pathway for knowledge-infused prompting in cross-domain generative chemistry.

Abstract

Molecule design is a multifaceted approach that leverages computational methods and experiments to optimize molecular properties, fast-tracking new drug discoveries, innovative material development, and more efficient chemical processes. Recently, text-based molecule design has emerged, inspired by next-generation AI tasks analogous to foundational vision-language models. Our study explores the use of knowledge-augmented prompting of large language models (LLMs) for the zero-shot text-conditional de novo molecular generation task. Our approach uses task-specific instructions and a few demonstrations to address distributional shift challenges when constructing augmented prompts for querying LLMs to generate molecules consistent with technical descriptions. Our framework proves effective, outperforming state-of-the-art (SOTA) baseline models on benchmark datasets.
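As a rough illustration of the prompt construction mentioned in the abstract (task-specific instructions plus a few demonstrations, followed by the query description), here is a minimal sketch. The function name `build_augmented_prompt`, the prompt layout, and the example descriptions are hypothetical and are not taken from the paper; the actual prompt template used by the authors may differ.

```python
from typing import List, Tuple


def build_augmented_prompt(
    instruction: str,
    demonstrations: List[Tuple[str, str]],
    query: str,
    top_r: int = 3,
) -> str:
    """Assemble instruction + demonstrations + query into one prompt string.

    The layout below is an assumption made for illustration only.
    """
    lines = [instruction.strip(), ""]
    # Each demonstration is an (input description, output SMILES) pair.
    for description, smiles in demonstrations:
        lines.append(f"Description: {description}")
        lines.append(f"SMILES: {smiles}")
        lines.append("")
    # The query is appended last, asking for top-R candidates and explanations.
    lines.append(f"Description: {query}")
    lines.append(
        f"List the top {top_r} candidate SMILES strings and explain each choice."
    )
    return "\n".join(lines)


# Illustrative demonstrations (not drawn from any benchmark dataset).
demos = [
    ("The molecule is ethanol, a primary alcohol.", "CCO"),
    ("The molecule is benzene, an aromatic hydrocarbon.", "c1ccccc1"),
]
prompt = build_augmented_prompt(
    "You are a chemistry assistant. Translate each description into a SMILES string.",
    demos,
    "The molecule is acetic acid, a simple carboxylic acid.",
    top_r=2,
)
print(prompt)
```

The resulting string would then be sent to an LLM through a text-based API; that call is omitted here because it depends on the provider.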

Paper Structure (25 sections, 15 equations, 2 figures, 10 tables)

Introduction
Proposed Method
Evaluation LLMs & LMs
Knowledge-Augmented Prompts
Querying LLMs
Fine-tuning LMs for Domain-Specific Customization
LLMs Prediction Embeddings
Cross-modal Attention Layer
Output Layer
Experiments & Results
Datasets & Baselines
Evaluation Metrics
Experimental Setup
Results
Ablation Studies
...and 10 more sections

Figures (2)

Figure 1: Overview of the FrontierX: LLM-MG framework. We construct knowledge-augmented prompts using task-specific instructions and a few demonstrations (input-output pairs) based on the downstream task. The augmented prompt queries LLMs to generate the top-$R$ predictions of the SMILES representations and produces textual explanations as justifications for its predictions. We fine-tune small-scale pre-trained language models (LMs) on the generated explanations for domain-specific customization to obtain context-aware token embeddings. We utilize a weighted-sum pooling attention mechanism for task-specific adaptation to compute contextualized text-level embeddings. In parallel, we transform the LLMs' top-$R$ predictions to compute prediction embeddings. The cross-modal encoder, modeled by a hierarchical multi-head attention mechanism, computes the unified embeddings by integrating the mono-domain text-level embeddings (both the original text and explanatory text) and prediction embeddings. Finally, the transformer decoder generates the chemical SMILES representations. We do not repurpose LLMs by fine-tuning with labeled data for domain customization. Instead, we access LLMs through the language-model-as-a-service (LMaaS) paradigm (Sun et al., 2022) using text-based API interaction.
Figure 2: Overview of the FrontierX: LLM-MG framework for the mol2text task.
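The weighted-sum pooling step named in the Figure 1 caption, which turns token embeddings into a single text-level embedding, can be sketched roughly as follows. This excerpt does not give the scoring function, so the dot product against a query vector used here is an assumption, and `weighted_sum_pool` is a hypothetical name; in a trained model the query would be a learned parameter.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum_pool(
    token_embeddings: List[List[float]], query: List[float]
) -> List[float]:
    """Pool token embeddings into one vector using attention weights.

    Assumption: scores are dot products between each token embedding and
    a query vector (learned in practice, fixed here for illustration).
    """
    scores = [sum(t * q for t, q in zip(tok, query)) for tok in token_embeddings]
    weights = softmax(scores)
    dim = len(query)
    # Weighted sum over tokens, computed independently per dimension.
    return [
        sum(w * tok[d] for w, tok in zip(weights, token_embeddings))
        for d in range(dim)
    ]


tokens = [[1.0, 0.0], [0.0, 1.0]]
uniform = weighted_sum_pool(tokens, [0.0, 0.0])   # equal scores, equal weights
focused = weighted_sum_pool(tokens, [10.0, 0.0])  # first token dominates
print(uniform, focused)
```

With a zero query every token receives the same weight, so the pooled vector is the plain mean; a query aligned with one token shifts nearly all of the weight onto it.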
