Generalist Large Language Models for Molecular Property Prediction: Distilling Knowledge from Specialist Models

Khiem Le, Sreejata Dey, Marcos Martínez Galindo, Vanessa Lopez, Ting Hua, Nitesh V. Chawla, Hoang Thanh Lam

Abstract

Molecular Property Prediction (MPP) is a central task in drug discovery. While Large Language Models (LLMs) show promise as generalist models for MPP, their current performance remains below the threshold for practical adoption. We propose TreeKD, a novel knowledge distillation method that transfers complementary knowledge from tree-based specialist models into LLMs. Our approach trains specialist decision trees on functional group features, then verbalizes their learned predictive rules as natural language to enable rule-augmented context learning. This enables LLMs to leverage structural insights that are difficult to extract from SMILES strings alone. We further introduce rule-consistency, a test-time scaling technique inspired by bagging that ensembles predictions across diverse rules from a Random Forest. Experiments on 22 ADMET properties from the TDC benchmark demonstrate that TreeKD substantially improves LLM performance, narrowing the gap with SOTA specialist models and advancing toward practical generalist models for molecular property prediction.
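The verbalization step described in the abstract can be illustrated with a minimal sketch. It assumes a hand-built toy tree standing in for a trained specialist decision tree; the `Leaf`, `Split`, and `verbalize` names, the functional-group features, the thresholds, the leaf values, and the exact rule wording are all hypothetical and are not taken from the paper. The only point is to show how a tree can be flattened into indented if-then text, with indentation tracking node depth.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    value: float  # predicted property value at this leaf (made-up numbers below)


@dataclass
class Split:
    feature: str      # functional-group name used as the split feature
    threshold: float  # condition is "count of feature <= threshold"
    left: "Node"      # subtree followed when the condition holds
    right: "Node"     # subtree followed otherwise


Node = Union[Leaf, Split]


def verbalize(node: Node, depth: int = 0) -> str:
    """Render a tree as nested if-then text; two spaces of indent per depth level."""
    indent = "  " * depth
    if isinstance(node, Leaf):
        return f"{indent}then predict {node.value:.2f}"
    lines = [
        f"{indent}if count of {node.feature} <= {node.threshold:g}:",
        verbalize(node.left, depth + 1),
        f"{indent}else:",
        verbalize(node.right, depth + 1),
    ]
    return "\n".join(lines)


# Toy tree with invented features and values, for illustration only.
toy_tree = Split(
    feature="hydroxyl",
    threshold=1,
    left=Split("carboxylic acid", 0, Leaf(-4.6), Leaf(-5.4)),
    right=Leaf(-5.9),
)

rule_text = verbalize(toy_tree)
print(rule_text)
```

In a setup like the one the abstract describes, text of this kind would be placed in the LLM prompt alongside the SMILES string, so the model sees the functional-group conditions explicitly rather than having to infer them from the string.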

Paper Structure

This paper contains 15 sections, 2 equations, 6 figures, and 6 tables.

Figures (6)

  • Figure 1: An overview of our proposed method TreeKD, consisting of the following four steps: ① Extract FGs, ② Construct Specialist Models, ③ Distill Knowledge from Specialist Models, and ④ Test-Time Scaling.
  • Figure 2: An example of verbalizing a predictive rule. The tree is turned into a hierarchy of if-then conditions, where indentation indicates the depth of nodes.
  • Figure 3: An example of a completed prompt used by the proposed method for training LLMs on Caco-2 Permeability. The description of the property is directly adopted from the TDC benchmark’s site.
  • Figure 4: The arrangements of 1457 molecules in the testing set according to whether a molecule is predicted accurately.
  • Figure 5: The arrangements of 30 pairs of molecules exhibiting property cliffs in the testing set according to whether both molecules in each pair are predicted accurately.
  • ...and 1 more figure
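Step ④ in Figure 1 (Test-Time Scaling) corresponds to the rule-consistency technique named in the abstract, which ensembles predictions across diverse rules from a Random Forest. The sketch below is one plausible reading of that idea, not the authors' implementation: it queries a predictor once per rule and aggregates the answers. The aggregation choices (majority vote for classification, mean for regression) are assumptions, and `rule_consistency` and `stub_predict` are hypothetical names; the stub stands in for an LLM call.

```python
from collections import Counter
from statistics import mean
from typing import Callable, Sequence


def rule_consistency(
    smiles: str,
    rules: Sequence[str],
    predict: Callable[[str, str], float],
    task: str = "classification",
) -> float:
    """Query the predictor once per rule, then aggregate the answers."""
    predictions = [predict(smiles, rule) for rule in rules]
    if task == "regression":
        # Assumption: average the per-rule predictions for regression tasks.
        return mean(predictions)
    # Assumption: majority vote over per-rule labels for classification tasks.
    return Counter(predictions).most_common(1)[0][0]


def stub_predict(smiles: str, rule: str) -> float:
    # Hypothetical stand-in for an LLM call. A real system would prompt the
    # model with the SMILES string plus the verbalized rule text.
    return 1.0 if "hydroxyl" in rule else 0.0


# Toy rule texts, one per tree of an imagined forest (invented for illustration).
rules = [
    "if count of hydroxyl <= 1: then predict 1",
    "if count of amine <= 0: then predict 0",
    "if count of hydroxyl <= 2: then predict 1",
]

label = rule_consistency("CCO", rules, stub_predict)
average = rule_consistency("CCO", rules, stub_predict, task="regression")
```

Because each rule comes from a different tree, the per-rule predictions can disagree; aggregating them mirrors how bagging reduces the variance of any single tree.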