Modular Multi-Task Learning for Chemical Reaction Prediction

Jiayun Pang, Ahmed M. Zaitoun, Xacobe Couso Cambeiro, Ivan Vulić

TL;DR

This work tackles the challenge of specializing large chemistry-oriented LLMs to limited, domain-specific reaction datasets without losing broad chemical knowledge. It advocates Low-Rank Adaptation (LoRA), a parameter-efficient modular fine-tuning approach, formalized by the update $\Delta W = BA$ and $W' = W + BA$, where $A \in \mathbb{R}^{r \times d}$ and $B \in \mathbb{R}^{k \times r}$, freezing the base model while learning compact adapters. Through extensive evaluation on USPTO_1K_TPL and a challenging C–H borylation dataset, the study shows LoRA achieves accuracy comparable to full fine-tuning across forward reaction prediction, retrosynthesis, and reagent prediction, while better preserving multi-task performance and mitigating catastrophic forgetting. The results also reveal that LoRA and full fine-tuning can yield different reactivity representations and solvent-generation capabilities, and that LoRA offers greater flexibility for modular deployment as LLMs scale. Overall, LoRA emerges as a practical, scalable strategy for chemistry applications, enabling task-specific specialization without compromising broader chemical understanding.
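
The update $W' = W + BA$ can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The $\alpha / r$ scaling and the zero initialisation of $B$ follow the common LoRA convention and are assumptions here, since the formula above omits them. The function names and toy dimensions are hypothetical.

```python
import random


def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def lora_merge(W, B, A, alpha=None):
    """Return W' = W + scale * (B A).

    W is k x d (frozen), B is k x r and A is r x d (trainable).
    scale is 1 when alpha is None (the plain formula), otherwise alpha / r,
    which is the usual LoRA convention and an assumption in this sketch.
    """
    r = len(A)
    scale = 1.0 if alpha is None else alpha / r
    delta = matmul(B, A)  # low-rank update, same shape as W
    return [
        [w + scale * dw for w, dw in zip(w_row, d_row)]
        for w_row, d_row in zip(W, delta)
    ]


def lora_param_count(k, d, r):
    """Trainable parameters in one adapter: B has k*r entries, A has r*d."""
    return r * (k + d)


# Toy dimensions, chosen only for illustration.
k, d, r = 6, 5, 2
rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(k)]
A = [[rng.gauss(0, 0.01) for _ in range(d)] for _ in range(r)]
B = [[0.0] * r for _ in range(k)]  # zero init: the adapter starts as a no-op

W_merged = lora_merge(W, B, A, alpha=8)
```

Because the base matrix $W$ is never modified, an adapter can be stored and swapped separately from the model. For a hypothetical 512 x 512 weight matrix, `lora_param_count(512, 512, 4)` gives 4,096 trainable parameters, against 262,144 for updating the full matrix.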

Abstract

Adapting large language models (LLMs) trained on broad organic chemistry to smaller, domain-specific reaction datasets is a key challenge in chemical and pharmaceutical R&D. Effective specialisation requires learning new reaction knowledge while preserving general chemical understanding across related tasks. Here, we evaluate Low-Rank Adaptation (LoRA) as a parameter-efficient alternative to full fine-tuning for organic reaction prediction on limited, complex datasets. Using USPTO reaction classes and challenging C-H functionalisation reactions, we benchmark forward reaction prediction, retrosynthesis and reagent prediction. LoRA achieves accuracy comparable to full fine-tuning while effectively mitigating catastrophic forgetting and better preserving multi-task performance. Both fine-tuning approaches generalise beyond training distributions, producing plausible alternative solvent predictions. Notably, C-H functionalisation fine-tuning reveals that LoRA and full fine-tuning encode subtly different reactivity patterns, suggesting more effective reaction-specific adaptation with LoRA. As LLMs continue to scale, our results highlight the practicality of modular, parameter-efficient fine-tuning strategies for their flexible deployment for chemistry applications.

Paper Structure (14 sections, 7 figures, 2 tables)

Figures (7)

  • Figure 1: The framework of our approach: general full fine-tuning of a pre-trained language model on the USPTO_1K_TPL dataset, followed by task-specific adaptation via either full fine-tuning or LoRA on a smaller, more specific reaction dataset (e.g. C-H functionalisation). The general fine-tuning step (in grey) was reported in our previous work, while the current study focuses on the task-specific fine-tuning step. The model at each stage is multi-task in nature and can be fine-tuned in either a single-task or a multi-task fashion.
  • Figure 2: Results in the three evaluation tasks with three different models based on ByT5 small: the general fine-tuned model (without task-specific adaptation), full fine-tuning and LoRA with two sets of parameters. LoRA-small uses $r=4$ and $\alpha=8$ while LoRA-large uses $r=16$ and $\alpha=32$.
  • Figure 3: Results in tasks of reagents prediction and retrosynthesis with three different models based on nach0 base (top) and ByT5 base (bottom): the general fine-tuned model (without task-specific adaptation), full fine-tuning and LoRA with one or two sets of parameters. LoRA-small uses $r=4$ and $\alpha=8$ while LoRA-large uses $r=16$ and $\alpha=32$.
  • Figure 4: t-SNE visualisation of the embeddings of out-of-distribution reagents (yellow dots) from (A) the fully fine-tuned model and (B) the LoRA model. Blue dots represent the embeddings of all reagents in the corresponding reaction class, while grey dots represent reagents only in the test split for that class. (C) Tanimoto similarity distribution for the out-of-distribution reagents proposed by the LoRA model (blue column) and the fully fine-tuned model (red column).
  • Figure 5: Comparison between the C-H borylation product prediction (Acc@1) using LoRA and full fine-tuning (ByT5 small). Incorrectly predicted activation positions are highlighted with lighter blue circles (LoRA) and green circles (full fine-tuning/FFT), while correctly predicted positions are shown as solid circles. The nature of selectivity for each product is also given.
  • ...and 2 more figures