SeqProFT: Sequence-only Protein Property Prediction with LoRA Finetuning
Shuo Zhang, Jian K. Liu
TL;DR
SeqProFT shows that parameter-efficient finetuning with LoRA lets smaller protein language models match or outperform larger ones on diverse protein property prediction tasks using only sequence data. A contact-map-augmented attention head brings in partial structural information to improve classification accuracy while keeping training costs low and inference fast. Ablations show robustness across LoRA ranks and model sizes, and attention analyses indicate that LoRA guides the model toward biologically meaningful sequence features. The work offers a practical blueprint for efficient sequence-only prediction in resource-constrained settings, along with interpretability insights relevant to applications such as drug discovery.
Abstract
Protein language models (PLMs) have demonstrated remarkable capabilities in learning relationships between protein sequences and functions. However, finetuning these large models requires substantial computational resources and often still yields suboptimal task-specific results. This study investigates how parameter-efficient finetuning via LoRA can enhance protein property prediction while significantly reducing computational demands. By applying LoRA to ESM-2 and ESM-C models of varying sizes and evaluating them on 10 diverse protein property prediction tasks, we demonstrate that smaller models with LoRA adaptation can match or exceed the performance of larger models without adaptation. Additionally, we integrate contact map information through a multi-head attention mechanism, which improves the model's use of structural features. Our systematic analysis reveals that LoRA finetuning enables faster convergence, better performance, and more efficient resource utilization, providing practical guidance for protein research applications in resource-constrained environments. The code is available at https://github.com/jiankliu/SeqProFT.
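To make the parameter-efficiency argument concrete, the following is a minimal sketch of the general LoRA idea, not the SeqProFT implementation: a frozen weight matrix W is augmented with a trainable low-rank product B A, scaled by alpha / rank. The class name `LoRALinear`, the helper `matmul`, the layer dimensions, and the initialization scales are all illustrative assumptions; a real setup would apply such adapters to selected layers of a PLM such as ESM-2 through a deep learning framework.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Hypothetical linear layer: frozen W (d_out x d_in) plus low-rank update B @ A."""

    def __init__(self, d_in, d_out, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        # Frozen pretrained weight (random here, purely for illustration).
        self.W = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable factors: A gets a small random init, B starts at zero,
        # so the adapted layer initially reproduces the frozen layer.
        self.A = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def effective_weight(self):
        """Return W + (alpha / rank) * B @ A."""
        delta = matmul(self.B, self.A)
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.W, delta)]

    def forward(self, x):
        """Apply the adapted layer to a single input vector x of length d_in."""
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.effective_weight()]

    def trainable_params(self):
        """Parameters updated during finetuning: rank * (d_in + d_out)."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def frozen_params(self):
        """Parameters left untouched: d_in * d_out."""
        return len(self.W) * len(self.W[0])


# Illustrative sizes only (assumed, not taken from the paper).
layer = LoRALinear(d_in=64, d_out=64, rank=4)
print(layer.trainable_params(), layer.frozen_params())  # 512 4096
```

With these assumed sizes the adapter trains 512 parameters against 4096 frozen ones, and the gap widens as the hidden dimension grows, since trainable parameters scale as rank × (d_in + d_out) rather than d_in × d_out.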
