Aligning LLMs with Biomedical Knowledge using Balanced Fine-Tuning
Zhenchao Tang, Fang Wang, Haohuai He, Jiale Zhou, Tianxu Lv, Jun Zhu, Shouzhi Chen, Minghao Yang, Yu Wang, Jiayang Wu, Yidong Song, Jianhua Yao
TL;DR
This work tackles the challenge of aligning LLMs with sparse biomedical knowledge by introducing Balanced Fine-Tuning (BFT), a lightweight post-training method that avoids costly reinforcement learning. BFT adds two adaptive weighting layers: a token-level weight based on prediction confidence, which stabilizes gradients, and a sample-level weight based on the minimum group confidence, which emphasizes difficult spans. The objective is L_BFT(θ) = (1/B) ∑_{b=1}^B s_b ( ∑_t m_{b,t} w_{b,t} l_{b,t} ) / ( ∑_t m_{b,t} + ε ), with c_{b,t} and p_b^{conf} guiding the token-level weights w_{b,t} and the sample-level weights s_b. Empirically, BFT improves medical reasoning, reduces forgetting on general-domain benchmarks, and yields biologically meaningful representations that support downstream tasks such as gene interaction prediction and single-cell perturbation response forecasting. In biology, it outperforms baselines such as GeneAgent without relying on external APIs. The results suggest that BFT generalizes beyond domain-specific tasks by embedding domain knowledge into the LLM's representations through adaptive learning from biomedical data, offering a practical RL-free pathway to integrated biomedical reasoning. Overall, BFT provides a general, scalable framework for augmenting LLMs with structured biomedical knowledge, with broad implications for biomedical research and AI-assisted life sciences.
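To make the shape of the objective concrete, the following is a minimal sketch of how L_BFT could be evaluated for one batch. It is not the authors' implementation. The summary above does not define the weights, so several choices here are assumptions made purely for illustration: the token loss l_{b,t} is taken to be the negative log-probability of the reference token, the token weight w_{b,t} is set equal to the confidence c_{b,t} (that same probability, treated as a constant), p_b^{conf} is taken as the lowest mean confidence over fixed-size groups of consecutive supervised tokens, and the sample weight is s_b = 1 - p_b^{conf}. The function name `bft_loss` and the parameter `group_size` are hypothetical.

```python
import math


def bft_loss(target_probs, mask, group_size=2, eps=1e-8):
    """Illustrative BFT-style loss for one batch (all weighting rules are assumptions).

    target_probs[b][t]: model probability of the reference token at position t.
    mask[b][t]: 1 if the token is supervised (m_{b,t}), 0 otherwise.
    """
    total = 0.0
    for probs, m in zip(target_probs, mask):
        # Confidences c_{b,t} of the supervised tokens only (m_{b,t} = 1).
        conf = [p for p, keep in zip(probs, m) if keep]
        if not conf:
            continue  # a fully masked sample contributes nothing

        # Token level: sum_t m * w * l, ASSUMING w = c and l = -log p.
        weighted = sum(c * -math.log(max(c, eps)) for c in conf)
        token_term = weighted / (len(conf) + eps)

        # Sample level: ASSUMED "minimum group confidence" = lowest mean
        # confidence over consecutive groups of `group_size` tokens.
        groups = [conf[i:i + group_size] for i in range(0, len(conf), group_size)]
        p_conf = min(sum(g) / len(g) for g in groups)

        # ASSUMED sample weight: less confident samples receive more weight.
        s_b = 1.0 - p_conf
        total += s_b * token_term

    # Average over the batch size B.
    return total / len(target_probs)


hard = [[0.9, 0.8, 0.5, 0.4]]      # contains a low-confidence group
easy = [[0.99, 0.98, 0.97, 0.96]]  # uniformly confident
ones = [[1, 1, 1, 1]]
print(bft_loss(hard, ones) > bft_loss(easy, ones))
```

Under these assumptions, a sample containing a low-confidence group receives a larger s_b and therefore contributes more to the batch loss than a uniformly confident one. In an actual training loop the weights would be computed from detached probabilities so that gradients flow only through l_{b,t}; that detail is outside what this plain-Python sketch shows.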
Abstract
Effective post-training is essential to align Large Language Models (LLMs) with specialized biomedical knowledge and thereby accelerate life science research. However, current approaches face significant limitations. First, biomedical reasoning involves intricate mechanisms that are often represented by sparse textual data. Standard Supervised Fine-Tuning (SFT) tends to overfit to surface-level instruction patterns without effectively internalizing this fragmented scientific knowledge. Second, Reinforcement Learning (RL) is impractical for this domain, because defining meaningful rewards often requires prohibitively expensive experimental validation (e.g., wet-lab verification of drug responses), which makes real-time feedback infeasible. We propose Balanced Fine-Tuning (BFT), an efficient post-training method designed to learn complex reasoning from sparse data without external reward signals. BFT operates through a two-layer weighting mechanism: (1) at the token level, it scales the loss by prediction probabilities to stabilize gradients and prevent overfitting; (2) at the sample level, it uses "minimum group confidence" to adaptively enhance the learning of hard samples. Experiments demonstrate that BFT significantly outperforms SFT. In medical tasks, it enables LLMs to acquire knowledge that SFT misses. In biological tasks, BFT-based LLMs surpass GeneAgent (an accurate agent for biology analysis) in biological process reasoning. Moreover, the text embeddings generated by BFT-trained LLMs can be applied directly to downstream tasks, such as gene interaction and single-cell perturbation response prediction. These results indicate that BFT facilitates broad applications of LLMs in biomedical research.
