Streamlining Biomedical Research with Specialized LLMs
Linqing Chen, Weilei Wang, Yubin Xia, Wentao Wu, Peng Xu, Zilong Bai, Jie Fang, Chaobo Xu, Ran Hu, Licong Xu, Haoran Hua, Jing Sun, Hanmeng Zhong, Jin Liu, Tian Qiu, Haowen Liu, Meng Hu, Xiuwen Li, Fei Gao, Yong Gu, Tao Shi, Chaochao Wang, Jianping Lu, Cheng Sun, Yixin Wang, Shengjie Yang, Yuancheng Li, Lu Jin, Lisha Zhang, Fu Bian, Zhongkai Ye, Lidong Pei, Changyang Tu
TL;DR
The paper tackles the need for domain-specific LLMs in biopharma by introducing PharmaGPT and the SynapseChat platform, a retrieval-augmented framework designed for real-time, context-aware biomedical research with multi-modal outputs. It presents a multi-channel retrieval system that fuses structured SQL data, unstructured text, and vectorized documents via nl2sql, BM25, and embedding-based techniques, feeding into a domain-tuned LLM core. Empirical results show PharmaGPT outperforms general-purpose models on domain tasks (e.g., NAPLEX) and, within SynapseChat, achieves state-of-the-art performance in domain-specific QA, translation, and information discrimination, while enabling mind-map visualizations and deep-dive workflows. The work demonstrates high practical impact for researchers and clinicians by enabling precise, cited answers with rich visuals, while also addressing ethics, privacy, and access considerations for responsible deployment in biomedicine.
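As a rough illustration of what fusing results from several retrieval channels can look like, the sketch below merges ranked lists using reciprocal rank fusion (RRF). This is an assumption made for illustration only: the summary does not state how SynapseChat combines its channels, and the function name, channel names, document IDs, and the constant `k=60` are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    rankings: dict mapping a channel name to a list of doc IDs, best first.
    k: smoothing constant (60 is a commonly used value; an assumption here).
    """
    scores = defaultdict(float)
    for doc_ids in rankings.values():
        # Ranks are 1-based; each channel contributes 1 / (k + rank).
        for rank, doc_id in enumerate(doc_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of three channels (structured query, keyword, embedding).
channel_results = {
    "nl2sql": ["trial_17", "patent_03"],
    "bm25": ["paper_42", "trial_17", "patent_03"],
    "embedding": ["trial_17", "paper_42", "review_08"],
}

fused = reciprocal_rank_fusion(channel_results)
print(fused)
```

Documents returned by several channels, such as `trial_17` here, rise to the top. A rank-based scheme of this kind needs no score calibration across channels, which is one reason it is a common choice when combining SQL, keyword, and vector results.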
Abstract
In this paper, we propose a novel system that integrates state-of-the-art, domain-specific large language models with advanced information retrieval techniques to deliver comprehensive and context-aware responses. Our approach facilitates seamless interaction among diverse components, enabling cross-validation of outputs to produce accurate, high-quality responses enriched with relevant data, images, tables, and other modalities. We demonstrate the system's capability to enhance response precision by leveraging a robust question-answering model, significantly improving the quality of dialogue generation. The system provides an accessible platform for real-time, high-fidelity interactions, allowing users to benefit from efficient human-computer interaction, precise retrieval, and simultaneous access to a wide range of literature and data. This dramatically improves the research efficiency of professionals in the biomedical and pharmaceutical domains and facilitates faster, more informed decision-making throughout the R&D process. Furthermore, the system proposed in this paper is available at https://synapse-chat.patsnap.com.
