CancerLLM: A Large Language Model in Cancer Domain

Mingchen Li, Jiatan Huang, Jeremy Yeung, Anne Blaes, Steven Johnson, Hongfang Liu, Hua Xu, Rui Zhang

TL;DR

CancerLLM addresses the lack of a cancer-domain LLM by introducing a 7B-parameter model with a Mistral-style architecture, pre-trained on ~2.7M clinical notes and ~515K pathology reports across 17 cancer types, then fine-tuned for cancer phenotype extraction and cancer diagnosis generation. It achieves state-of-the-art performance on both tasks (F1 of 91.78% for phenotype extraction and 86.81% for diagnosis generation) while remaining more efficient than larger models; robustness and retrieval experiments further support its reliability and potential for practical deployment. The work provides a dedicated cancer data benchmark, a transparent evaluation framework, and insights into how domain-focused pretraining, instruction tuning, and retrieval augmentation can yield strong performance with smaller models. Taken together, CancerLLM offers a scalable, clinically relevant tool to support oncology research and practice, enabling accurate extraction of cancer phenotypes and generation of diagnoses with reduced computational burden.

Abstract

Medical Large Language Models (LLMs) have demonstrated impressive performance on a wide variety of medical NLP tasks; however, there is still no LLM specifically designed for phenotype identification and diagnosis in the cancer domain. Moreover, these LLMs typically have several billion parameters, making them computationally expensive for healthcare systems. Thus, in this study, we propose CancerLLM, a model with 7 billion parameters and a Mistral-style architecture, pre-trained on nearly 2.7M clinical notes and over 515K pathology reports covering 17 cancer types, followed by fine-tuning on two cancer-relevant tasks: cancer phenotype extraction and cancer diagnosis generation. Our evaluation demonstrated that CancerLLM achieves state-of-the-art results, with F1 scores of 91.78% on phenotype extraction and 86.81% on diagnosis generation. It outperformed existing LLMs, with an average F1 score improvement of 9.23%. Additionally, CancerLLM demonstrated efficiency in time and GPU usage, as well as robustness, compared with other LLMs. We demonstrated that CancerLLM can potentially provide an effective and robust solution to advance clinical research and practice in the cancer domain.

Paper Structure

This paper contains 34 sections, 2 figures, and 9 tables.

Figures (2)

  • Figure 1: The evolution of medical LLM performance on cancer phenotype extraction and diagnosis generation, measured using the average F1 score, which includes Exact Match, BLEU-2, and ROUGE-L. CancerLLM achieves the highest performance, with an F1 score of 89.30%.
  • Figure 2: Overview of CancerLLM
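
The 89.30% quoted in the Figure 1 caption is not derived anywhere on this page. One plausible reading, offered here only as an assumption, is that it is the unweighted mean of the two task-level F1 scores given in the abstract. The short sketch below checks that arithmetic; the variable names are illustrative and do not come from the paper.

```python
# F1 scores reported in the abstract, in percent.
phenotype_extraction_f1 = 91.78
diagnosis_generation_f1 = 86.81

# Assumption (not stated in the source): the Figure 1 headline number is the
# unweighted mean of the two task-level F1 scores.
average_f1 = (phenotype_extraction_f1 + diagnosis_generation_f1) / 2

print(f"average F1 = {average_f1:.3f}%")
```

The mean is 89.295, which agrees with the 89.30% in the caption at the reported precision. If the paper instead averages per-metric scores (Exact Match, BLEU-2, ROUGE-L) in another way, this check would not apply.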