GeneSUM: Large Language Model-based Gene Summary Extraction

Zhijian Chen, Chuan Hu, Min Wu, Qingqing Long, Xuezhi Wang, Yuanchun Zhou, Meng Xiao

TL;DR

The paper tackles two bottlenecks that hinder timely knowledge extraction: the rapid accumulation of gene literature and the complexity of gene functions. It introduces GeneSUM, a two-stage approach that first retrieves relevant literature and removes redundant content, then fine-tunes a large language model to produce streamlined gene summaries. Experiments demonstrate that the LLM-driven integration of gene-specific information improves summary quality and supports more efficient decision-making in ongoing research. This approach offers a scalable solution for automating gene knowledge synthesis and enriching biomedical information resources.

Abstract

Emerging topics in biomedical research are continuously expanding, providing a wealth of information about genes and their functions. This rapid proliferation of knowledge presents unprecedented opportunities for scientific discovery, as well as formidable challenges for researchers striving to keep abreast of the latest advancements. One significant challenge is navigating the vast corpus of literature to extract vital gene-related information, a time-consuming and cumbersome task. To make this process more efficient, several key challenges must be addressed: (1) the overwhelming volume of literature, (2) the complexity of gene functions, and (3) the automated integration of gene information and generation of summaries. In response, we propose GeneSUM, a two-stage automated gene summary extractor utilizing a large language model (LLM). Our approach first retrieves literature on the target gene and eliminates redundant content, and then fine-tunes the LLM to refine and streamline the summarization process. We conducted extensive experiments to validate the efficacy of our proposed framework. The results demonstrate that the LLM significantly enhances the integration of gene-specific information, allowing more efficient decision-making in ongoing research.

Paper Structure

This paper contains 1 section, 1 figure.

Table of Contents

  1. Introduction

Figures (1)

  • Figure 1: An overview of our framework.