GenoTEX: An LLM Agent Benchmark for Automated Gene Expression Data Analysis

Haoyang Liu, Shuyu Chen, Ye Zhang, Haohan Wang

TL;DR

GenoTEX proposes a standardized benchmark for automated gene expression data analysis that targets gene-trait association (GTA) problems. It unifies dataset selection, preprocessing, and statistical analysis in a pipeline aligned with computational genomics standards. It also introduces GenoAgent, a multi-agent LLM baseline that emulates bioinformatician workflows through context-aware planning, code review, and domain-guided programming. GenoAgent shows promising end-to-end performance but still falls clearly short of human experts. The benchmark is built from expert-curated data and manual analyses. It supports rigorous evaluation across three tasks with multiple metrics, including AUROC and GSEA enrichment, which highlights both the feasibility and the current limitations of AI-assisted genomic analysis. Overall, GenoTEX provides a comprehensive resource for benchmarking, diagnosing, and accelerating the development of automated gene expression analysis methods, and it informs future directions in AI-driven genomics research.
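The TL;DR lists AUROC among the evaluation metrics. As a rough illustration of what such a score measures, the sketch below computes AUROC for a ranked list of genes against a reference set of trait-associated genes, using the Mann-Whitney rank-sum formulation. It assumes, without confirmation from this summary, that a method outputs one score per gene and that the expert analysis defines the positive set. The function name and its inputs are hypothetical and are not GenoTEX's evaluation code.

```python
from typing import Dict, Set


def auroc_for_gene_ranking(scores: Dict[str, float], reference: Set[str]) -> float:
    """AUROC of a gene ranking against a reference set of trait-associated genes.

    Hypothetical helper (not GenoTEX's API): `scores` maps each gene symbol to
    a predicted association score (higher = more likely associated), and
    `reference` holds the genes treated as positives. Reference genes absent
    from `scores` are ignored. Tied scores receive their average rank.
    """
    genes = sorted(scores, key=lambda g: scores[g])  # ascending by score
    n = len(genes)

    # Assign 1-based average ranks, grouping tied scores together.
    ranks: Dict[str, float] = {}
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[genes[j + 1]] == scores[genes[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[genes[k]] = avg_rank
        i = j + 1

    positives = [g for g in genes if g in reference]
    n_pos = len(positives)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one reference and one non-reference gene")

    # Mann-Whitney U for the positives, normalised by the number of
    # (positive, negative) pairs; equals the probability that a random
    # reference gene outranks a random non-reference gene.
    rank_sum = sum(ranks[g] for g in positives)
    u = rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    toy_scores = {"TP53": 0.9, "BRCA1": 0.8, "GAPDH": 0.3, "ACTB": 0.1}
    print(auroc_for_gene_ranking(toy_scores, {"TP53", "BRCA1"}))  # 1.0
    print(auroc_for_gene_ranking(toy_scores, {"TP53", "GAPDH"}))  # 0.75
```

The rank-sum form avoids sweeping thresholds explicitly. It also handles tied scores by counting each tied positive/negative pair as half a correct ordering.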

Abstract

Recent advancements in machine learning have significantly improved the identification of disease-associated genes from gene expression datasets. However, these processes often require extensive expertise and manual effort, limiting their scalability. Large Language Model (LLM)-based agents have shown promise in automating these tasks due to their increasing problem-solving abilities. To support the evaluation and development of such methods, we introduce GenoTEX, a benchmark dataset for the automated analysis of gene expression data. GenoTEX provides analysis code and results for solving a wide range of gene-trait association problems, encompassing dataset selection, preprocessing, and statistical analysis, in a pipeline that follows computational genomics standards. The benchmark includes expert-curated annotations from bioinformaticians to ensure accuracy and reliability. To provide baselines for these tasks, we present GenoAgent, a team of LLM-based agents that adopt a multi-step programming workflow with flexible self-correction, to collaboratively analyze gene expression datasets. Our experiments demonstrate the potential of LLM-based methods in analyzing genomic data, while error analysis highlights the challenges and areas for future improvement. We propose GenoTEX as a promising resource for benchmarking and enhancing automated methods for gene expression data analysis. The benchmark is available at https://github.com/Liu-Hy/GenoTEX.

Paper Structure

This paper contains 69 sections, 8 equations, 5 figures, and 5 tables.

Figures (5)

  • Figure 1: The overview of the GenoTEX benchmark curation, illustrating the standardized pipeline for analyzing gene expression datasets and the steps involved in creating the benchmark dataset.
  • Figure 1: Descriptive statistics of our GenoTEX benchmark.
  • Figure 2: High-level schematic of our GEO data preprocessing pipeline, with example code of core components that omits technical details.
  • Figure 3: The collaboration between Data Engineer and Code Reviewer.
  • Figure 4: The collaboration between Data Engineer and Domain Expert.