GenoTEX: An LLM Agent Benchmark for Automated Gene Expression Data Analysis

Haoyang Liu; Shuyu Chen; Ye Zhang; Haohan Wang

TL;DR

GenoTEX is a standardized benchmark for automated gene expression data analysis, built around gene-trait association (GTA) problems. It unifies dataset selection, preprocessing, and statistical analysis in a single pipeline aligned with computational genomics standards. It also introduces GenoAgent, a multi-agent LLM baseline that emulates bioinformatician workflows through context-aware planning, code review, and domain-guided programming. GenoAgent shows promising end-to-end performance but leaves clear gaps to human experts. The benchmark is built from expert-curated data and manual analyses, and it is evaluated across three tasks with multiple metrics, including AUROC and GSEA enrichment, which highlight both the feasibility and the current limitations of AI-assisted genomics analysis. Overall, GenoTEX provides a resource to benchmark, diagnose, and accelerate the development of automated gene expression data analysis methods.
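The summary above names AUROC as one of the evaluation metrics. As a rough illustration only, and not the benchmark's actual evaluation code, the sketch below computes AUROC for a scored gene list against a reference gene set. It assumes that each gene receives a numeric importance score and that the reference is a plain set of gene symbols; the function name `auroc`, the gene names, and the scores are all hypothetical.

```python
from typing import Dict, Set


def auroc(gene_scores: Dict[str, float], reference_genes: Set[str]) -> float:
    """Probability that a reference gene outscores a non-reference gene.

    Ties count as half a win. This is the pairwise (Mann-Whitney) form
    of AUROC, written for clarity rather than speed.
    """
    # Split scores by whether the gene appears in the (hypothetical) reference set.
    pos = [s for g, s in gene_scores.items() if g in reference_genes]
    neg = [s for g, s in gene_scores.items() if g not in reference_genes]
    if not pos or not neg:
        raise ValueError("need at least one reference and one non-reference gene")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores from an automated analysis and a hypothetical reference set.
scores = {"GENE_A": 0.9, "GENE_B": 0.7, "GENE_C": 0.4, "GENE_D": 0.1}
reference = {"GENE_A", "GENE_C"}
result = auroc(scores, reference)
print(f"AUROC = {result:.2f}")
```

A value of 1.0 means every reference gene is scored above every other gene, while 0.5 corresponds to a ranking no better than chance.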

Abstract

Recent advancements in machine learning have significantly improved the identification of disease-associated genes from gene expression datasets. However, these processes often require extensive expertise and manual effort, limiting their scalability. Large Language Model (LLM)-based agents have shown promise in automating these tasks due to their increasing problem-solving abilities. To support the evaluation and development of such methods, we introduce GenoTEX, a benchmark dataset for the automated analysis of gene expression data. GenoTEX provides analysis code and results for solving a wide range of gene-trait association problems, encompassing dataset selection, preprocessing, and statistical analysis, in a pipeline that follows computational genomics standards. The benchmark includes expert-curated annotations from bioinformaticians to ensure accuracy and reliability. To provide baselines for these tasks, we present GenoAgent, a team of LLM-based agents that adopt a multi-step programming workflow with flexible self-correction, to collaboratively analyze gene expression datasets. Our experiments demonstrate the potential of LLM-based methods in analyzing genomic data, while error analysis highlights the challenges and areas for future improvement. We propose GenoTEX as a promising resource for benchmarking and enhancing automated methods for gene expression data analysis. The benchmark is available at https://github.com/Liu-Hy/GenoTEX.

Paper Structure (69 sections, 8 equations, 5 figures, 5 tables)

Introduction
Related work
LLMs for collaborative problem-solving
LLMs for scientific discovery
Benchmark
Standardized pipeline of gene expression data analysis
Data preprocessing
Dataset filtering and selection
Gene expression data preprocessing
Trait data extraction
Data linking
Statistical analysis
Confounding factor correction
Incorporating conditions in regression
Benchmark creation
...and 54 more sections

Figures (5)

Figure 1: The overview of the GenoTEX benchmark curation, illustrating the standardized pipeline for analyzing gene expression datasets and the steps involved in creating the benchmark dataset.
Figure 1: Descriptive statistics of our GenoTEX benchmark.
Figure 2: High-level schematic of our GEO data preprocessing pipeline, with example code of core components that omits technical details.
Figure 3: The collaboration between Data Engineer and Code Reviewer.
Figure 4: The collaboration between Data Engineer and Domain Expert.
