Progressive Multi-Agent Reasoning for Biological Perturbation Prediction

Hyomin Kim; Sang-Yeon Hwang; Jaechang Lim; Yinhua Piao; Yunhak Oh; Woo Youn Kim; Chanyoung Park; Sungsoo Ahn; Junhyeok Jeon

TL;DR

PBio-Agent is a multi-agent, progressive reasoning framework for predicting transcriptional responses to chemical perturbations in bulk-cell data. It is paired with the LincsQA benchmark, which uses consensus signatures as ground truth and tests biological validity across pharmacologically sensitive and insensitive contexts. The approach uses specialized agents and knowledge graphs to decompose complex causal reasoning, and applies difficulty-aware data sorting and iterative verification to improve robustness. With an 8B model, it achieves state-of-the-art results on LINCS-based perturbation datasets and PerturbQA. By combining structured biological knowledge with agentic inference, the work suggests a scalable path toward mechanistic prediction of transcriptional responses to drugs.

Abstract

Predicting gene regulation responses to biological perturbations requires reasoning about underlying biological causalities. While large language models (LLMs) show promise for such tasks, they are often overwhelmed by the entangled nature of high-dimensional perturbation results. Moreover, recent works have primarily focused on genetic perturbations in single-cell experiments, leaving bulk-cell chemical perturbations, which are central to drug discovery, largely unexplored. Motivated by this, we present LINCSQA, a novel benchmark for predicting target gene regulation under complex chemical perturbations in bulk-cell environments. We further propose PBio-Agent, a multi-agent framework that integrates difficulty-aware task sequencing with iterative knowledge refinement. Our key insight is that genes affected by the same perturbation share causal structure, allowing confidently predicted genes to contextualize more challenging cases. The framework employs specialized agents enriched with biological knowledge graphs, while a synthesis agent integrates their outputs and specialized judges ensure logical coherence. PBio-Agent outperforms existing baselines on both LINCSQA and PerturbQA, enabling even smaller models to predict and explain complex biological processes without additional training.
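The easy-to-hard sequencing described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `GeneQuery` fields, the `predict` callback, and the `confidence_threshold` value are hypothetical names and choices introduced for illustration. The only parts taken from the paper are that the ordering score is the product of LLM self-consistency and a perturbation-gene relatedness score, and that confident earlier predictions are passed as context to later, harder genes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class GeneQuery:
    gene: str
    self_consistency: float  # assumed in [0, 1]: agreement across repeated LLM trials
    relatedness: float       # assumed in [0, 1]: normalized perturbation-gene relatedness

    @property
    def ease(self) -> float:
        # Composite score as a product of the two metrics; higher means easier.
        return self.self_consistency * self.relatedness


# Hypothetical predictor signature: (gene, context) -> (label, confidence).
Predictor = Callable[[str, List[Tuple[str, str]]], Tuple[str, float]]


def progressive_predict(
    queries: List[GeneQuery],
    predict: Predictor,
    confidence_threshold: float = 0.8,  # assumed cutoff, not from the paper
) -> Dict[str, str]:
    """Predict genes from easy to hard, reusing confident results as context."""
    ordered = sorted(queries, key=lambda q: q.ease, reverse=True)
    context: List[Tuple[str, str]] = []
    results: Dict[str, str] = {}
    for q in ordered:
        # Pass a copy so the predictor cannot mutate the shared context.
        label, confidence = predict(q.gene, list(context))
        results[q.gene] = label
        if confidence >= confidence_threshold:
            # Only confident predictions are propagated to later genes.
            context.append((q.gene, label))
    return results
```

In practice `predict` would wrap the multi-agent pipeline; here it is left as a callback so that the ordering and context-propagation logic can be read in isolation.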

Paper Structure (43 sections, 6 equations, 4 figures, 9 tables)

Introduction
LincsQA benchmark
Dataset curation.
Dataset preparation.
Cell line selection.
Consensus signature construction.
Query gene selection.
Tasks
Gene-level regulation task.
MoA-level context task.
PBio-Agent
Problem formulation.
Difficulty-aware data sorting.
Progressive reasoning.
Multi-expert reasoning.
...and 28 more sections

Figures (4)

Figure 1: Overview of the LincsQA benchmark construction. (i) Quality control: Filtering LINCS L1000 Level 5 signatures for high-quality compound treatments. (ii) Tier selection: Hierarchical pairing of compounds to cell lines using a two-tier strategy. Tier 1 (clinical consensus) requires strict clinical indication alignment, where the compound's approved therapeutic use must match the cell line's disease origin. Tier 2 (mechanistic consensus) applies when no clinically matched cell line is available, selecting instead based on target biology and pathway activity relevant to the compound's mechanism, independent of clinical indication. (iii) Consensus signature: Extracting robust signals by enforcing directional consistency ($\geq 0.7$) and computing replicate-weighted consensus $z$-scores ($z_g$). (iv) Gene selection: Ranking and filtering genes by $z$-score magnitude and MoA-plausibility to form binary queries. (v) Output: Final benchmark comprising specific cell-compound contexts paired with high-confidence up- and down-regulated gene sets.
Figure 2: Overview of PBio-Agent. (a) Difficulty-aware data sorting: We order data using a composite score defined as the product of two metrics. LLM self-consistency measures prediction stability over multiple trials. Biological relatedness between the perturbation and the gene is retrieved from the STRING database. (b) Progressive reasoning: PBio-Agent processes genes from easy to hard to build context iteratively. High-confidence predictions and reasoning traces from earlier steps are propagated as supplementary information to guide the analysis of subsequent, more complex biological cases.
Figure 3: Agreement ratios and target (A375 cell line) rank comparison for BRAF inhibitors. Agreement ratios for vemurafenib (left) and dabrafenib (right), with target ranks (numbers above bars) showing A375's ranking among six cell lines. Only PBio-Agent-8B consistently achieves rank 1 in A375 (BRAF V600E-mutant), while baseline models show higher agreement in wild-type cell lines, demonstrating PBio-Agent-8B's ability to correctly prioritize the mutation-harboring target (A375) cell line.
Figure 4: Agreement ratios of PBio-Agent across KRAS G12C-mutants with varying drug sensitivity. H358 (sensitive), H2122 (intermediate), and SW1573 (resistant) cells were treated with ARS-1620 and evaluated at 4h, 24h, and 72h. Higher agreement in sensitive H358 reflects coherent KRAS inhibition response, while lower agreement in resistant SW1573 indicates bypass pathway activation that decouples transcriptional changes from the annotated mechanism of action.
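The consensus-signature step in Figure 1 (iii) can be sketched as follows. This is an illustration under stated assumptions, not the benchmark's actual code: the function name `consensus_z`, the definition of directional consistency as the fraction of replicates sharing the majority sign, and the weighted-mean form of the consensus are all assumptions. Only the 0.7 consistency threshold and the idea of a replicate-weighted consensus $z_g$ come from the caption.

```python
from typing import Optional, Sequence


def consensus_z(
    z_scores: Sequence[float],
    weights: Optional[Sequence[float]] = None,
    min_consistency: float = 0.7,  # threshold stated in the Figure 1 caption
) -> Optional[float]:
    """Return a weighted consensus z-score for one gene, or None if the
    replicates disagree too much on direction (assumed definition)."""
    if not z_scores:
        return None
    if weights is None:
        # Assumption: equal replicate weights when none are supplied.
        weights = [1.0] * len(z_scores)
    if len(weights) != len(z_scores):
        raise ValueError("weights and z_scores must have the same length")

    # Directional consistency: share of replicates with the majority sign.
    n_up = sum(1 for z in z_scores if z > 0)
    n_down = sum(1 for z in z_scores if z < 0)
    consistency = max(n_up, n_down) / len(z_scores)
    if consistency < min_consistency:
        return None

    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted mean of replicate z-scores.
    return sum(w * z for w, z in zip(weights, z_scores)) / total
```

Genes for which the function returns `None` would be dropped before the ranking and filtering in step (iv); how the real pipeline derives replicate weights is not specified in this summary.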
