NOVA: An Agentic Framework for Automated Histopathology Analysis and Discovery

Anurag J. Vaidya, Felix Meissen, Daniel C. Castro, Shruthi Bannur, Tristan Lazard, Drew F. K. Williamson, Faisal Mahmood, Javier Alvarez-Valle, Stephanie L. Hyland, Kenza Bouzid

TL;DR

NOVA presents a modular agentic framework that turns natural-language queries into executable histopathology analysis pipelines via a core LLM and 49 domain-specific tools, enabling scalable, dataset-level discovery without instruction-finetuned models. SlideQuest provides a rigorous 90-question benchmark spanning data, cellular, ROI, and gigapixel tasks, verified by pathologists and biomedical scientists to require multi-step reasoning and coding. Empirical results show NOVA outperforming coding baselines across categories, with a pathologist-verified case study linking morphological features to PAM50 subtypes, demonstrating practical discovery potential. The work highlights current tool and framework limitations and outlines future directions toward broader modalities, automated tool creation, and community-driven benchmark expansion.

Abstract

Digitized histopathology analysis involves complex, time-intensive workflows and specialized expertise, limiting its accessibility. We introduce NOVA, an agentic framework that translates scientific queries into executable analysis pipelines by iteratively generating and running Python code. NOVA integrates 49 domain-specific tools (e.g., nuclei segmentation, whole-slide encoding) built on open-source software, and can also create new tools ad hoc. To evaluate such systems, we present SlideQuest, a 90-question benchmark -- verified by pathologists and biomedical scientists -- spanning data processing, quantitative analysis, and hypothesis testing. Unlike prior biomedical benchmarks focused on knowledge recall or diagnostic QA, SlideQuest demands multi-step reasoning, iterative coding, and computational problem solving. Quantitative evaluation shows NOVA outperforms coding-agent baselines, and a pathologist-verified case study links morphology to prognostically relevant PAM50 subtypes, demonstrating its scalable discovery potential.
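The generate-and-run cycle described in the abstract can be illustrated with a minimal sketch. It is not NOVA's implementation: the names `run_snippet`, `agent_loop`, and `scripted_llm` are hypothetical, the core LLM is replaced by a scripted stand-in that replays two fixed steps, and the `final_answer` stopping convention is an assumption made for this example only.

```python
import contextlib
import io
import traceback


def run_snippet(code, namespace):
    """Execute code in a shared namespace; return captured stdout or the traceback."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
    except Exception:
        # Errors are returned as text so the next generation step can react to them.
        buffer.write(traceback.format_exc())
    return buffer.getvalue()


def agent_loop(query, generate_code, max_steps=5):
    """Alternate code generation and execution, feeding each output back as context."""
    history = [("user", query)]
    namespace = {}  # persists across steps, so later code can reuse earlier variables
    for _ in range(max_steps):
        code = generate_code(history)
        observation = run_snippet(code, namespace)
        history.append(("assistant", code))
        history.append(("observation", observation))
        # Assumed convention: the loop stops once the code defines `final_answer`.
        if "final_answer" in namespace:
            return namespace["final_answer"], history
    return None, history


def scripted_llm(history):
    """Hypothetical stand-in for the core LLM: replays a fixed two-step plan."""
    steps = [
        "areas = [120, 95, 143]\nprint(len(areas))",
        "final_answer = sum(areas) / len(areas)",
    ]
    n_done = sum(1 for role, _ in history if role == "assistant")
    return steps[n_done]


answer, history = agent_loop("What is the mean nucleus area?", scripted_llm)
```

Keeping one namespace across steps is what lets a later snippet build on variables computed earlier, which is the property that multi-stage analysis relies on. A real system would also need sandboxing around `exec`, which this sketch omits.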

Paper Structure

This paper contains 51 sections, 10 figures, 13 tables.

Figures (10)

  • Figure 1: Nova framework. The system takes as input a user query about one or more histology images that are present on the file system. Using a collection of tools and in-built libraries, a core LLM generates Python code to conduct multi-step data processing and analysis towards answering the user query. Code is iteratively executed and fed back into the LLM context to enable dynamic and multi-stage action.
  • Figure 2: Overview of the SlideQuest benchmark. (A) The four benchmark categories. Listed examples are abridged for illustration only; see full exemplars in \ref{apd:user-query-format}. (B) Diversity of input and output types. Note that DataQA and SlideQA contain whole-slide images (WSIs), whereas CellularQA and PatchQA operate on conventional flat images. (C) Themes of capabilities required to answer the questions (full breakdown in \ref{tab:capabilities}).
  • Figure 3: A. Average score (higher is better) on SlideQuest stratified by benchmark category. B. Failure rate (lower is better) showing the proportion of questions from SlideQuest on which the approach achieved a zero score. Overall is the average of each category weighted by number of questions in the category. Error bars are standard error of the mean from 3 trials. All results with GPT-4.1. "PI" stands for Python interpreter.
  • Figure 4: Case study showing the use of Nova to explore the morphological features associated with PAM50 breast cancer subtypes (Luminal A, Luminal B, Basal-like, HER2-enriched) and assess their relationship with tumour characteristics. Only the main steps are shown for illustration purposes. The final report produced by Nova is shown in \ref{apd:final_report_part1} and \ref{apd:final_report_part2}.
  • Figure B.1: Final analysis markdown report (Part 1 of 2) produced by Nova for the exploration of the morphological features associated with molecular PAM50 breast cancer subtypes.
  • ...and 5 more figures
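
The aggregation described in the Figure 3 caption (an overall score weighted by the number of questions per category, a failure rate defined as the proportion of zero scores, and the standard error of the mean across trials) can be sketched as follows. The function names are hypothetical, the use of the sample standard deviation (n - 1 denominator) is an assumption, and all numbers are toy values for illustration, not results from the paper.

```python
import math
import statistics


def weighted_overall(category_means, category_counts):
    """Average of per-category mean scores, weighted by questions per category."""
    total = sum(category_counts.values())
    return sum(category_means[c] * category_counts[c] for c in category_means) / total


def failure_rate(scores):
    """Proportion of questions on which the approach scored exactly zero."""
    return sum(1 for s in scores if s == 0) / len(scores)


def standard_error(trial_values):
    """Standard error of the mean across trials (sample standard deviation assumed)."""
    return statistics.stdev(trial_values) / math.sqrt(len(trial_values))


# Toy inputs with placeholder category names; these are not SlideQuest numbers.
toy_means = {"cat_a": 0.5, "cat_b": 1.0}
toy_counts = {"cat_a": 10, "cat_b": 30}

overall = weighted_overall(toy_means, toy_counts)
failures = failure_rate([0, 0.5, 1.0, 0])
sem = standard_error([0.4, 0.5, 0.6])
```

Weighting by question count makes the overall score equal to the mean over all questions, so a small category cannot dominate the headline number.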