NOVA: An Agentic Framework for Automated Histopathology Analysis and Discovery
Anurag J. Vaidya, Felix Meissen, Daniel C. Castro, Shruthi Bannur, Tristan Lazard, Drew F. K. Williamson, Faisal Mahmood, Javier Alvarez-Valle, Stephanie L. Hyland, Kenza Bouzid
TL;DR
NOVA is a modular agentic framework that turns natural-language queries into executable histopathology analysis pipelines using a core LLM and 49 domain-specific tools, enabling scalable, dataset-level discovery without instruction-finetuned models. SlideQuest is a rigorous 90-question benchmark spanning data, cellular, ROI, and gigapixel tasks, verified by pathologists and biomedical scientists to require multi-step reasoning and coding. Empirically, NOVA outperforms coding-agent baselines across categories, and a pathologist-verified case study linking morphological features to PAM50 subtypes demonstrates its practical discovery potential. The work also highlights current limitations of the tools and framework, and outlines future directions toward broader modalities, automated tool creation, and community-driven benchmark expansion.
Abstract
Digitized histopathology analysis involves complex, time-intensive workflows and requires specialized expertise, which limits its accessibility. We introduce NOVA, an agentic framework that translates scientific queries into executable analysis pipelines by iteratively generating and running Python code. NOVA integrates 49 domain-specific tools (e.g., nuclei segmentation, whole-slide encoding) built on open-source software, and can also create new tools ad hoc. To evaluate such systems, we present SlideQuest, a 90-question benchmark, verified by pathologists and biomedical scientists, that spans data processing, quantitative analysis, and hypothesis testing. Unlike prior biomedical benchmarks focused on knowledge recall or diagnostic QA, SlideQuest demands multi-step reasoning, iterative coding, and computational problem solving. Quantitative evaluation shows that NOVA outperforms coding-agent baselines, and a pathologist-verified case study links morphology to prognostically relevant PAM50 subtypes, demonstrating NOVA's potential for scalable discovery.
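The iterative generate-and-run loop described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not NOVA's implementation: the toy tool `count_nuclei`, the helpers `run_code` and `agent_loop`, and the scripted stand-in for the LLM (`scripted_llm`) are all assumptions. Tools are exposed as Python callables in the execution namespace, and each execution result, including errors, is appended to the history so the next generation step can react to it.

```python
import contextlib
import io
from typing import Callable, Dict, List


def count_nuclei(mask: List[List[int]]) -> int:
    """Hypothetical toy 'tool': counts pixels labelled 1 (not a real NOVA tool)."""
    return sum(v == 1 for row in mask for v in row)


# Hypothetical tool registry; real domain tools would wrap segmentation, encoders, etc.
TOOLS: Dict[str, Callable] = {"count_nuclei": count_nuclei}


def run_code(code: str, namespace: dict) -> str:
    """Execute generated code, returning captured stdout or an error string."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, namespace)  # a real system would sandbox this
    except Exception as exc:
        # Errors are returned as observations so the model can fix its code.
        return f"ERROR: {type(exc).__name__}: {exc}"
    return buf.getvalue()


def agent_loop(query: str, llm: Callable[[List[str]], str],
               max_steps: int = 5) -> List[str]:
    """Generate code, run it, record the observation; stop when the model says DONE."""
    history = [f"QUERY: {query}"]
    namespace = dict(TOOLS)  # tools are callable from generated code
    for _ in range(max_steps):
        code = llm(history)
        if code.strip() == "DONE":
            break
        observation = run_code(code, namespace)
        history.append(f"CODE:\n{code}")
        history.append(f"OBSERVATION:\n{observation}")
    return history


def scripted_llm(history: List[str]) -> str:
    """Stand-in for a real LLM: emits buggy code, then a fix, then stops."""
    steps = [
        "print(count_nucleii(mask))",  # misspelled tool name -> NameError
        "mask = [[0, 1], [1, 1]]\nprint(count_nuclei(mask))",
        "DONE",
    ]
    n_code = sum(h.startswith("CODE:") for h in history)
    return steps[n_code]


if __name__ == "__main__":
    for entry in agent_loop("How many nuclei are in the mask?", scripted_llm):
        print(entry)
```

Here the scripted model first calls a misspelled tool, receives the `NameError` as an observation, and then issues corrected code whose output is recorded. A real deployment would replace `scripted_llm` with an LLM call and execute generated code in an isolated sandbox rather than with a bare `exec`.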
