VoxelPrompt: A Vision Agent for End-to-End Medical Image Analysis
Andrew Hoopes, Neel Dey, Victor Ion Butoi, John V. Guttag, Adrian V. Dalca
TL;DR
VoxelPrompt addresses the need for flexible, end-to-end radiology workflows by jointly training a language-model agent with a vision network, so that the agent can generate and run executable analysis pipelines from natural language prompts. The system operates on native-resolution 3D volumes, using cross-volume attention and a persistent execution environment to produce segmentations, measurements, and language explanations across multi-acquisition studies. Key contributions include a unified framework that matches or exceeds single-task specialist baselines on diverse brain-imaging tasks, efficiency gains from native-resolution processing, and robust performance across varying acquisition types and data quality. The approach offers transparent, programmable workflows that can be integrated into clinical pipelines, enabling broader, open-ended biomedical analyses with AI assistance.
Abstract
We present VoxelPrompt, an end-to-end image analysis agent that tackles free-form radiological tasks. Given any number of volumetric medical images and a natural language prompt, VoxelPrompt uses a language model to generate executable code that invokes a jointly trained, adaptable vision network. This code also carries out the analytical steps needed to address practical quantitative aims, such as measuring the growth of a tumor across visits. The pipelines generated by VoxelPrompt automate analyses that currently require practitioners to painstakingly combine multiple specialized vision and statistical tools. We evaluate VoxelPrompt on diverse neuroimaging tasks and show that it can delineate hundreds of anatomical and pathological features, measure complex morphological properties, and perform open-language analysis of lesion characteristics. VoxelPrompt accomplishes these tasks with accuracy similar to that of specialist single-task models for image analysis, while facilitating a broad range of compositional biomedical workflows.
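To make the tumor-growth example concrete, the following is a minimal sketch of the kind of measurement step such a generated pipeline might contain. It is not VoxelPrompt's actual generated code or API: the function names `mask_volume_mm3` and `percent_growth`, the synthetic masks, and the voxel spacing are all assumptions made for illustration. It assumes that a segmentation step has already produced one binary tumor mask per visit.

```python
import numpy as np


def mask_volume_mm3(mask, spacing_mm):
    """Volume of a binary mask: voxel count times the volume of one voxel.

    Hypothetical helper for illustration; `spacing_mm` is the voxel size
    along each axis in millimetres.
    """
    voxel_volume = float(np.prod(spacing_mm))
    return float(np.count_nonzero(mask)) * voxel_volume


def percent_growth(volume_before, volume_after):
    """Relative change in volume between two visits, in percent."""
    if volume_before == 0:
        raise ValueError("baseline volume is zero; relative growth is undefined")
    return 100.0 * (volume_after - volume_before) / volume_before


# Synthetic stand-ins for per-visit tumor masks (assumed, not real data).
visit1 = np.zeros((20, 20, 20), dtype=bool)
visit1[5:10, 5:10, 5:10] = True   # 5 * 5 * 5 = 125 voxels
visit2 = np.zeros((20, 20, 20), dtype=bool)
visit2[5:11, 5:10, 5:10] = True   # 6 * 5 * 5 = 150 voxels

spacing = (1.0, 1.0, 2.0)         # assumed voxel size in mm

v1 = mask_volume_mm3(visit1, spacing)
v2 = mask_volume_mm3(visit2, spacing)
growth = percent_growth(v1, v2)
print(f"visit 1: {v1} mm^3, visit 2: {v2} mm^3, growth: {growth:.1f}%")
```

In practice the voxel spacing would come from each image's header rather than a constant, and masks from different visits may have different spacings, which is why the volume is computed per visit before comparing.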
