Simple Agents Outperform Experts in Biomedical Imaging Workflow Optimization
Xuefei Wang, Kai A. Horstmann, Ethan Lin, Jonathan Chen, Alexander R. Farhang, Sophia Stiles, Atharva Sehgal, Jonathan Light, David Van Valen, Yisong Yue, Jennifer J. Sun
TL;DR
The paper tackles the persistent last-mile bottleneck in adapting production biomedical imaging tools to bespoke datasets. It demonstrates that a minimal Base Agent framework can consistently surpass expert-tuned baselines across Polaris, Cellpose, and MedSAM pipelines, with substantial reductions in adaptation time. Through a systematic analysis of the agent design space, it shows that more complex architectures do not universally improve performance and that task context matters for design choices. The authors provide a practical, open-source framework and validate real-world impact by deploying agent-generated functions into production, outlining a roadmap for scalable tool adaptation in biomedical imaging.
Abstract
Adapting production-level computer vision tools to bespoke scientific datasets is a critical "last mile" bottleneck. Current solutions are impractical: fine-tuning requires large annotated datasets scientists often lack, while manual code adaptation costs scientists weeks to months of effort. We consider using AI agents to automate this manual coding, and focus on the open question of optimal agent design for this targeted task. We introduce a systematic evaluation framework for agentic code optimization and use it to study three production-level biomedical imaging pipelines. We demonstrate that a simple agent framework consistently generates adaptation code that outperforms human-expert solutions. Our analysis reveals that common, complex agent architectures are not universally beneficial, leading to a practical roadmap for agent design. We open source our framework and validate our approach by deploying agent-generated functions into a production pipeline, demonstrating a clear pathway for real-world impact.
