AutoLabs: Cognitive Multi-Agent Systems with Self-Correction for Autonomous Chemical Experimentation

Gihan Panapitiya; Emily Saldanha; Heather Job; Olivia Hess

TL;DR

AutoLabs is a self-correcting, multi-agent framework that translates natural-language experimental goals into executable chemical protocols with hardware-ready output. A LangGraph-based supervisor coordinates specialized sub-agents that use tool calling and two self-check regimes to support robust design, validation, and execution guidance across five benchmark experiments. A systematic ablation identifies reasoning capacity as the key driver of quantitative accuracy. The multi-agent configuration with full reasoning reaches near-expert procedural fidelity ($F1>0.89$), especially with guided self-checks. The work highlights the practical value of modular agent architectures, self-correction loops, and human-in-the-loop collaboration, and it outlines future directions, such as SOP integration via retrieval-based methods and memory-enabled, agent-evolving systems, to further improve reliability in autonomous laboratories.

Abstract

The automation of chemical research through self-driving laboratories (SDLs) promises to accelerate scientific discovery, yet the reliability and granular performance of the underlying AI agents remain critical, under-examined challenges. In this work, we introduce AutoLabs, a self-correcting, multi-agent architecture designed to autonomously translate natural-language instructions into executable protocols for a high-throughput liquid handler. The system engages users in dialogue, decomposes experimental goals into discrete tasks for specialized agents, performs tool-assisted stoichiometric calculations, and iteratively self-corrects its output before generating a hardware-ready file. We present a comprehensive evaluation framework featuring five benchmark experiments of increasing complexity, from simple sample preparation to multi-plate timed syntheses. Through a systematic ablation study of 20 agent configurations, we assess the impact of reasoning capacity, architectural design (single- vs. multi-agent), tool use, and self-correction mechanisms. Our results demonstrate that agent reasoning capacity is the most critical factor for success, reducing quantitative errors in chemical amounts (nRMSE) by over 85% in complex tasks. When combined with a multi-agent architecture and iterative self-correction, AutoLabs achieves near-expert procedural accuracy (F1-score > 0.89) on challenging multi-step syntheses. These findings establish a clear blueprint for developing robust and trustworthy AI partners for autonomous laboratories, highlighting the synergistic effects of modular design, advanced reasoning, and self-correction to ensure both performance and reliability in high-stakes scientific applications. Code: https://github.com/pnnl/autolabs
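The abstract reports two metrics: nRMSE for errors in chemical amounts and F1-score for procedural accuracy. The sketch below shows one plausible way to compute such scores. It is illustrative only. The range-based normalization, the set-based matching of procedure steps, and the function names `nrmse` and `step_f1` are assumptions made here, not the definitions used by the AutoLabs authors.

```python
import math


def nrmse(predicted, reference):
    """Root-mean-square error divided by the range of the reference values.

    Assumption: range normalization. The paper may normalize differently
    (for example, by the mean of the reference amounts).
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(reference)
    spread = max(reference) - min(reference)
    if spread == 0:
        raise ValueError("reference values have zero range")
    return math.sqrt(mse) / spread


def step_f1(predicted_steps, reference_steps):
    """F1-score over procedure steps, treating each protocol as a set of steps.

    Assumption: a predicted step counts as correct only if it exactly
    matches a reference step; step order is ignored.
    """
    pred, ref = set(predicted_steps), set(reference_steps)
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical dispensed volumes (in microliters) and step labels.
volume_error = nrmse([0.0, 10.0], [0.0, 8.0])
procedure_score = step_f1(
    ["add_solvent", "add_reagent", "heat"],
    ["add_solvent", "add_reagent", "stir"],
)
```

Under these assumptions, a lower `nrmse` means the generated amounts are closer to the expert reference, and a `step_f1` near 1 means the generated procedure contains nearly the same steps as the reference protocol.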
