MDCrow: Automating Molecular Dynamics Workflows with Large Language Models
Quintina Campbell, Sam Cox, Jorge Medina, Brittany Watterson, Andrew D. White
TL;DR
MDCrow is an agentic LLM framework that automates molecular dynamics (MD) workflows by composing an environment of domain-specific tools for information retrieval, PDB handling, simulation setup, and analysis. Using chain-of-thought reasoning and tool use within a LangChain/ReAct paradigm, MDCrow shows strong task completion and robustness across 25 prompts, with gpt-4o and llama3-405b performing best. The study reports that MDCrow can outperform baselines and even extrapolate to tasks outside its explicit toolset through interactive chat, a step toward scalable, automated MD research pipelines. The code is open source, and the authors emphasize careful evaluation and the potential for further gains as LLM capabilities advance.
Abstract
Molecular dynamics (MD) simulations are essential for understanding biomolecular systems but remain challenging to automate. Recent advances in large language models (LLMs) have demonstrated success in automating complex scientific tasks using LLM-based agents. In this paper, we introduce MDCrow, an agentic LLM assistant capable of automating MD workflows. MDCrow applies chain-of-thought reasoning over 40 expert-designed tools for handling and processing files, setting up simulations, analyzing simulation outputs, and retrieving relevant information from literature and databases. We assess MDCrow's performance across 25 tasks that vary in difficulty and in the number of required subtasks, and we evaluate the agent's robustness to both task difficulty and prompt style. gpt-4o is able to complete complex tasks with low variance, followed closely by llama3-405b, a compelling open-source model. While prompt style does not influence the performance of the best models, it has significant effects on smaller models.
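The reason-then-act pattern that the abstract describes (an agent alternating between a reasoning step and a tool call) can be illustrated with a minimal sketch. This is not MDCrow's implementation: the tool names `download_pdb` and `run_simulation`, the `scripted_policy` function standing in for the LLM, and the placeholder structure ID "1ABC" are all hypothetical, and no LangChain, simulation, or database call is made.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-ins for domain tools (assumed names, not MDCrow's tools).
def download_pdb(pdb_id: str) -> str:
    """Pretend to fetch a structure and return a file name."""
    return f"{pdb_id}.pdb"

def run_simulation(pdb_file: str) -> str:
    """Pretend to simulate the structure and return a result label."""
    return f"trajectory for {pdb_file}"

TOOLS: Dict[str, Callable[[str], str]] = {
    "download_pdb": download_pdb,
    "run_simulation": run_simulation,
}

OBS_PREFIX = "Observation: "

def scripted_policy(history: List[str]) -> Tuple[str, str, str]:
    """Stand-in for the LLM: returns (thought, action, action_input).

    A real agent would prompt a model with the history; here the choice is
    hard-coded from the number of observations seen so far.
    """
    observations = [h[len(OBS_PREFIX):] for h in history if h.startswith(OBS_PREFIX)]
    if not observations:
        return ("I need the structure file first.", "download_pdb", "1ABC")
    if len(observations) == 1:
        return ("Now I can simulate it.", "run_simulation", observations[0])
    return ("The simulation finished.", "finish", observations[-1])

def react_loop(
    task: str,
    policy: Callable[[List[str]], Tuple[str, str, str]],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> Tuple[Optional[str], List[str]]:
    """Alternate thought -> action -> observation until 'finish' or max_steps."""
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        thought, action, arg = policy(history)
        history.append(f"Thought: {thought}")
        if action == "finish":
            return arg, history
        if action in tools:
            observation = tools[action](arg)
        else:
            # Report unknown tools back to the policy instead of crashing.
            observation = f"error: unknown tool {action}"
        history.append(f"Action: {action}[{arg}]")
        history.append(f"{OBS_PREFIX}{observation}")
    return None, history

if __name__ == "__main__":
    answer, trace = react_loop("Simulate 1ABC", scripted_policy, TOOLS)
    print("\n".join(trace))
    print("Answer:", answer)
```

Feeding each observation back into the history is what lets the next reasoning step depend on earlier tool outputs; the `max_steps` cap is one simple way to keep such a loop from running indefinitely.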
