Protein Design with Agent Rosetta: A Case Study for Specialized Scientific Agents

Jacopo Teneggi, S. M. Bargeen A. Turzo, Tanya Marwah, Alberto Bietti, P. Douglas Renfrew, Vikram Khipple Mulligan, Siavash Golkar

Abstract

Large language models (LLMs) are capable of emulating reasoning and using tools, creating opportunities for autonomous agents that execute complex scientific tasks. Protein design provides a natural testbed: although machine learning (ML) methods achieve strong results, they are largely restricted to canonical amino acids and narrow objectives, leaving an unfilled need for a generalist tool for broad design pipelines. We introduce Agent Rosetta, an LLM agent paired with a structured environment for operating Rosetta, the leading physics-based heteropolymer design software, which can model non-canonical building blocks and geometries. Agent Rosetta iteratively refines designs to achieve user-defined objectives, combining LLM reasoning with Rosetta's generality. We evaluate Agent Rosetta on design with canonical amino acids, where it matches specialized models and expert baselines, and on design with non-canonical residues, where ML approaches fail and Agent Rosetta achieves comparable performance. Critically, prompt engineering alone often fails to produce correct Rosetta actions, demonstrating that environment design is essential for integrating LLM agents with specialized software. Our results show that properly designed environments enable LLM agents to make scientific software accessible while matching specialized tools and human experts.
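The structured environment turns each refinement step into two turns: the agent picks an action, the environment returns that action's documentation, and the agent then writes the call with its parameters (Figure 1B). The loop can be sketched as follows. This is a minimal sketch under stated assumptions: the action names, their documentation strings, and the `Environment` class are hypothetical placeholders, the LLM is replaced by two plain callables, and `execute` only records the call instead of running Rosetta.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical action registry (name -> documentation shown to the agent).
# These are illustrative placeholders, not Agent Rosetta's real action set.
ACTION_DOCS = {
    "pack_rotamers": "Repack side chains. Params: none.",
    "fast_relax": "Relax the structure. Params: repeats (int).",
}


@dataclass
class Environment:
    history: list = field(default_factory=list)

    def list_actions(self) -> list:
        return sorted(ACTION_DOCS)

    def get_documentation(self, action: str) -> str:
        if action not in ACTION_DOCS:
            raise KeyError(f"unknown action: {action}")
        return ACTION_DOCS[action]

    def execute(self, action: str, params: dict) -> dict:
        # Stand-in for a real Rosetta call: record the call and report success.
        self.history.append((action, params))
        return {"ok": True, "action": action}


def run_episode(env: Environment,
                choose_action: Callable[[list], str],
                write_call: Callable[[str, str], dict],
                n_steps: int) -> list:
    """Iteratively refine: each step is choose -> read docs -> write call."""
    results = []
    for _ in range(n_steps):
        action = choose_action(env.list_actions())  # turn 1: pick an action
        doc = env.get_documentation(action)         # environment returns docs
        params = write_call(action, doc)            # turn 2: fill parameters
        results.append(env.execute(action, params))
    return results


# Usage with a trivial stand-in policy instead of an LLM.
env = Environment()
results = run_episode(env,
                      choose_action=lambda actions: actions[0],
                      write_call=lambda action, doc: {"repeats": 1},
                      n_steps=2)
```

A real policy would replace the two lambdas with LLM queries that receive the action list and the returned documentation, respectively.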

Paper Structure

This paper contains 51 sections, 16 figures, and 4 tables.

Figures (16)

  • Figure 1: Illustration of our multi-turn agentic system. (A) Schematic of Agent Rosetta's interaction protocol. (B) Design refinement: the agent chooses an action and, after the environment returns the action's documentation, generates the action call with its parameters.
  • Figure 2: An example of a prompting failure when generating composition penalties. Although Agent Rosetta intends to reduce proline content, the generated penalty block has the opposite effect.
  • Figure 3: Comparison of the performance of different LLMs at generating amino acid compositional penalty blocks with the original RosettaScripts syntax and our simplified syntax.
  • Figure 4: Comparison of the best running RMSD as a function of design step for 2 target backbone conformations. On 9PL1, Agent Rosetta outperforms competing methods, and on 9C14 it helps close the gap between the human-written protocols and ProteinMPNN.
  • Figure 5: Summary of results for stabilizing backbone conformations with canonical amino acids only. We report the average cost of one run of 30 model queries, the average action success rate, and the fraction of output tokens that were reasoning tokens.
  • ...and 11 more figures
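Figures 2 and 3 concern amino acid composition penalty blocks, where a single inverted sign or misordered list reverses the intended effect. The sketch below expands a hypothetical simplified constraint such as `PRO <= 2` into a block in the style of Rosetta's `.comp` files for the `aa_composition` score term. Assumptions: the `PRO <= 2` syntax and the helpers `parse_simple`, `penalty_block`, and `penalizes_excess` are our own illustrations, not the paper's simplified syntax. The keywords follow the `.comp` format as we understand it from the Rosetta documentation, where `PENALTIES` lists one value per count deviation from `DELTA_START` to `DELTA_END` relative to `ABSOLUTE`, and positive values penalize. Check these against your Rosetta version.

```python
def parse_simple(spec: str) -> tuple:
    """Parse a hypothetical simplified constraint such as 'PRO <= 2'."""
    residue, op, count = spec.split()
    if op != "<=":
        raise ValueError("this sketch only supports '<=' constraints")
    return residue.upper(), int(count)


def penalty_block(residue: str, max_count: int, weight: float, span: int = 2) -> str:
    """Expand an upper-bound constraint into a .comp-style penalty block.

    Deltas run from -span to +span around the ABSOLUTE target count;
    only positive deltas (too many residues) receive a positive penalty.
    """
    deltas = range(-span, span + 1)
    penalties = [weight * max(d, 0) for d in deltas]
    return "\n".join([
        "PENALTY_DEFINITION",
        f"TYPE {residue}",
        f"ABSOLUTE {max_count}",
        f"DELTA_START {-span}",
        f"DELTA_END {span}",
        "PENALTIES " + " ".join(f"{p:g}" for p in penalties),
        "BEFORE_FUNCTION CONSTANT",  # below the window, keep the first value (0)
        "AFTER_FUNCTION LINEAR",     # above the window, keep increasing the penalty
        "END_PENALTY_DEFINITION",
    ])


def penalizes_excess(penalties: list) -> bool:
    """True if penalties never decrease with count and end strictly positive."""
    non_decreasing = all(b >= a for a, b in zip(penalties, penalties[1:]))
    return non_decreasing and penalties[-1] > 0


block = penalty_block(*parse_simple("PRO <= 2"), weight=5)
print(block)
```

A block whose penalties shrink or turn negative as the proline count grows would reward extra prolines, which is the kind of inverted effect Figure 2 describes. `penalizes_excess` is a cheap sanity check for that failure mode.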
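The evaluation quantities in Figures 4 and 5 reduce to simple bookkeeping. The "best running RMSD" at each design step is the lowest RMSD observed so far, i.e. a cumulative minimum. The Figure 5 summaries are per-run ratios averaged over runs. The sketch below assumes hypothetical per-run record keys (`cost_usd`, `n_actions`, `n_successful`, `reasoning_tokens`, `output_tokens`) and averages each ratio per run. The paper may instead pool tokens across runs, which would give a different reasoning fraction.

```python
from itertools import accumulate
from statistics import mean


def running_best(rmsds: list) -> list:
    """Lowest RMSD seen up to and including each design step."""
    return list(accumulate(rmsds, min))


def summarize_runs(runs: list) -> dict:
    """Average cost, action success rate, and reasoning-token fraction per run.

    Record keys are hypothetical; each ratio is computed per run, then averaged.
    """
    return {
        "avg_cost": mean(r["cost_usd"] for r in runs),
        "avg_success_rate": mean(r["n_successful"] / r["n_actions"] for r in runs),
        "avg_reasoning_fraction": mean(
            r["reasoning_tokens"] / r["output_tokens"] for r in runs),
    }


# Made-up numbers purely to exercise the functions.
trace = running_best([3.0, 2.5, 2.8, 1.9])
summary = summarize_runs([
    {"cost_usd": 1.0, "n_actions": 30, "n_successful": 27,
     "reasoning_tokens": 600, "output_tokens": 1000},
    {"cost_usd": 2.0, "n_actions": 30, "n_successful": 30,
     "reasoning_tokens": 400, "output_tokens": 1000},
])
```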