Beyond Protein Language Models: An Agentic LLM Framework for Mechanistic Enzyme Design
Bruno Jacob, Khushbu Agarwal, Marcel Baer, Peter Rice, Simone Raugei
TL;DR
The paper introduces Genie-CAT, a domain-specific agentic LLM framework that combines retrieval-augmented reasoning, structural analysis, Poisson-Boltzmann (PB) electrostatic calculations via APBS, and a symmetry-aware redox predictor to generate mechanistically grounded hypotheses for metalloprotein design. Using ferredoxins as a test case, in particular the ferredoxin structure 1CLF with its two [4Fe–4S] clusters, Genie-CAT demonstrates end-to-end multi-modal reasoning that ties sequence- and environment-level features to predicted redox shifts, and it generates hypotheses quickly that align with expert intuition. The modular architecture supports extension to higher-fidelity simulations and broader cofactor spaces, with the potential to reduce design cycles from days to minutes while improving interpretability and grounding. Overall, the work shows how integrating literature grounding, structure-aware analysis, physics-based modeling, and targeted ML predictions in an agentic workflow can turn LLMs into productive partners for computational discovery in protein design.
Abstract
We present Genie-CAT, a tool-augmented large-language-model (LLM) system designed to accelerate scientific hypothesis generation in protein design. Using metalloproteins (e.g., ferredoxins) as a case study, Genie-CAT integrates four capabilities into a unified agentic workflow: literature-grounded reasoning through retrieval-augmented generation (RAG), structural parsing of Protein Data Bank files, electrostatic potential calculations, and machine-learning prediction of redox properties. By coupling natural-language reasoning with data-driven and physics-based computation, the system generates mechanistically interpretable, testable hypotheses linking sequence, structure, and function. In proof-of-concept demonstrations, Genie-CAT autonomously identifies residue-level modifications near [Fe–S] clusters that affect redox tuning, reproducing expert-derived hypotheses in a fraction of the time. The framework highlights how AI agents combining language models with domain-specific tools can bridge symbolic reasoning and numerical simulation, transforming LLMs from conversational assistants into partners for computational discovery.
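As an illustration of what a tool-augmented workflow with these four capabilities might look like, the following is a minimal sketch, not the authors' implementation. All names here (`Tool`, `TOOLS`, `plan`, `run_agent`, and the stub functions) are hypothetical, each tool is a placeholder that returns a string, and the keyword-based `plan` function merely stands in for the LLM's tool-selection step.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Tool:
    """A named capability the agent can call (hypothetical interface)."""
    name: str
    description: str
    run: Callable[[str], str]


# Stub tools: each stands in for one of the four capabilities in the abstract.
# A real system would call a retriever, a PDB parser, an electrostatics
# solver, and a trained redox model here.
def retrieve_literature(query: str) -> str:
    return f"[stub] passages retrieved for: {query}"


def parse_structure(query: str) -> str:
    return "[stub] residues near the [Fe-S] clusters"


def compute_electrostatics(query: str) -> str:
    return "[stub] electrostatic potential at cluster sites"


def predict_redox(query: str) -> str:
    return "[stub] predicted redox shift"


TOOLS: Dict[str, Tool] = {
    "literature": Tool("literature", "RAG over papers", retrieve_literature),
    "structure": Tool("structure", "PDB file parsing", parse_structure),
    "electrostatics": Tool("electrostatics", "potential maps", compute_electrostatics),
    "redox": Tool("redox", "ML redox prediction", predict_redox),
}


def plan(question: str) -> List[str]:
    """Keyword-based stand-in for the LLM planner (an assumption of this sketch)."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    steps = ["literature"]  # always ground the answer in literature first
    if words & {"structure", "residue", "residues"}:
        steps.append("structure")
    if words & {"electrostatic", "electrostatics", "redox"}:
        steps.append("electrostatics")
    if "redox" in words:
        steps.append("redox")
    return steps


def run_agent(question: str) -> Dict[str, str]:
    """Run the planned tools in order and collect their outputs as evidence."""
    return {name: TOOLS[name].run(question) for name in plan(question)}


if __name__ == "__main__":
    evidence = run_agent("Which residue changes shift the redox potential?")
    for name, output in evidence.items():
        print(f"{name}: {output}")
```

In a full system, the collected evidence would be passed back to the language model to compose a hypothesis. The registry pattern is one way to keep the tools modular, so that a higher-fidelity simulation could be added as another entry without changing the dispatch logic.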
