
What an Autonomous Agent Discovers About Molecular Transformer Design: Does It Transfer?

Edward Wijaya

Abstract

Deep learning models for drug-like molecules and proteins overwhelmingly reuse transformer architectures designed for natural language, yet whether molecular sequences benefit from different designs has not been systematically tested. We deploy an autonomous agent to perform architecture search across three sequence types (SMILES, protein, and English text as a control), running 3,106 experiments on a single GPU. For SMILES, architecture search is counterproductive: tuning learning rates and schedules alone outperforms the full search (p = 0.001). For natural language, architecture changes drive 81% of the improvement (p = 0.009). Proteins fall between the two. Surprisingly, although the agent discovers distinct architectures per domain (p = 0.004), every innovation transfers across all three domains with <1% degradation, indicating that the differences reflect search-path dependence rather than fundamental biological requirements. We release a decision framework and open-source toolkit to help molecular modeling teams choose between autonomous architecture search and simple hyperparameter tuning.

Paper Structure

This paper contains 28 sections, 1 equation, 10 figures, 12 tables.

Figures (10)

  • Figure 1: Decomposition of total improvement per track. On NLP, architecture search contributes the majority (81%) of improvement. On SMILES, HP tuning alone exceeds the total improvement (151%), and architecture search is counterproductive (-51%). Protein margins are too small for significance.
  • Figure 2: Best-so-far curves across conditions and tracks. Each line shows the mean cumulative minimum val_bpb over experiments 1--100, with shaded bands indicating the min--max range across runs. The HP-only agent (green) dominates on SMILES, the full agent (blue) dominates on NLP, and all conditions cluster tightly on protein.
  • Figure 3: Architecture clustering by domain ($p = 0.004$). (a) PCA projection of 13 best architectures, colored by track. (b) Pairwise Gower distance matrix ordered by track, showing within-track similarity (darker blocks on diagonal).
  • Figure 4: AUC-OC comparison across conditions and tracks. Lower is better. On SMILES, HP-only achieves the lowest AUC-OC; on NLP, the full agent is lowest. Protein conditions are tightly clustered with no significant differences.
  • Figure 5: Cumulative keep rate curves by condition. All conditions show declining keep rates as the search progresses and easy improvements are exhausted. The agent maintains a higher keep rate than random NAS across all tracks, reflecting more targeted proposals.
  • ...and 5 more figures
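The best-so-far curves (Figure 2) and the AUC-OC metric (Figure 4) can be sketched concretely. The snippet below is a minimal illustration, assuming AUC-OC means the area under the best-so-far optimization curve (lower is better, i.e., the search reaches good configurations sooner); the `val_bpb` values are made up for illustration and are not results from the paper.

```python
import numpy as np

# Hypothetical per-experiment validation bits-per-byte from one search run.
val_bpb = np.array([1.30, 1.25, 1.28, 1.21, 1.22, 1.19])

# Best-so-far curve: cumulative minimum of val_bpb over experiments,
# as plotted in Figure 2.
best_so_far = np.minimum.accumulate(val_bpb)

# AUC-OC (assumed: area under the optimization curve), via the
# trapezoidal rule with unit spacing between experiments.
auc_oc = float(np.sum((best_so_far[1:] + best_so_far[:-1]) / 2.0))

print(best_so_far)  # monotonically non-increasing
print(auc_oc)
```

Averaging such curves over runs, and taking the min–max envelope per experiment index, yields the lines and shaded bands described in the Figure 2 caption.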
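The pairwise Gower distances in Figure 3(b) handle architectures described by a mix of numeric and categorical features. A minimal sketch of the standard Gower formulation follows; the architecture descriptors (`n_layers`, `n_heads`, `norm_type`) and their ranges are hypothetical, chosen only to show the mechanics.

```python
def gower_distance(a, b, numeric_ranges):
    """Gower distance between two mixed-type feature vectors.

    Numeric features contribute |a - b| / range (range taken over the
    whole dataset); categorical features contribute 0 on match and 1 on
    mismatch. The result is the mean over features, in [0, 1].
    """
    parts = []
    for x, y, rng in zip(a, b, numeric_ranges):
        if rng is None:                      # categorical feature
            parts.append(0.0 if x == y else 1.0)
        else:                                # numeric feature
            parts.append(abs(x - y) / rng if rng > 0 else 0.0)
    return sum(parts) / len(parts)

# Hypothetical descriptors: (n_layers, n_heads, norm_type).
arch_a = (6, 8, "rmsnorm")
arch_b = (4, 8, "layernorm")
ranges = (8.0, 12.0, None)  # numeric ranges over all architectures; None = categorical

print(gower_distance(arch_a, arch_b, ranges))
```

Computing this distance for every pair of the 13 best architectures, then ordering rows and columns by track, produces the block structure described in the Figure 3(b) caption.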