Test-Time Training Scaling Laws for Chemical Exploration in Drug Design
Morgan Thomas, Albert Bou, Gianni De Fabritiis
TL;DR
The paper tackles the challenge of thoroughly exploring vast drug-like chemical space with Chemical Language Models (CLMs) trained via reinforcement learning (RL), where mode collapse can hinder exploration. It introduces MolExp, a benchmark that requires rediscovery of structurally diverse molecules with similar bioactivity, and demonstrates that scaling test-time training (TTT) by increasing the number of independent RL agents produces a log-linear gain in exploration efficiency. Cooperative multi-agent RL strategies show limited improvements for targeted exploration and often trade off diversity for performance. Together, these findings establish MolExp as a practical framework for scalable, exploration-focused AI-driven drug discovery and highlight population-based TTT as a promising path to comprehensive chemical-space exploration.
Abstract
Chemical Language Models (CLMs) leveraging reinforcement learning (RL) have shown promise in de novo molecular design, yet often suffer from mode collapse, limiting their exploration capabilities. Inspired by Test-Time Training (TTT) in large language models, we propose scaling TTT for CLMs to enhance chemical space exploration. We introduce MolExp, a novel benchmark emphasizing the discovery of structurally diverse molecules with similar bioactivity, simulating real-world drug design challenges. Our results demonstrate that scaling TTT by increasing the number of independent RL agents follows a log-linear scaling law, significantly improving exploration efficiency as measured by MolExp. In contrast, increasing TTT training time yields diminishing returns, even with exploration bonuses. We further evaluate cooperative RL strategies to enhance exploration efficiency. These findings provide a scalable framework for generative molecular design, offering insights into optimizing AI-driven drug discovery.
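To make the phrase "log-linear scaling law" concrete, the sketch below assumes the relationship takes the form score ≈ a + b·ln(N), where N is the number of independent RL agents. Under that assumption, each doubling of N adds a constant b·ln(2) to the score. The function name `fit_log_linear`, the coefficients, and the data points are hypothetical placeholders chosen for illustration. They are not values or code from the paper, and the paper's exact functional form and metric may differ.

```python
import math


def fit_log_linear(n_agents, scores):
    """Ordinary least-squares fit of score ≈ a + b * ln(n_agents)."""
    xs = [math.log(n) for n in n_agents]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(scores) / count
    # Slope is covariance(x, y) / variance(x); intercept follows from the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Synthetic illustration only: scores are generated from a known law,
# NOT taken from the paper's experiments.
agent_counts = [1, 2, 4, 8, 16, 32]
true_a, true_b = 0.10, 0.05  # hypothetical coefficients
synthetic_scores = [true_a + true_b * math.log(n) for n in agent_counts]

a, b = fit_log_linear(agent_counts, synthetic_scores)

# Under a log-linear law, every doubling of the agent count yields the same gain.
gain_per_doubling = b * math.log(2)
print(f"a={a:.3f}, b={b:.3f}, gain per doubling={gain_per_doubling:.4f}")
```

With real benchmark scores in place of the synthetic ones, a near-constant gain per doubling (and small residuals from the fit) would be consistent with the log-linear behaviour described in the abstract.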
