Stable Online and Offline Reinforcement Learning for Antibody CDRH3 Design
Yannick Vogt, Mehdi Naouar, Maria Kalweit, Christoph Cornelius Miething, Justus Duyster, Roland Mertelsmann, Gabriel Kalweit, Joschka Boedecker
TL;DR
The paper tackles antibody design for the CDRH3 region, whose search space comprises $20^L$ candidate sequences (with $L=11$ here), and proposes a reinforcement learning (RL) framework that works in both online and offline settings. The approach is stable and offline-capable. It combines Maxmin ensembles with an attention-based Q-network and adds Fitness Buffer replay and nonlinear reward scaling to address value overestimation and epistasis, the non-additive interactions between residue positions. The method achieves state-of-the-art binding energies on the Absolut! benchmark across eight antigens in both online and offline evaluations, converging robustly and learning data-efficiently from pre-collected datasets. This work enables practical antibody design from pre-existing data and, by extending the modeling to further biophysical properties, paves the way for antigen-specific design.
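The TL;DR credits Maxmin ensembles with part of the method's stability. As a rough illustration of that idea, the sketch below implements the standard Maxmin Q-learning target of Lan et al. (2020) in tabular form: bootstrap from the greedy action of the elementwise minimum over N Q-estimates, then update one randomly chosen member. It is not the authors' implementation. The attention-based Q-network is replaced by plain arrays, and the helper names (`maxmin_target`, `maxmin_update`) as well as the framing of design as a sequence of 20-way amino-acid choices are assumptions made for illustration.

```python
import numpy as np


def maxmin_target(reward, next_q_ensemble, gamma=0.99, done=False):
    """Maxmin Q-learning target (Lan et al., 2020).

    next_q_ensemble has shape (N, num_actions): the Q-values that each of the
    N ensemble members assigns to every action in the next state.
    """
    q = np.asarray(next_q_ensemble, dtype=float)
    # Pessimistic estimate: elementwise minimum over the N ensemble members.
    q_min = q.min(axis=0)
    # Greedy bootstrap on the pessimistic estimate; none at episode end.
    bootstrap = 0.0 if done else gamma * q_min.max()
    return reward + bootstrap


def maxmin_update(q_tables, s, a, r, s_next, done, rng, lr=0.1, gamma=0.99):
    """One tabular Maxmin Q-learning step on q_tables of shape (N, S, A).

    Hypothetical helper: states and actions are integer indices, e.g.
    actions 0..19 for the 20 amino acids at the current CDRH3 position.
    """
    y = maxmin_target(r, q_tables[:, s_next, :], gamma=gamma, done=done)
    i = rng.integers(len(q_tables))  # update one randomly chosen member
    q_tables[i, s, a] += lr * (y - q_tables[i, s, a])
    return i


# Toy check: two ensemble members, three actions in the next state.
q_next = np.array([[1.0, 3.0, 2.0],
                   [2.0, 2.5, 0.5]])
print(maxmin_target(1.0, q_next, gamma=0.9))  # 1 + 0.9 * 2.5 = 3.25
```

Bootstrapping from member 0 alone would use its maximum of 3.0 and give a target of 3.7; taking the minimum across members before the maximum lowers the target to 3.25, which is how Maxmin counteracts overestimation.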
Abstract
The field of antibody-based therapeutics has grown significantly in recent years, with targeted antibodies emerging as a potentially effective approach to personalized therapies. Such therapies could be particularly beneficial for complex, highly individual diseases such as cancer. However, progress in this field is often constrained by the vast space of amino acid sequences from which antibodies must be designed. In this study, we introduce a novel reinforcement learning method specifically tailored to the unique challenges of this domain. We demonstrate that our method can learn to design high-affinity antibodies against multiple targets in silico, using either online interaction or offline datasets. To the best of our knowledge, our approach is the first of its kind, and it outperforms existing methods on all tested antigens in the Absolut! database.
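To make the size of this search space concrete: with 20 standard amino acids available at each of the $L=11$ CDRH3 positions considered in this work, the number of candidate sequences is

$$
20^{L} = 20^{11} = 2^{11}\cdot 10^{11} = 2048\cdot 10^{11} \approx 2.05\times 10^{14},
$$

far too many to evaluate exhaustively, which motivates a learned search policy.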
