Synergistic Benefits of Joint Molecule Generation and Property Prediction
Adam Izdebski, Jan Olszewski, Pankhil Gawade, Krzysztof Koras, Serra Korkmaz, Valentin Rauscher, Jakub M. Tomczak, Ewa Szczurek
TL;DR
Hyformer is a unified transformer architecture that jointly models data generation and property prediction for molecules by combining an autoregressive decoder with a bidirectional encoder in a single shared backbone. Through alternating attention and a joint pre-training scheme, it achieves synergistic benefits in conditional sampling, out-of-distribution (OOD) property prediction, and representation learning, while maintaining competitive generative and predictive performance. The method is validated on conditional molecule generation, OOD property prediction, molecular representation learning, unconditional generation, and antimicrobial peptide design, the last serving as a drug discovery use case. These results highlight the potential of flexible, jointly trained transformers to streamline end-to-end molecular design pipelines where generation quality and predictive robustness are both crucial.
Abstract
Modeling the joint distribution of data samples and their properties makes it possible to construct a single model for both data generation and property prediction, with synergistic benefits reaching beyond purely generative or predictive models. However, training joint models presents daunting architectural and optimization challenges. Here, we propose Hyformer, a transformer-based joint model that successfully blends the generative and predictive functionalities, using an alternating attention mechanism and a joint pre-training scheme. We show that Hyformer is simultaneously optimized for molecule generation and property prediction, while exhibiting synergistic benefits in conditional sampling, out-of-distribution property prediction, and representation learning. Finally, we demonstrate the benefits of joint learning in a drug design use case of discovering novel antimicrobial peptides.
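
As an illustration of the alternating attention idea, the sketch below applies one set of attention weights under two masks: a causal mask, as an autoregressive decoder would use for generation, and no mask, as a bidirectional encoder would use for prediction. This is a minimal NumPy sketch under stated assumptions, not Hyformer's implementation: the single-head layout, the function name `attention`, the `causal` flag, and all dimensions are hypothetical, and how the model schedules the two modes during training is not specified here.

```python
import numpy as np

def attention(x, w_q, w_k, w_v, causal):
    """Single-head self-attention (hypothetical helper).

    The projection weights are shared between modes; only the mask changes.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    if causal:
        # Generative mode: position i may attend only to positions <= i.
        future = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    # Numerically stable softmax over the key axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
seq_len, dim = 5, 8  # illustrative sizes, not taken from the paper
x = rng.normal(size=(seq_len, dim))
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) for _ in range(3))

# Same parameters, two attention patterns.
_, causal_weights = attention(x, w_q, w_k, w_v, causal=True)
_, bidir_weights = attention(x, w_q, w_k, w_v, causal=False)
```

In the causal mode every entry above the diagonal of `causal_weights` is zero, so each token sees only its prefix; in the bidirectional mode every token attends to the whole sequence, which suits a prediction head that reads the full molecule.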
