A Bayesian Flow Network Framework for Chemistry Tasks
Nianze Tao, Minori Abe
TL;DR
This work introduces ChemBFN, a Bayesian flow network framework for chemistry tasks that operates on discrete data representations such as SMILES/SELFIES. By adopting a novel discrete accuracy schedule with $\beta(t)$ and $\alpha(t)$, the method decouples sampling size from object length and achieves competitive generation quality with fewer steps, while enabling classifier-free guidance for conditional generation. The approach also demonstrates strong downstream predictive capability, with generative pretraining improving performance on MoleculeNet regression/classification tasks and reaction yield prediction, and shows that larger pretraining datasets do not always yield better performance. The authors release code and models publicly, highlighting the practical potential for all-in-one models in drug design, property prediction, and synthesis planning, though gaps remain compared to graph-based predictors.
Abstract
In this work, we introduce ChemBFN, a language model that handles chemistry tasks based on Bayesian flow networks working on discrete data. A new accuracy schedule is proposed to improve the sampling quality by significantly reducing the reconstruction loss. We show evidence that our method is appropriate for generating molecules with satisfactory diversity even when a smaller number of sampling steps is used. A classifier-free guidance method is adapted for conditional generation. It is also worth pointing out that, after generative training, our model can be fine-tuned on regression and classification tasks with state-of-the-art performance, which opens the door to building all-in-one models in a single-module style. Our model has been open-sourced at https://github.com/Augus1999/bayesian-flow-network-for-chemistry.
