Towards DNA-Encoded Library Generation with GFlowNets
Michał Koziarski, Mohammed Abukalam, Vedant Shah, Louis Vaillancourt, Doris Alexandra Schuetz, Moksh Jain, Almer van der Sloot, Mathieu Bourgey, Anne Marinier, Yoshua Bengio
TL;DR
This work addresses the design of DNA-encoded libraries (DELs) biased toward protein-protein interaction (PPI) modulators by framing DEL construction as a search over binary vectors and using GFlowNets to sample diverse library candidates with high estimated PPI likelihood. The authors propose flat (DEL-GFlowNet) and hierarchical (H-DEL-GFlowNet) action spaces, with a reward R(x) = \exp\left(\frac{\beta}{|\mathcal{L}(x)|}\sum_{i} p(\mathcal{L}(x)_i)\right) based on a PPI predictor p, and demonstrate that GFlowNets yield diverse libraries with competitive estimated PPI likelihood versus baselines. They evaluate proxy models (random forest, MolFormer, and especially pretrained GNNs) and find that the pretrained GNN provides the strongest signal for reward shaping, while the hierarchical action design reduces policy complexity. Overall, the approach shows promise for generating multiple, diverse DEL candidates biased toward PPI modulators, though scaling to very large libraries and proxy reliability remain open challenges for practical deployment. The work provides a framework for reusable DEL design across targets and highlights a path toward integrating structure-aware hierarchies into generative screening strategies, with implications for faster, more diverse experimental screening.
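To make the reward concrete, the following minimal Python sketch computes R(x) for a binary selection vector x. It is an illustration under stated assumptions, not the authors' implementation: the library \mathcal{L}(x) is assumed to be the Cartesian product of the selected building blocks at each position, `proxy` is a hypothetical stand-in for the PPI predictor p, and the names `enumerate_library` and `del_reward` are invented here. Real DEL enumeration would apply chemical reactions to the blocks rather than pairing labels.

```python
import math
from itertools import product
from typing import Callable, List, Sequence, Tuple


def enumerate_library(
    x: Sequence[int], blocks_per_position: Sequence[Sequence[str]]
) -> List[Tuple[str, ...]]:
    """Enumerate L(x): one selected block per position, all combinations.

    Assumption: x is the concatenation of one binary mask per position.
    """
    selected = []
    offset = 0
    for blocks in blocks_per_position:
        mask = x[offset:offset + len(blocks)]
        selected.append([b for b, bit in zip(blocks, mask) if bit])
        offset += len(blocks)
    # If any position has no selected block, the product is empty.
    return list(product(*selected))


def del_reward(
    x: Sequence[int],
    blocks_per_position: Sequence[Sequence[str]],
    proxy: Callable[[Tuple[str, ...]], float],
    beta: float = 1.0,
) -> float:
    """R(x) = exp(beta * mean of proxy(m) over all m in L(x))."""
    library = enumerate_library(x, blocks_per_position)
    if not library:
        # The source does not say how empty libraries are handled;
        # rejecting them is an assumption of this sketch.
        raise ValueError("selection yields an empty library")
    mean_p = sum(proxy(m) for m in library) / len(library)
    return math.exp(beta * mean_p)


# Toy usage: two positions with two candidate blocks each.
blocks = [["A", "B"], ["C", "D"]]
x = [1, 0, 1, 1]  # select A at position 1, C and D at position 2
reward = del_reward(x, blocks, proxy=lambda m: 0.5, beta=2.0)
```

Because the exponent is the mean predicted probability over the library, the reward does not grow simply by adding more molecules. Treating β as a sharpness parameter for how strongly high-scoring libraries are preferred is an interpretation of the formula, not a statement from the source.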
Abstract
DNA-encoded libraries (DELs) are a powerful approach for rapidly screening large numbers of diverse compounds. One of the key challenges in using DELs is library design, which involves choosing the building blocks that will be combined combinatorially to produce the final library. In this paper, we consider the task of protein-protein interaction (PPI) biased DEL design. To this end, we evaluate several machine learning algorithms on the PPI modulation task and use them as a reward for the proposed GFlowNet-based generative approach. We additionally investigate the possibility of using structural information about building blocks to design a hierarchical action space for the GFlowNet. The observed results indicate that GFlowNets are a promising approach for generating diverse combinatorial library candidates.
