
Towards DNA-Encoded Library Generation with GFlowNets

Michał Koziarski, Mohammed Abukalam, Vedant Shah, Louis Vaillancourt, Doris Alexandra Schuetz, Moksh Jain, Almer van der Sloot, Mathieu Bourgey, Anne Marinier, Yoshua Bengio

TL;DR

This work addresses the design of DNA-encoded libraries (DELs) biased toward protein-protein interaction (PPI) modulators by framing DEL construction as a search over binary vectors and using GFlowNets to sample diverse library candidates with high estimated PPI likelihood. The authors propose flat (DEL-GFlowNet) and hierarchical (H-DEL-GFlowNet) action spaces, with a reward $R(x) = \exp\left(\frac{\beta}{|\mathcal{L}(x)|}\sum_{i} p(\mathcal{L}(x)_i)\right)$ based on a PPI predictor $p$, where $\mathcal{L}(x)$ is the library decoded from state $x$. They demonstrate that GFlowNets yield diverse libraries whose estimated PPI likelihood is competitive with baselines. They evaluate several proxy models (random forest, MolFormer, and pretrained GNNs) and find that the pretrained GNN provides the strongest signal for reward shaping, while the hierarchical action design reduces policy complexity. Overall, the approach shows promise for generating multiple, diverse DEL candidates biased toward PPI modulators, though scaling to very large libraries and proxy reliability remain open challenges for practical deployment. The work provides a framework for reusable DEL design across targets and points toward integrating structure-aware hierarchies into generative screening strategies, with implications for faster, more diverse experimental screening.
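To make the reward formula above concrete, here is a minimal sketch of how it could be computed for a decoded library. The function name `library_reward`, the string representation of molecules, and the lookup-table predictor are illustrative assumptions and are not taken from the paper; the behavior for an empty library is also an assumption, since the summary does not define it.

```python
import math
from typing import Callable, Sequence


def library_reward(
    library: Sequence[str],
    predictor: Callable[[str], float],
    beta: float = 1.0,
) -> float:
    """Compute R(x) = exp(beta * mean_i p(L(x)_i)) for a decoded library L(x)."""
    if len(library) == 0:
        # Assumption: the source does not define the reward of an empty library,
        # so this sketch rejects it instead of guessing a value.
        raise ValueError("library must contain at least one molecule")
    # Average predicted PPI likelihood over all molecules in the library.
    mean_score = sum(predictor(mol) for mol in library) / len(library)
    return math.exp(beta * mean_score)


# Hypothetical stand-in for a trained PPI predictor: a fixed lookup table.
toy_scores = {"mol_a": 0.2, "mol_b": 0.8}
r = library_reward(["mol_a", "mol_b"], toy_scores.__getitem__, beta=2.0)
```

Because the predictor scores are averaged before exponentiation, the reward depends on the mean quality of the library and not on its size, and `beta` controls how sharply the reward favors libraries with a higher mean score.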

Abstract

DNA-encoded libraries (DELs) are a powerful approach for rapidly screening large numbers of diverse compounds. One of the key challenges in using DELs is library design, which involves choosing the building blocks that will be combinatorially combined to produce the final library. In this paper we consider the task of protein-protein interaction (PPI) biased DEL design. To this end, we evaluate several machine learning algorithms on the PPI modulation task and use them as a reward for the proposed GFlowNet-based generative approach. We additionally investigate the possibility of using structural information about building blocks to design a hierarchical action space for the GFlowNet. The observed results indicate that GFlowNets are a promising approach for generating diverse combinatorial library candidates.

Paper Structure

This paper contains 15 sections, 2 equations, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: Illustration of DEL generation. Building blocks are first attached to a DNA tag used for identification, and later, across several consecutive cycles, combined in a combinatorial manner to produce a library of molecules built by joining together a sequence of building blocks.
  • Figure 2: Illustration of the state representation. a) The building blocks picked for a given cycle are represented as a binary vector, where each entry corresponds to a specific building block. b) For each cycle, the collection of selected building blocks is decoded. c) The resulting library is the Cartesian product of the building blocks selected in the different cycles, where each triplet of building blocks is combined to produce a final molecule.
  • Figure 3: Illustration of the action spaces of the flat and hierarchical DEL-GFlowNet. Lines indicate actions (black: chosen, gray: other possible), and dots indicate states (blue: starting, yellow: intermediate, green: final). The end effect in both cases is the selection of one additional building block. The hierarchical variant significantly reduces the number of valid actions at any given stage.
  • Figure 4: Comparison of model similarity. Individual points represent specific building blocks from the first cycle, with the average probability values of molecules containing that building block.
  • Figure 5: Distributions of average chemical library properties of top-100 generated libraries.
  • ...and 1 more figure
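
The state representation described in the Figure 2 caption can be sketched as follows: one binary vector per cycle selects building blocks, and the library is the Cartesian product of the selections. The names `decode_cycle` and `decode_library` and the placeholder block labels are hypothetical, and the chemistry that joins each triplet into an actual molecule is omitted; this is an illustration of the encoding under those assumptions, not the authors' implementation.

```python
from itertools import product
from typing import List, Sequence, Tuple


def decode_cycle(mask: Sequence[int], blocks: Sequence[str]) -> List[str]:
    """Return the building blocks whose entry in the binary mask is set."""
    if len(mask) != len(blocks):
        raise ValueError("mask and blocks must have the same length")
    return [block for bit, block in zip(mask, blocks) if bit]


def decode_library(
    masks: Sequence[Sequence[int]],
    blocks_per_cycle: Sequence[Sequence[str]],
) -> List[Tuple[str, ...]]:
    """Decode one mask per cycle and combine the selections combinatorially."""
    if len(masks) != len(blocks_per_cycle):
        raise ValueError("need exactly one mask per cycle")
    selected = [decode_cycle(m, b) for m, b in zip(masks, blocks_per_cycle)]
    # Each tuple holds one building block per cycle (a triplet for three cycles).
    return list(product(*selected))


# Placeholder labels; a real DEL would use actual building-block structures.
blocks_per_cycle = [["A1", "A2", "A3"], ["B1", "B2"], ["C1", "C2", "C3"]]
masks = [[1, 0, 1], [0, 1], [1, 1, 0]]
library = decode_library(masks, blocks_per_cycle)
```

Here the library size is the product of the number of selected blocks in each cycle (2 × 1 × 2 = 4 triplets), which is why adding a single building block to one cycle can enlarge the library by many molecules at once.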