Interpreting GFlowNets for Drug Discovery: Extracting Actionable Insights for Medicinal Chemistry

Amirtha Varshini A S, Duminda S. Ranasinghe, Hok Hei Tam

TL;DR

The paper addresses the opacity of decision policies in Generative Flow Networks (GFlowNets) applied to drug design by introducing an interpretability toolkit for SynFlowNet trained with a QED (quantitative estimate of drug-likeness) reward. It combines gradient-based saliency, counterfactual perturbations, sparse autoencoders, and motif probes to connect atomic-level decisions to interpretable chemical concepts. The results show that atom-level saliency aligns with chemically meaningful regions, that latent factors capture polarity and size with strong $R^2$ correlations, and that motif probes reveal linearly decodable functional groups, which together give mechanistic insight into GFlowNet decision-making. These findings support transparent, controllable molecular design and point toward conditioning generative policies on interpretable physicochemical axes for improved alignment with medicinal chemistry reasoning.

Abstract

Generative Flow Networks, or GFlowNets, offer a promising framework for molecular design, but their internal decision policies remain opaque. This limits adoption in drug discovery, where chemists require clear and interpretable rationales for proposed structures. We present an interpretability framework for SynFlowNet, a GFlowNet trained on documented chemical reactions and purchasable starting materials that generates both molecules and the synthetic routes that produce them. Our approach integrates three complementary components. Gradient-based saliency combined with counterfactual perturbations identifies which atomic environments influence reward and how structural edits change molecular outcomes. Sparse autoencoders reveal axis-aligned latent factors that correspond to physicochemical properties such as polarity, lipophilicity, and molecular size. Motif probes show that functional groups including aromatic rings and halogens are explicitly encoded and linearly decodable from the internal embeddings. Together, these results expose the chemical logic inside SynFlowNet and provide actionable and mechanistic insight that supports transparent and controllable molecular design.
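
To make "linearly decodable" concrete, the following is a minimal sketch of a motif probe: a logistic-regression classifier fitted on frozen embeddings to predict whether a motif is present. It is not the authors' implementation. The function names (`train_linear_probe`, `probe_accuracy`), the learning rate, the step count, and the two-dimensional toy embeddings with their halogen labels are all assumptions made for illustration; real SynFlowNet embeddings and motif labels would replace them.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_linear_probe(embeddings, labels, lr=0.5, steps=200):
    """Fit a logistic-regression probe with batch gradient descent.

    embeddings: list of equal-length float vectors (frozen model features).
    labels: list of 0/1 flags, 1 meaning the motif is present.
    Returns (weights, bias). lr and steps are illustrative defaults.
    """
    dim = len(embeddings[0])
    n = len(embeddings)
    w = [0.0] * dim
    b = 0.0
    for _ in range(steps):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(embeddings, labels):
            # Gradient of the log loss with respect to the logit is (p - y).
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                grad_w[j] += err * x[j] / n
            grad_b += err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def probe_accuracy(w, b, embeddings, labels):
    """Fraction of examples whose thresholded logit matches the label."""
    correct = 0
    for x, y in zip(embeddings, labels):
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
        correct += int(pred == y)
    return correct / len(labels)


# Hand-made toy data (NOT from the paper): dimension 0 separates the classes.
toy_embeddings = [
    [1.0, 0.2], [0.8, -0.5], [1.2, 0.9],     # hypothetical halogenated molecules
    [-1.0, 0.3], [-0.7, -0.6], [-1.1, 0.8],  # hypothetical non-halogenated molecules
]
toy_has_halogen = [1, 1, 1, 0, 0, 0]

probe_w, probe_b = train_linear_probe(toy_embeddings, toy_has_halogen)
toy_accuracy = probe_accuracy(probe_w, probe_b, toy_embeddings, toy_has_halogen)
```

A probe is kept linear on purpose: if a simple linear readout recovers the motif, the information is stored explicitly in the embedding rather than being computed by the probe itself. In practice the accuracy would be reported on held-out molecules, not on the training set as in this toy case.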

Paper Structure

This paper contains 24 sections, 2 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Overview of the proposed interpretability framework for GFlowNets. Our pipeline integrates (1) sparse autoencoders (SAEs) [shu2025surveysparseautoencodersinterpreting] for discovering disentangled chemical factors such as polarity and lipophilicity, (2) motif probes to test whether embeddings encode functional groups, and (3) gradient-based saliency and counterfactual perturbations for atom- and motif-level attribution. Together, these approaches span fine-grained atomic rationales to high-level medicinal chemistry concepts.
  • Figure 2: Interpretability results on SynFlowNet embeddings. (A) Predictive performance of sparse autoencoder (SAE) factors across six chemical properties, showing that factors disentangle polarity, size, and lipophilicity more effectively than composite QED. (B) Example SynFlowNet trajectories with their atom-level saliency (highlighted atoms) and a counterfactual edit that alters predicted QED, illustrating intervention-based attribution. (C) Motif–factor correlation heatmap from motif probes, revealing that embeddings encode functional groups such as halogens, aromatic rings, and carbonyl groups with high fidelity.
  • Figure 3: Left: Autoencoder training loss rapidly decreases and plateaus after $\sim$50 epochs, indicating stable convergence of the reconstruction and sparsity objectives. Right: Average latent factor activations across 128 neurons show sparse, selective patterns—most factors remain near-zero while a subset exhibits strong activation, consistent with disentanglement and interpretability goals.
  • Figure 4: Left: Fraction of active molecules per factor (“activation frequency”) reveals that most latent factors are only triggered by a subset of molecules, while a few are broadly active across the dataset. Right: Histogram of factor activation frequencies confirms a right-skewed sparsity distribution—over half of the factors activate in fewer than 10% of molecules—demonstrating that the sparse autoencoder learned compact, chemically specific representations.
  • Figure 5: Factor–reward correlation heatmap.
  • ...and 2 more figures
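
The "activation frequency" statistic described in the Figure 4 caption can be sketched as follows. This is an illustrative reading of the caption, not the authors' code: the threshold for counting a factor as active, the function names, and the toy activation matrix are assumptions.

```python
def activation_frequency(activations, threshold=0.0):
    """Per-factor fraction of molecules whose activation exceeds `threshold`.

    activations: list of rows, one row per molecule, one column per latent factor.
    threshold: assumed cutoff for "active"; the caption does not state one.
    """
    n_molecules = len(activations)
    n_factors = len(activations[0])
    return [
        sum(1 for row in activations if row[j] > threshold) / n_molecules
        for j in range(n_factors)
    ]


def fraction_rare_factors(frequencies, cutoff=0.10):
    """Share of factors that are active in fewer than `cutoff` of the molecules."""
    return sum(1 for f in frequencies if f < cutoff) / len(frequencies)


# Toy activations (NOT from the paper): 4 molecules x 3 latent factors.
toy_activations = [
    [0.0, 1.2, 0.0],
    [0.0, 0.4, 0.0],
    [0.9, 0.7, 0.0],
    [0.0, 0.3, 0.0],
]

toy_frequencies = activation_frequency(toy_activations)
toy_rare_share = fraction_rare_factors(toy_frequencies)
```

Here the first factor fires for one molecule in four, the second for all of them, and the third for none, so only the third falls under the 10% cutoff. Applied to the 128 latent factors mentioned in the Figure 3 caption, the same two functions would produce the per-factor frequencies and the "fewer than 10% of molecules" summary that Figure 4 reports.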