GraphCliff: Short-Long Range Gating for Subtle Differences but Critical Changes

Hajung Kim, Jueon Park, Junseok Choe, Sheunheun Baek, Hyeon Hwang, Jaewoo Kang

TL;DR

GraphCliff tackles the activity cliff challenge in QSAR by explicitly balancing local substructure sensitivity with global molecular context through a short- and long-range gating mechanism on graph representations. It combines a GINE-based short-range filter with a Chebyshev-based long-range filter and a learnable gate, which mitigates over-smoothing while preserving local discriminative cues and yields strong performance on both cliff and non-cliff compounds. Empirical results on MoleculeACE show consistent improvements over prior graph-based methods, and transfer learning further boosts performance in data-limited LSSNS settings. The work reports quantitative gains together with qualitative evidence that gating-based fusion improves interpretability by highlighting functionally relevant substructures.

Abstract

Quantitative structure-activity relationship (QSAR) modeling assumes a smooth relationship between molecular structure and biological activity. However, activity cliffs, defined as pairs of structurally similar compounds with large potency differences, break this continuity. Recent benchmarks targeting activity cliffs have revealed that classical machine learning models with extended connectivity fingerprints outperform graph neural networks. Our analysis shows that graph embeddings fail to adequately separate structurally similar molecules in the embedding space, which makes it difficult to tell apart molecules that are structurally similar but functionally different. Despite this limitation, molecular graph structures are inherently expressive and attractive, as they preserve molecular topology. To preserve the structural representation of molecules as graphs, we propose a new model, GraphCliff, which integrates short- and long-range information through a gating mechanism. Experimental results demonstrate that GraphCliff consistently improves performance on both non-cliff and cliff compounds. Furthermore, layer-wise node embedding analyses reveal reduced over-smoothing and enhanced discriminative power relative to strong baseline graph models.
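
The short- and long-range gating idea described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the short-range branch is a GIN-style sum aggregation with edge features omitted (GINE would additionally use bond features), the long-range branch is a Chebyshev polynomial filter on the rescaled normalized Laplacian with the largest eigenvalue assumed to be 2, and the gate is a generic sigmoid over the concatenated branch outputs. All function names, weight shapes, and the exact form of the gate are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def short_range(A, X, W, eps=0.0):
    # GIN-style one-hop update (assumption: edge features omitted):
    # ReLU(((1 + eps) * X + A @ X) @ W)
    return np.maximum(((1.0 + eps) * X + A @ X) @ W, 0.0)

def scaled_laplacian(A):
    # Rescaled normalized Laplacian L~ = 2 L / lambda_max - I.
    # Assuming lambda_max = 2, this reduces to L~ = -D^{-1/2} A D^{-1/2}.
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    d_inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    return -(d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :])

def long_range(A, X, Ws):
    # Chebyshev filter: sum_k T_k(L~) X W_k, with T_0 = I, T_1 = L~ and
    # T_k = 2 L~ T_{k-1} - T_{k-2}. With len(Ws) terms, each node sees
    # neighbours up to len(Ws) - 1 hops away.
    L = scaled_laplacian(A)
    T_prev, T_curr = X, L @ X
    out = T_prev @ Ws[0]
    if len(Ws) > 1:
        out = out + T_curr @ Ws[1]
    for W in Ws[2:]:
        T_prev, T_curr = T_curr, 2.0 * (L @ T_curr) - T_prev
        out = out + T_curr @ W
    return out

def gated_fusion(h_short, h_long, W_gate, b_gate):
    # Hypothetical gate: one value between 0 and 1 per node and channel.
    # The fused embedding is a convex combination of the two branches.
    gate = sigmoid(np.concatenate([h_short, h_long], axis=1) @ W_gate + b_gate)
    return gate * h_short + (1.0 - gate) * h_long, gate

# Toy example: a 5-atom chain with random features and random weights.
rng = np.random.default_rng(0)
n, d_in, d_hid = 5, 4, 8
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
X = rng.normal(size=(n, d_in))

W_short = rng.normal(size=(d_in, d_hid))
W_cheb = [rng.normal(size=(d_in, d_hid)) for _ in range(4)]  # orders 0..3
W_gate = rng.normal(size=(2 * d_hid, d_hid))
b_gate = np.zeros(d_hid)

h_short = short_range(A, X, W_short)
h_long = long_range(A, X, W_cheb)
h, gate = gated_fusion(h_short, h_long, W_gate, b_gate)
print(h.shape, float(gate.min()), float(gate.max()))
```

In this sketch the short-range branch only mixes information between bonded atoms, while the Chebyshev branch with four terms reaches atoms up to three bonds away in a single layer. The gate then decides, per atom and per channel, how much of each view to keep, which is one plausible way to retain a small local change (such as a single substituent swap in a cliff pair) without discarding the wider molecular context.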

Paper Structure

This paper contains 26 sections, 8 equations, 4 figures, 12 tables.

Figures (4)

  • Figure 1: Overall architecture of GraphCliff.
  • Figure 2: Comprehensive analysis of propagation dynamics and stability across models. (a) Hop-wise sensitivity, where higher values indicate stronger long-range information flow. (b) Dirichlet Energy measuring node differentiation, where higher values reflect better resistance to over-smoothing. (c) Layer-wise Jacobian singular values assessing gradient flow stability, where moderate values indicate robust propagation.
  • Figure 3: Visualization of a cliff pair with large functional divergence. Top: atoms responsible for the activity cliff (highlighted in red). Bottom: attention weights from the sigmoid gating vector $\sigma(x_1)$, with warmer colors indicating higher importance.
  • Figure 4: Comparison of graph embedding Euclidean distances with ECFP fingerprint dissimilarities across different models.
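
The Dirichlet energy mentioned for Figure 2(b) can be illustrated with a short NumPy sketch. It assumes the common unnormalized definition, E(X) = tr(Xᵀ L X) with L = D − A; the paper may use a different normalization. The helper `mean_aggregate` is a hypothetical, parameter-free stand-in for a message-passing layer, used only to show how repeated neighbourhood averaging drives the energy toward zero.

```python
import numpy as np

def dirichlet_energy(A, X):
    # E(X) = trace(X^T L X) with L = D - A, which equals
    # 0.5 * sum_{i,j} A_ij * ||x_i - x_j||^2 for a symmetric A.
    A = np.asarray(A, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    return float(np.trace(X.T @ L @ X))

def mean_aggregate(A, X):
    # One smoothing step: average each node with its neighbours
    # (self-loops added). No learnable weights are involved.
    A_hat = A + np.eye(A.shape[0])
    return (A_hat / A_hat.sum(axis=1, keepdims=True)) @ X

# Toy example: a 5-node chain with random 3-dimensional node features.
rng = np.random.default_rng(0)
n = 5
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
X = rng.normal(size=(n, 3))

energies = [dirichlet_energy(A, X)]
H = X
for _ in range(100):
    H = mean_aggregate(A, H)
    energies.append(dirichlet_energy(A, H))
print(energies[0], energies[10], energies[-1])
```

As the node features converge to a common vector, the energy shrinks toward zero, which is the over-smoothing behaviour that the figure caption associates with low Dirichlet energy. A model that keeps this quantity higher across layers is keeping neighbouring atoms distinguishable.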