UniMatch: Universal Matching from Atom to Task for Few-Shot Drug Discovery

Ruifeng Li, Mingqian Li, Wei Liu, Yuhua Zhou, Xiangxin Zhou, Yuan Yao, Qiang Zhang, Hongyang Chen

TL;DR

UniMatch tackles the data-scarcity challenge in drug discovery by unifying explicit hierarchical molecular matching with implicit task-level matching learned through meta-learning. The model encodes multi-level molecular representations via a GIN backbone with mean pooling, and uses an attention-based hierarchical matcher across atomic, substructural, and molecular levels, fused across layers. A meta-learning component introduces a task relationship mechanism that enables rapid adaptation to new tasks, demonstrated by strong performance on MoleculeNet, FS-Mol, and Meta-MolNet benchmarks. Across datasets, UniMatch achieves consistent improvements in AUROC and Delta-AUPRC, while visualization studies reveal interpretable, layer-wise attention dynamics that reflect hierarchical structure in drug-like molecules.
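The pooling-and-matching idea can be illustrated with a toy sketch. It is not the authors' implementation: the cosine-similarity attention, the temperature value, the simple averaging across layers, and all function names below are assumptions chosen for illustration, and hand-written vectors stand in for the per-layer node embeddings that a GIN backbone would produce.

```python
import math

def mean_pool(node_vectors):
    """Average a list of node vectors into one graph-level vector."""
    dim = len(node_vectors[0])
    return [sum(v[d] for v in node_vectors) / len(node_vectors) for d in range(dim)]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def match_layer(query_vec, support_vecs, support_labels, temperature=0.1):
    """Attend over the support set at one level; return an 'active' score in [0, 1]."""
    weights = softmax([cosine(query_vec, s) / temperature for s in support_vecs])
    return sum(w * y for w, y in zip(weights, support_labels))

def hierarchical_match(query_layers, support_layers, support_labels):
    """Match at every layer, then fuse per-layer scores by averaging (assumed fusion).

    query_layers[l] holds the query's node vectors at layer l;
    support_layers[i][l] holds the same for support molecule i.
    """
    per_layer = []
    for l, q_nodes in enumerate(query_layers):
        q = mean_pool(q_nodes)
        s = [mean_pool(mol[l]) for mol in support_layers]
        per_layer.append(match_layer(q, s, support_labels))
    return sum(per_layer) / len(per_layer)

# Toy episode: 2 layers, 2 nodes per molecule, 2-dimensional embeddings.
support_active = [[[1.0, 0.0], [0.8, 0.2]], [[0.9, 0.1], [0.7, 0.3]]]
support_inactive = [[[0.0, 1.0], [0.2, 0.8]], [[0.1, 0.9], [0.3, 0.7]]]
query = [[[0.9, 0.1], [1.0, 0.0]], [[0.8, 0.2], [0.9, 0.1]]]

score = hierarchical_match(query, [support_active, support_inactive], [1.0, 0.0])
print(f"fused active score: {score:.3f}")
```

Because the toy query resembles the active support molecule at both layers, the fused score lands above 0.5. In the actual model the per-layer representations and the matching and fusion steps are learned.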

Abstract

Drug discovery is crucial for identifying candidate drugs for various diseases. However, its low success rate often results in a scarcity of annotations, posing a few-shot learning problem. Existing methods primarily focus on single-scale features, overlooking the hierarchical molecular structures that determine different molecular properties. To address these issues, we introduce Universal Matching Networks (UniMatch), a dual matching framework that integrates explicit hierarchical molecular matching with implicit task-level matching via meta-learning, bridging multi-level molecular representations and task-level generalization. Specifically, our approach explicitly captures structural features across multiple levels, such as atoms, substructures, and molecules, via hierarchical pooling and matching, facilitating precise molecular representation and comparison. Additionally, we employ a meta-learning strategy for implicit task-level matching, allowing the model to capture shared patterns across tasks and quickly adapt to new ones. This unified matching framework ensures effective molecular alignment while leveraging shared meta-knowledge for fast adaptation. Our experimental results demonstrate that UniMatch outperforms state-of-the-art methods on the MoleculeNet and FS-Mol benchmarks, achieving improvements of 2.87% in AUROC and 6.52% in delta AUPRC. UniMatch also shows excellent generalization ability on the Meta-MolNet benchmark.
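For readers unfamiliar with the delta AUPRC metric, the sketch below shows one common way to compute it. It assumes the definition usually associated with FS-Mol, namely AUPRC minus the fraction of active compounds in the query set (the expected AUPRC of a random classifier), and it estimates AUPRC by average precision. The abstract does not spell out these details, so both choices are assumptions, and the function names are hypothetical.

```python
def average_precision(labels, scores):
    """Average precision: mean of precision values at the rank of each positive.

    Ties in scores are broken by input order; this is a simplification.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positives")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this cut-off
    return total / n_pos

def delta_auprc(labels, scores):
    """AUPRC (as average precision) minus the positive rate of the query set."""
    positive_rate = sum(labels) / len(labels)
    return average_precision(labels, scores) - positive_rate

# Toy query set: 2 actives out of 6 molecules.
labels = [1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]

delta = delta_auprc(labels, scores)
# Average precision is (1/1 + 2/3) / 2, the positive rate is 2/6,
# so delta is about 0.5.
print(f"delta AUPRC: {delta:.3f}")
```

Subtracting the positive rate makes scores comparable across tasks with different class balances, which matters when results are averaged over many few-shot tasks.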

Paper Structure

This paper contains 73 sections, 8 equations, 7 figures, 6 tables, 1 algorithm.

Figures (7)

  • Figure 1: Different levels of molecular structures affect different properties: (a) at the atomic level, fluorine and nitrogen affect molecular acidity and basicity, respectively; (b) at the substructural level, hydroxyl groups affect the hydrophobicity of ethanol and dodecane; and (c) at the molecular level, the overall structures influence boiling points. Key molecular structures are highlighted in red.
  • Figure 2: The overview of UniMatch. Left: Our model follows a hierarchical pooling-matching architecture comprising two components: an encoding module (including pooling) and a matching module. First, mean pooling is applied at each GNN layer to generate multi-level molecular representations. Then, an attention mechanism is utilized to align representations between the support set and query set across different levels. Finally, predictions from different GNN layers are integrated to obtain the final results. Right: The detailed process of the matching module.
  • Figure 3: Mean performance with standard errors on the FS-Mol test tasks. (a) Performance of all compared approaches on the FS-Mol benchmark. (b) Ablation study of the dual matching mechanism in UniMatch across different backbones.
  • Figure 4: The performance of all compared methods on the seven classification tasks with a support set of size 2 on the Meta-MolNet benchmark. Each colored sector represents a method, with the height of the sector indicating its effectiveness. Starting from the black arrow, the methods are listed in the legend in a counterclockwise direction. UniMatch corresponds to the orange sector. The dashed orange circle marks the results of UniMatch. Methods with sectors below this line do not surpass UniMatch, while those above it show superior performance.
  • Figure 5: Layer-wise visualization for NR-AhR toxicity prediction. The first row presents PCA projections of 10 molecules, distinguishing active (blue) from inactive (pink) molecules. The second row displays an internal visualization of a selected molecule across different layers, where color intensity indicates shifts in the model's attention as the layers deepen.
  • ...and 2 more figures