A Strong Baseline for Molecular Few-Shot Learning

Philippe Formont; Hugo Jeannin; Pablo Piantanida; Ismail Ben Ayed

TL;DR

The paper addresses molecular few-shot learning under data scarcity by revisiting simple fine-tuning instead of meta-learning. It introduces a quadratic-probing classifier based on Mahalanobis distance with class prototypes $w_k$ and precision matrices $M_k$, optimized via block-coordinate descent with a shrinkage-regularized surrogate to prevent degenerate covariance growth, and it uses a multitask GNN backbone pretrained on FS-mol. On FS-mol and out-of-domain shifts, the quadratic probe (and the linear probe) yield competitive or superior performance compared to state-of-the-art meta-learning approaches, demonstrating robustness to domain shifts and applicability in black-box settings. The work also provides extensive ablations and domain-shift benchmarks, including imbalanced QSAR targets and large-scale HTS library screening, illustrating practical advantages of simple fine-tuning baselines. Overall, the proposed methods offer efficient, robust few-shot classifiers for drug discovery tasks, with the quadratic probe delivering the best average gains and strong resilience to distribution shifts, and the authors release their code for reproducibility.
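As an illustration of the classifier described above, the following is a minimal NumPy sketch of a Mahalanobis-distance probe. It is not the authors' released implementation. It assumes logits of the form $-\frac{1}{2}({\bm{z}}-{\bm{w}}_k)^\top {\mathbf{M}}_k ({\bm{z}}-{\bm{w}}_k)$ followed by a softmax, estimates each ${\mathbf{M}}_k$ in closed form as the inverse of a class covariance shrunk toward the identity, and leaves out the block-coordinate descent updates as well as any normalisation term the paper's equation may include. The function names and the `shrinkage` parameter are hypothetical.

```python
import numpy as np


def fit_quadratic_probe(z, y, shrinkage=0.5):
    """Estimate a prototype w_k and a precision matrix M_k for each class.

    Assumption: M_k is the inverse of the class covariance shrunk toward
    the identity. This is a simple stand-in, not the paper's optimizer.
    """
    dim = z.shape[1]
    params = {}
    for k in np.unique(y):
        z_k = z[y == k]
        w_k = z_k.mean(axis=0)                      # class prototype
        centered = z_k - w_k
        cov = centered.T @ centered / len(z_k)      # empirical covariance
        # Shrinkage keeps the covariance well conditioned with few samples.
        cov = (1.0 - shrinkage) * cov + shrinkage * np.eye(dim)
        params[int(k)] = (w_k, np.linalg.inv(cov))
    return params


def predict_proba(z, params):
    """Softmax over negative half squared Mahalanobis distances."""
    classes = sorted(params)
    logits = np.stack(
        [
            -0.5 * np.einsum(
                "nd,de,ne->n", z - params[k][0], params[k][1], z - params[k][0]
            )
            for k in classes
        ],
        axis=1,
    )
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)
```

With frozen backbone embeddings `z` of shape `(n, d)` and binary labels `y`, calling `predict_proba(z_query, fit_quadratic_probe(z, y))` returns one row of class probabilities per query molecule. Because only the embeddings are needed, a sketch like this also fits the black-box setting mentioned above.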

Abstract

Few-shot learning has recently attracted significant interest in drug discovery, with a recent, fast-growing literature mostly involving convoluted meta-learning strategies. We revisit the more straightforward fine-tuning approach for molecular data, and propose a regularized quadratic-probe loss based on the Mahalanobis distance. We design a dedicated block-coordinate descent optimizer, which avoids the degenerate solutions of our loss. Interestingly, our simple fine-tuning approach achieves highly competitive performance in comparison to state-of-the-art methods, while being applicable to black-box settings and removing the need for specific episodic pre-training strategies. Furthermore, we introduce a new benchmark to assess the robustness of the competing methods to domain shifts. In this setting, our fine-tuning baseline obtains consistently better results than meta-learning methods.

Paper Structure (23 sections, 2 theorems, 28 equations, 11 figures, 4 tables, 1 algorithm)

Introduction
Related Work
Methods
Model pre-training
Multitask Linear Probing
Quadratic probing
Experimental Results
FS-mol benchmark
Ablation on the Free optimisation of $\Sigma_k$
Impact of Domain Shifts
QSAR modelling with imbalanced class distribution
Library screening
Summary and Concluding Remarks
Statement of Broader Impact
Appendix
...and 8 more sections

Key Result

Proposition 3.1

Let $\Theta = \{{\bm{w}}_k, {\mathbf{M}}_k\}_{k\in \{0,1\}}$, and write $p_\Theta(k|{\bm{z}}_i) = p_{i,k}$ as described in eq:quad_probing (to highlight the dependency on the parameters $\Theta$). If the samples from both classes are linearly separable, we can construct a set of parameters, […] where $\mathcal{L}_{ce}({\bm{z}}_i, y_i, \Theta) = -\log p_{\Theta}(y_i|{\bm{z}}_i)$ is the point-wise […]
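The statement above is truncated, so the paper's own construction is not visible here. The sketch below shows one construction consistent with the stated setting, under the assumption that the probe uses a softmax over $-\frac{1}{2}({\bm{z}}-{\bm{w}}_k)^\top {\mathbf{M}}_k ({\bm{z}}-{\bm{w}}_k)$. On linearly separable toy data, it places the two prototypes on either side of a separating hyperplane and sets ${\mathbf{M}}_0 = {\mathbf{M}}_1 = \alpha \mathbf{I}$. The logit difference then becomes $\alpha$ times the signed margin, so the cross-entropy shrinks toward zero as $\alpha$ (the largest eigenvalue of ${\mathbf{M}}_k$) grows without bound. All function names and the toy data are hypothetical.

```python
import numpy as np


def quad_probe_ce(z, y, prototypes, precisions):
    """Mean cross-entropy of a softmax over -0.5 * squared Mahalanobis distances."""
    diffs = z[:, None, :] - prototypes[None, :, :]            # shape (n, K, d)
    logits = -0.5 * np.einsum("nkd,kde,nke->nk", diffs, precisions, diffs)
    top = logits.max(axis=1, keepdims=True)                   # stable log-sum-exp
    log_norm = top[:, 0] + np.log(np.exp(logits - top).sum(axis=1))
    return float(np.mean(log_norm - logits[np.arange(len(y)), y]))


def separable_toy_data(n_per_class=32, dim=4, seed=0):
    """Two classes separated along the first coordinate with margin >= 0.5."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(2 * n_per_class, dim))
    y = np.repeat([0, 1], n_per_class)
    z[:, 0] = (2 * y - 1) * (0.5 + np.abs(z[:, 0]))
    return z, y


def degenerate_losses(alphas=(1.0, 10.0, 100.0)):
    """Cross-entropy for M_0 = M_1 = alpha * I at increasing alpha."""
    z, y = separable_toy_data()
    dim = z.shape[1]
    u = np.zeros(dim)
    u[0] = 1.0                                    # normal of the separating hyperplane
    prototypes = np.stack([-0.5 * u, 0.5 * u])    # one prototype on each side
    losses = []
    for alpha in alphas:
        precisions = np.stack([alpha * np.eye(dim)] * 2)
        losses.append(quad_probe_ce(z, y, prototypes, precisions))
    return losses
```

Calling `degenerate_losses()` returns a decreasing sequence of losses, which illustrates why an unconstrained optimisation of the precision matrices can drift toward ever larger eigenvalues and why some form of regularisation is needed.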

Figures (11)

Figure 1: Values of the logits produced by two different classifiers: a linear probe (left) and a quadratic probe based on the Mahalanobis distance (right), in a two-dimensional feature space with two classes.
Figure 2: Evolution of the model's performance on the validation set during the few-shot adaptation. Free-opt refers to performing a gradient descent on ${\mathbf{M}}_k$, Free-opt-reg adds a regularisation on the norm of ${\mathbf{M}}_k$. ($|\mathcal{S}|=64$)
Figure 3: Performances of various methods on the DTI tasks. The fine-tuning baselines outperform the meta-learning methods when the tasks become more imbalanced.
Figure 4: Average ranking performances of each method on the HTS tasks according to the percentage of the dataset selected. The quadratic probe obtains the best results when the support set's size is small, while the similarity search is the best model with larger support sets.
Figure 5: Evolution of the maximal eigenvalue of ${\mathbf{M}}_k$ along training when optimized with gradient descent (Free-opt) or with the quadratic probe (q-probe).
...and 6 more figures

Theorems & Definitions (8)

Proposition 3.1
Remark 3.2
Proposition 3.3
Remark 3.4
proof
proof
proof
proof
