Generalization within in silico screening

Andreas Loukas; Pan Kessel; Vladimir Gligorijevic; Richard Bonneau

TL;DR

This work reframes in silico screening as a policy-driven generalization problem, showing that the selectivity of the batch-design policy and the rarity of predicted positives critically shape generalization. It extends learning theory by combining PAC-Bayes, stability, and Lipschitz arguments to bound the screening risk under a policy $\pi_f(x) \propto \alpha + f(x)$, and it introduces batched prediction, where the mean label of a batch is predicted and evaluated. The paper proves that batching generally improves generalization, with bounds that benefit from larger batch sizes and from early stopping, and validates these ideas empirically on antibody design and QM9 molecular property tasks. The results yield actionable guidance: use less aggressive per-sample selectivity (e.g., set $\alpha=1$) and rely on larger batch sizes to reliably forecast batch quality, while remaining mindful of distribution shifts and the asymptotic nature of the bounds.
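To make the policy $\pi_f(x) \propto \alpha + f(x)$ concrete, the following is a minimal sketch of how such a policy could be turned into sampling probabilities over a finite library. The function names `policy_probabilities` and `sample_batch` are hypothetical, the scores are assumed to lie in $[-1, 1]$ (the range used in Theorem 1), and drawing the batch with replacement is an assumption made here for simplicity rather than something stated in the paper.

```python
import numpy as np


def policy_probabilities(scores, alpha=1.0):
    """Normalize alpha + f(x) into a sampling distribution over a library.

    `scores` holds model predictions f(x), assumed to lie in [-1, 1].
    Larger `alpha` flattens the distribution (less selective policy).
    """
    scores = np.asarray(scores, dtype=float)
    weights = alpha + scores
    if np.any(weights < 0):
        raise ValueError("alpha + f(x) must be non-negative for every candidate")
    total = weights.sum()
    if total == 0:
        raise ValueError("all weights are zero; the policy is undefined")
    return weights / total


def sample_batch(scores, k, alpha=1.0, seed=0):
    """Draw k library indices i.i.d. from the policy (with replacement, an assumption)."""
    rng = np.random.default_rng(seed)
    probs = policy_probabilities(scores, alpha)
    return rng.choice(len(probs), size=k, replace=True, p=probs)


if __name__ == "__main__":
    library_scores = [-1.0, -0.5, 0.0, 0.5, 1.0]  # illustrative predictions, not real data
    print(policy_probabilities(library_scores, alpha=1.0))
    print(policy_probabilities(library_scores, alpha=3.0))
    print(sample_batch(library_scores, k=7, alpha=1.0))
```

In this sketch, `alpha=1.0` assigns zero probability to a candidate with score $-1$, while a larger `alpha` moves the policy toward uniform selection.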

Abstract

In silico screening uses predictive models to select a batch of compounds with favorable properties from a library for experimental validation. Unlike conventional learning paradigms, success in this context is measured by the performance of the predictive model on the selected subset of compounds rather than the entire set of predictions. By extending learning theory, we show that the selectivity of the selection policy can significantly impact generalization, with a higher risk of errors occurring when exclusively selecting predicted positives and when targeting rare properties. Our analysis suggests a way to mitigate these challenges. We show that generalization can be markedly enhanced when considering a model's ability to predict the fraction of desired outcomes in a batch. This is promising, as the primary aim of screening is not necessarily to pinpoint the label of each compound individually, but rather to assemble a batch enriched for desirable compounds. Our theoretical insights are empirically validated across diverse tasks, architectures, and screening scenarios, underscoring their applicability.
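As an illustration of what evaluating a batch rather than individual compounds can look like, the sketch below compares a per-sample risk with a batched risk that scores the mean prediction of each batch against the mean label. It uses the loss $l(a, b) = (a - b)^2/4$ from Figure 1; the function names, the contiguous grouping into batches, and the toy numbers are assumptions made here, and the snippet computes empirical risks only, not the generalization bounds of the paper.

```python
import numpy as np


def squared_loss(pred, label):
    """Loss (pred - label)^2 / 4, which maps [-1, 1] x [-1, 1] into [0, 1]."""
    return (pred - label) ** 2 / 4.0


def per_sample_risk(preds, labels):
    """Average loss when every prediction is compared to its own label."""
    preds = np.asarray(preds, dtype=float)
    labels = np.asarray(labels, dtype=float)
    return float(np.mean(squared_loss(preds, labels)))


def batched_risk(preds, labels, k):
    """Average loss when the mean prediction of each batch of size k
    is compared to the mean label of that batch.

    Samples are grouped contiguously; a trailing incomplete batch is dropped.
    """
    if k < 1:
        raise ValueError("batch size k must be at least 1")
    preds = np.asarray(preds, dtype=float)
    labels = np.asarray(labels, dtype=float)
    n_batches = len(preds) // k
    if n_batches == 0:
        raise ValueError("not enough samples for a single batch")
    mean_preds = preds[: n_batches * k].reshape(n_batches, k).mean(axis=1)
    mean_labels = labels[: n_batches * k].reshape(n_batches, k).mean(axis=1)
    return float(np.mean(squared_loss(mean_preds, mean_labels)))


if __name__ == "__main__":
    # Toy predictions and labels in [-1, 1]; not taken from the paper.
    preds = [0.8, -0.2, 0.4, -0.6, 1.0, 0.0]
    labels = [1, -1, 1, -1, 1, -1]
    print(per_sample_risk(preds, labels))
    print(batched_risk(preds, labels, k=3))
```

Because the squared loss is convex, Jensen's inequality guarantees that the batched risk never exceeds the per-sample risk on the same samples when the batches partition them exactly; with `k=1` the two coincide.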

Paper Structure (39 sections, 7 theorems, 140 equations, 7 figures)

Introduction
Generalization theory for in silico screening
Problem definition
Key ingredients of generalization
Selection affects generalization
Does generalization improve when evaluating prediction in batches?
The batched prediction paradigm
Dissecting the effect of batching
Understanding the subtle effects of batching to generalization
Empirical validation
Protein design: classification of protein activity
Quantum chemistry: regression for molecular property prediction
Related work
Limitations
Conclusion
...and 24 more sections

Key Result

Theorem 1

Let $l: [-1, 1] \times [-1, 1] \to [0,1]$ be a Lipschitz continuous loss function with Lipschitz constant $\lambda$. Further, consider a $\beta$-stable learner that uses the training data $Z$ to select a distribution $q_Z$ over hypotheses $f \in \mathcal{F}$ with Lipschitz constant at most $\mu$. Then [...] with high probability, where $W_1(p, p_Z)$ measures the 1-Wasserstein distance between the empiric…

Figures (7)

Figure 1: Illustration of the different learning paradigms for the case of binary classification with loss $l(f(x),y) = (f(x) - y)^2/4$. The left-most figure depicts the standard learning setting where the prediction $f(x)$ for each example $x$ is compared to the ground truth label $y$. The middle figure exemplifies in silico screening with a very selective policy, considering only predicted positives (though we also consider less strict policies in the manuscript). The right-most figure shows batched prediction with a uniform selection policy and a batch size of $k=7$, comparing the total number of positives and negatives in the prediction batch.
Figure 2: Estimation of the (rescaled) screening generalization error on the Mason et al. dataset [mason2021optimization] for different losses as a function of the policy selectivity hyperparameter $\alpha$ (left) and the size of the prediction batch $k$ (middle). Since both $\alpha$ and $k$ influence the policy selectivity, we also compare the generalization error achieved by changing these hyperparameters for the same selectivity (right). As expected, though both decreasing $\alpha$ and increasing $k$ improve generalization, using a larger batch size yields far superior results.
Figure 3: Estimated (normalized) generalization error for the QM9 dataset [wu2017moleculenet] as a function of selectivity. Batched prediction, i.e., varying the batch size $k$ at fixed $\alpha$, leads to a lower generalization error than fixing $k=1$ and varying $\alpha$. We train 6 models for each property to estimate the error.
Figure 4: Training and test losses during training for 3 different batch sizes $k$ in the antibody binding classification task. From left to right, the three panels show the KL-divergence, MAE, and MSE losses, respectively. Both empirical and test risks decrease faster as $k$ increases. The generalization gap reported in the main text is the difference between the train and test loss.
Figure 5: Effect of the selectivity parameter $\alpha$ as a function of the training set size $n$. A high degree of selectivity remains an issue even for smaller $n$.
...and 2 more figures

Theorems & Definitions (14)

Definition 1: Stability
Theorem 1
Theorem 2
proof
Definition 2: stability
Theorem 3
Lemma 1
proof
Lemma 2
proof
...and 4 more
