Enhancing selectivity using Wasserstein distance based reweighing

Pratik Worah

TL;DR

This work tackles domain shift between labeled source data $\mathcal{S}$ and target data $\mathcal{T}$ by designing a scalable greedy reweighting method that tilts the training distribution toward a mixture $(1-\alpha)\mathbb{P}_{\mathcal{S}}+\alpha\mathbb{P}_{\mathcal{T}}$, with the limiting distribution of the neural network weights characterized via the $1$-Wasserstein distance $W_1$. By replacing the exact $W_1$ computation with a greedy (and randomized) approximation to minimum-weight bipartite matching, the authors obtain near-linear-time guarantees under a small metric-entropy assumption, supported by a randomized-sampling analysis. The theoretical results bound the TV distance between the invariant measures of SGD by $O(W_1)$ in a covariate-shift-like setting, and show that the greedy approach achieves approximation factors that improve as the metric entropy decreases. A drug-discovery case study on MNK1/MNK2 demonstrates practical impact: reweighting increases selectivity among the top predicted MNK2 hits and yields experimentally validated selective binders, illustrating a scalable transport-based approach to multi-target predictive modeling.
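To make the matching reduction concrete, here is a minimal Python sketch, assuming two equal-size point sets with uniform weights and a Euclidean ground metric (both assumptions for illustration). The greedy rule below, repeatedly taking the globally cheapest remaining pair, is a generic greedy matching and not necessarily the paper's exact procedure; `greedy_matching`, `matching_cost`, and `exact_w1_bruteforce` are hypothetical names. The brute-force exact $W_1$ only serves as a reference on tiny inputs.

```python
import itertools
import math

Point = tuple[float, ...]


def dist(x: Point, y: Point) -> float:
    """Euclidean ground metric (an assumed choice)."""
    return math.dist(x, y)


def greedy_matching(S: list[Point], T: list[Point]) -> list[int]:
    """Generic greedy min-weight perfect matching: scan all (s, t) pairs in
    increasing distance and keep a pair if both endpoints are still free.
    Returns match[i] = index of the point in T matched to S[i]."""
    assert len(S) == len(T), "uniform empirical measures of equal size assumed"
    edges = sorted(
        (dist(s, t), i, j) for i, s in enumerate(S) for j, t in enumerate(T)
    )
    match = [-1] * len(S)
    used_t: set[int] = set()
    for _, i, j in edges:
        if match[i] == -1 and j not in used_t:
            match[i] = j
            used_t.add(j)
    return match


def matching_cost(S: list[Point], T: list[Point], match: list[int]) -> float:
    """Average edge length: the transport cost of the coupling induced by the matching."""
    return sum(dist(S[i], T[j]) for i, j in enumerate(match)) / len(S)


def exact_w1_bruteforce(S: list[Point], T: list[Point]) -> float:
    """Exact W1 between uniform empirical measures of equal size, by
    enumerating every perfect matching (feasible only for tiny inputs)."""
    return min(
        matching_cost(S, T, list(p)) for p in itertools.permutations(range(len(T)))
    )
```

On `S = [(0.0,), (1.9,)]` and `T = [(1.0,), (2.9,)]` the greedy rule first grabs the cheap pair (1.9, 1.0) and is then forced into the expensive pair (0.0, 2.9), costing 1.9 versus the optimal 1.0. This is the kind of gap an approximation analysis of the greedy step has to control.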

Abstract

Given two labeled datasets $\mathcal{S}$ and $\mathcal{T}$, we design a simple and efficient greedy algorithm to reweigh the loss function so that the limiting distribution of the neural network weights obtained by training on $\mathcal{S}$ approaches the limiting distribution that would have resulted from training on $\mathcal{T}$. On the theoretical side, we prove that when the metric entropy of the input datasets is bounded, our greedy algorithm outputs a close-to-optimal reweighing, i.e., the two invariant distributions of network weights are provably close in total variation distance. Moreover, the algorithm is simple and scalable, and we also prove bounds on its efficiency. As a motivating application, we train a neural net to recognize small-molecule binders to MNK2 (a MAP kinase-interacting kinase involved in cell signaling) that are non-binders to MNK1 (a highly similar protein). In our example dataset, of the 43 distinct small molecules from the Enamine catalog predicted to be most selective, 2 were experimentally verified to be selective, i.e., at 10 $\mu$M they reduced the enzyme activity of MNK2 below 50% but not that of MNK1: a success rate of about 5% (2/43).
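One way to picture "reweighing the loss toward the mixture $(1-\alpha)\mathbb{P}_{\mathcal{S}}+\alpha\mathbb{P}_{\mathcal{T}}$" is the hypothetical Python sketch below. It is not the paper's algorithm: as a simple stand-in for the greedy matching, each target point sends its mass $\alpha/m$ to its nearest source point (Euclidean metric assumed), and the functions `mixture_weights` and `weighted_loss` are illustrative names. By coupling the two $(1-\alpha)$ source parts identically, the resulting weighted measure on $\mathcal{S}$ is within $W_1$ distance $\alpha$ times the average nearest-neighbor distance of the mixture.

```python
import math


def nearest_source(t: tuple[float, ...], S: list[tuple[float, ...]]) -> int:
    """Index of the source point closest to t (Euclidean, an assumed metric)."""
    return min(range(len(S)), key=lambda i: math.dist(S[i], t))


def mixture_weights(S, T, alpha: float) -> list[float]:
    """Hypothetical reweighing of source examples toward
    (1 - alpha) * P_S + alpha * P_T: every source point keeps (1 - alpha)/n,
    and each target point adds alpha/m to its nearest source point.
    The weights are nonnegative and sum to 1."""
    n, m = len(S), len(T)
    w = [(1.0 - alpha) / n] * n
    for t in T:
        w[nearest_source(t, S)] += alpha / m
    return w


def weighted_loss(per_example_losses: list[float], w: list[float]) -> float:
    """Reweighed empirical loss: sum_i w_i * loss_i."""
    return sum(wi * li for wi, li in zip(w, per_example_losses))


# Usage: the two target points sit near the last source point,
# so that point absorbs the extra alpha mass.
S = [(0.0,), (1.0,), (2.0,), (3.0,)]
T = [(2.9,), (3.2,)]
w = mixture_weights(S, T, alpha=0.5)
```

Setting `alpha=0` recovers ordinary uniform training on $\mathcal{S}$, while `alpha=1` places all mass on source points that are close to target data.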


Paper Structure (29 sections, 14 theorems, 33 equations, 6 figures, 3 algorithms)

Introduction
Related work
Problem statement and overview of results
Drug Discovery example
Reweighing algorithm
Theoretical results
Choice of $1$-Wasserstein metric
Metric bipartite matching
Small covering assumption
Greedy on random sample
Example application: drug discovery
Acknowledgements
Supplement
Formal setup and theorem statements
Choice of metric in the main algorithm: Bounding $1$-Wasserstein distance suffices
...and 14 more sections

Key Result

Theorem 3.2

(The precise statement is given in the Supplement.) Suppose we train two neural networks such that (1) the limiting stochastic differential equation (SDE) corresponding to the training SGD (as the SGD step size goes to $0$) is strongly elliptic (footnote: this ensures the invariant measure of the SDE exists, is smooth an…
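Because the statement above is cut off, the standard definitions it relies on are collected here for reference. The last line is only a schematic reading of the TL;DR's "$O(W_1)$" bound, with $\mu_{\mathcal{S}}$ and $\mu_{\mathcal{T}}$ denoting the invariant measures of the SDE limits of SGD on the (reweighed) source and target data; this notation is ours, and the hypotheses and constants are those of the precise theorem.

```latex
% 1-Wasserstein distance on a metric space (X, d); \Pi(P, Q) = couplings of P and Q
W_1(\mathbb{P},\mathbb{Q}) = \inf_{\gamma \in \Pi(\mathbb{P},\mathbb{Q})} \int d(x,y)\, d\gamma(x,y)
  = \sup_{\|f\|_{\mathrm{Lip}} \le 1} \big( \mathbb{E}_{\mathbb{P}} f - \mathbb{E}_{\mathbb{Q}} f \big)

% Uniform empirical measures on n points each: by Birkhoff-von Neumann an optimal
% coupling is a permutation, so W_1 is a minimum-weight perfect bipartite matching
W_1\Big(\tfrac{1}{n}\textstyle\sum_{i=1}^{n} \delta_{x_i},\ \tfrac{1}{n}\sum_{i=1}^{n} \delta_{y_i}\Big)
  = \min_{\sigma \in S_n} \frac{1}{n} \sum_{i=1}^{n} d\big(x_i, y_{\sigma(i)}\big)

% Total variation distance
d_{\mathrm{TV}}(\mu,\nu) = \sup_{A} \lvert \mu(A) - \nu(A) \rvert

% Schematic form of the key result
d_{\mathrm{TV}}(\mu_{\mathcal{S}}, \mu_{\mathcal{T}}) = O\big(W_1(\mathbb{P}_{\mathcal{S}}, \mathbb{P}_{\mathcal{T}})\big)
```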

Figures (6)

Figure 1: Selectivity of the reweighed (using the main reweighing algorithm) and baseline (no reweighing) neural nets. This increase in selectivity from 54% to 95% came without any significant change in validation loss; the AUC for classifying MNK2 binders vs non-binders remained around $0.6$ in both cases.
Figure 2: Two molecules from the Enamine catalog predicted, and experimentally verified, to be selective (MNK2 hits that are MNK1 non-hits). At $10\,\mu$M, each molecule left MNK2 enzyme activity below 50% and MNK1 enzyme activity above 50%: $\sim20\%$ vs $70\%$ and $\sim39\%$ vs $59\%$ (MNK2 vs MNK1). Note that these values come from a single-point concentration assay and can be noisy.
Figure 3: (L) Tanimoto similarities of the top molecules in the base model vs. (R) Tanimoto similarities of the top molecules in the reweighed model.
Figure 4: (L) Mean Tanimoto similarity of each top molecule of the base model to the top molecules of the reweighed model (note the bifurcation) vs. (R) mean Tanimoto similarity of each top molecule of the reweighed model to the top molecules of the base model.
Figure 5: Alternate edges in an alternating cycle $\gamma$ belong to greedy and optimal matching.
...and 1 more figure
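The similarity statistics in Figures 3 and 4 rest on the Tanimoto coefficient between molecular fingerprints. The minimal Python sketch below assumes fingerprints are already given as sets of on-bit indices; the paper's fingerprint type is not stated here, and in practice one would compute fingerprints with a cheminformatics toolkit such as RDKit. `mean_similarity_to_set` is a hypothetical helper matching our reading of the per-molecule statistic in Figure 4.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints given as sets
    of on-bit indices: |A & B| / |A | B|."""
    if not a and not b:
        return 1.0  # convention for two empty fingerprints (an assumption)
    return len(a & b) / len(a | b)


def mean_similarity_to_set(query: set[int], others: list[set[int]]) -> float:
    """Mean Tanimoto similarity of one molecule to every molecule in another list,
    e.g. one top molecule of the base model vs. all top molecules of the
    reweighed model."""
    return sum(tanimoto(query, o) for o in others) / len(others)


# Usage: two toy fingerprints sharing 2 of 5 distinct on-bits.
a = {1, 2, 3, 4}
b = {3, 4, 5}
sim = tanimoto(a, b)                      # 2 / 5
avg = mean_similarity_to_set(a, [b, a])   # (0.4 + 1.0) / 2
```

Because the coefficient only counts shared and total on-bits, it is cheap to compute for every pair of top molecules, which is what pairwise heat maps like Figure 3 require.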

Theorems & Definitions (27)

Theorem 3.2
Theorem 3.3
Theorem 3.4
Theorem 3.5
Remark 4.1
Theorem B.3
Remark B.4
Theorem B.5
Definition B.6
Theorem B.7
...and 17 more
