A decoupled alignment kernel for peptide membrane permeability predictions
Ali Amirahmadi, Gökçe Geylan, Leonardo De Maria, Farzaneh Etminani, Mattias Ohlsson, Alessandro Tibo
TL;DR
This work tackles the challenge of predicting cyclic peptide permeability with calibrated uncertainty in limited-data settings. It introduces a monomer-aware decoupled global alignment kernel (MD-GAK) and a position-aware variant (PMD-GAK), which pair chemically meaningful monomer fingerprints with sequence alignment inside Gaussian Processes while preserving positive definiteness and yielding robust uncertainty estimates. Evaluations on CycPeptMPDB across leakage-aware and scaffold-based splits show that MD-GAK and PMD-GAK improve discrimination and calibration relative to strong baselines, while TAN_sim and convex mixtures reveal complementary strengths between alignment and substructure signals. By bridging mature small-molecule kernel methods with peptide topology, the approach enables data-efficient, uncertainty-aware screening and points toward richer monomer encoders from chemical language models as the available data grow.
Abstract
Cyclic peptides are promising modalities for targeting intracellular sites; however, cell-membrane permeability remains a key bottleneck, exacerbated by limited public data and the need for well-calibrated uncertainty. Instead of relying on complex, data-hungry deep learning architectures, we propose a monomer-aware decoupled global alignment kernel (MD-GAK), which couples chemically meaningful residue-residue similarity with sequence alignment while decoupling local matches from gap penalties. MD-GAK is a relatively simple kernel. To further demonstrate the robustness of our framework, we also introduce a variant, PMD-GAK, which incorporates a triangular positional prior. As we show in the experimental section, PMD-GAK can offer additional advantages over MD-GAK, particularly in reducing calibration error. Because our focus is on uncertainty estimation, we use Gaussian Processes as the predictive model; both MD-GAK and PMD-GAK can be plugged directly into this framework. We demonstrate the effectiveness of our methods through an extensive set of experiments, comparing our fully reproducible approach against state-of-the-art models, and show that it outperforms them across all metrics.
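To make the idea of "decoupling local matches from gap penalties" concrete, the following is a minimal sketch of an alignment-style kernel over monomer fingerprints. It is not the MD-GAK or PMD-GAK definition, which this abstract does not spell out. The recursion, the `gap` weight, the `window` parameter of the triangular positional weight, and the representation of each monomer as a set of fingerprint on-bits are all assumptions made for illustration. The sketch also treats peptides as linear sequences and makes no claim about positive definiteness.

```python
import math

import numpy as np


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def alignment_kernel(x, y, gap=0.5, window=None):
    """Illustrative alignment score between two monomer sequences.

    Assumed recursion (hypothetical, not taken from the paper):
        M[i, j] = local(i, j) * M[i-1, j-1] + gap * (M[i-1, j] + M[i, j-1])
    so diagonal (match) steps are weighted by monomer similarity, while
    horizontal and vertical (gap) steps are weighted by a separate constant.
    If `window` is given, the local similarity is multiplied by the
    triangular weight max(0, 1 - |i - j| / window).
    """
    n, m = len(x), len(y)
    M = np.zeros((n + 1, m + 1))
    M[0, 0] = 1.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = tanimoto(x[i - 1], y[j - 1])
            if window is not None:
                local *= max(0.0, 1.0 - abs(i - j) / window)
            M[i, j] = local * M[i - 1, j - 1] + gap * (M[i - 1, j] + M[i, j - 1])
    return float(M[n, m])


def normalized_kernel(x, y, **kwargs):
    """Cosine-style normalization so that a sequence scores 1 against itself."""
    kxy = alignment_kernel(x, y, **kwargs)
    kxx = alignment_kernel(x, x, **kwargs)
    kyy = alignment_kernel(y, y, **kwargs)
    return kxy / math.sqrt(kxx * kyy)


# Toy peptides: each monomer is a hypothetical set of fingerprint bit indices.
pep_a = [{1, 2}, {3, 4}, {5}]
pep_b = [{1, 2}, {3}, {5, 6}]
similarity = normalized_kernel(pep_a, pep_b, gap=0.5, window=3)
```

In a Gaussian Process setting, a Gram matrix built from such pairwise values would serve as the covariance between peptides; whether a given recursion yields a valid (positive definite) covariance has to be established separately, as the authors state they do for MD-GAK and PMD-GAK.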
