Learning the PTM Code through a Coarse-to-Fine, Mechanism-Aware Framework
Jingjie Zhang, Hanqun Cao, Zijun Gao, Yu Wang, Shaoning Li, Jun Xu, Cheng Tan, Jun Zhu, Chang-Yu Hsieh, Chunbin Gu, Pheng Ann Heng
TL;DR
COMPASS-PTM introduces a two-stage, mechanism-aware framework that unifies proteome-scale PTM site profiling with enzyme-substrate pairing. It leverages a dual-modal encoder (PLM+CLM) and a crosstalk-aware prompting mechanism to model PTM dependencies, addressing the dual long-tail data challenge. The Stage 2 ESPS module couples refined substrate representations with enzyme embeddings through dual-gated fusion to predict cognate enzymes, enabling zero-shot generalization to unseen kinases. Across multiple benchmarks, the approach achieves state-of-the-art performance, recovers canonical kinase motifs, and provides mechanistic, disease-relevant predictions, bridging statistical learning with biochemical regulation. By making its predictions interpretable and mechanism-informed, the framework offers a platform for decoding the PTM code and guiding experimental validation and translational research.
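To make the fusion step concrete, the following is a minimal sketch of one plausible reading of "dual-gated fusion": each branch (substrate and enzyme) gets its own sigmoid gate computed from the concatenated pair, and the gated vectors are summed. The summary does not specify the actual ESPS architecture, so the function names, parameter layout (`W_s`, `b_s`, `W_e`, `b_e`), and the choice of element-wise gating are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def linear(weights, bias, vec):
    """Dense layer: one output per weight row, plus the matching bias."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def init_params(dim, seed=0):
    """Hypothetical parameter set: two gate layers over the concatenated input."""
    rng = random.Random(seed)

    def matrix():
        return [[rng.uniform(-0.1, 0.1) for _ in range(2 * dim)]
                for _ in range(dim)]

    return {"W_s": matrix(), "b_s": [0.0] * dim,
            "W_e": matrix(), "b_e": [0.0] * dim}


def dual_gated_fusion(substrate, enzyme, params):
    """Fuse two equal-length embeddings with one gate per branch (assumed design)."""
    joint = list(substrate) + list(enzyme)  # concatenate both views
    gate_s = [sigmoid(z) for z in linear(params["W_s"], params["b_s"], joint)]
    gate_e = [sigmoid(z) for z in linear(params["W_e"], params["b_e"], joint)]
    # Each gate decides, per dimension, how much of its branch passes through.
    return [gs * s + ge * e
            for gs, s, ge, e in zip(gate_s, substrate, gate_e, enzyme)]


# Usage: toy 4-dimensional embeddings standing in for real model outputs.
substrate_vec = [0.2, -0.5, 0.1, 0.9]
enzyme_vec = [0.4, 0.3, -0.7, 0.0]
fused = dual_gated_fusion(substrate_vec, enzyme_vec, init_params(dim=4))
print(len(fused))
```

In a design like this, the fused vector would feed a scoring head that ranks candidate enzymes. Because the enzyme enters as an embedding rather than as a fixed class index, an enzyme absent from training can in principle still be scored, which is the property zero-shot assignment depends on.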
Abstract
Post-translational modifications (PTMs) form a combinatorial "code" that regulates protein function, yet deciphering this code, that is, linking modified sites to their catalytic enzymes, remains a central unsolved problem in understanding cellular signaling and disease. We introduce COMPASS-PTM, a mechanism-aware, coarse-to-fine learning framework that unifies residue-level PTM profiling with enzyme-substrate assignment. COMPASS-PTM integrates evolutionary representations from protein language models with physicochemical priors and a crosstalk-aware prompting mechanism that explicitly models inter-PTM dependencies. This design allows the model to learn biologically coherent patterns of cooperative and antagonistic modifications while addressing the dual long-tail distribution of PTM data. Across multiple proteome-scale benchmarks, COMPASS-PTM establishes new state-of-the-art performance, including a 122% relative F1 improvement in multi-label site prediction and a 54% gain in zero-shot enzyme assignment. Beyond accuracy, the model demonstrates interpretable generalization, recovering canonical kinase motifs and predicting disease-associated PTM rewiring caused by missense variants. By bridging statistical learning with biochemical mechanism, COMPASS-PTM unifies site-level and enzyme-level prediction into a single framework that learns the grammar underlying protein regulation and signaling.
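A relative improvement above 100% can be easy to misread, so the short sketch below shows the arithmetic behind such a figure. A 122% relative gain means the new score is 2.22 times the baseline, not 122 percentage points higher. The baseline F1 of 0.30 is a hypothetical value chosen only for illustration; the abstract does not report the underlying scores.

```python
def relative_improvement(baseline, new):
    """Relative gain of `new` over `baseline`, as a fraction (1.22 == 122%)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline


def score_after_gain(baseline, relative_gain):
    """Score implied by applying a relative gain to a baseline."""
    return baseline * (1.0 + relative_gain)


# Hypothetical baseline F1; NOT a number taken from the paper.
baseline_f1 = 0.30
implied_f1 = score_after_gain(baseline_f1, 1.22)
print(f"implied F1: {implied_f1:.3f}")
print(f"relative gain: {relative_improvement(baseline_f1, implied_f1):.0%}")
```

The same relative gain implies very different absolute scores depending on the baseline, which is why relative figures are best read alongside the raw metrics.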
