The Signed Two-Space Proximity Model for Learning Representations in Protein-Protein Interaction Networks

Nikolaos Nakis, Chrysoula Kosma, Anastasia Brativnyk, Michail Chatzianastasis, Iakovos Evdaimon, Michalis Vazirgiannis

TL;DR

The paper tackles representation learning for signed protein-protein interaction (SPPI) networks, which include both activating (positive) and inhibiting (negative) interactions. It introduces the Signed Two-Space Proximity Model (S2-SPM), which uses two independent latent spaces and archetypal analysis to model positive and negative interactions separately, and a Skellam likelihood to reconstruct the signed graph. The work reports superior signed link prediction, robust archetype structures as measured by Bayesian Normalized Mutual Information (BNMI), and biologically meaningful GO-term enrichment for the archetypes, supporting both interpretability and functional relevance. By decoupling the positive and negative spaces, S2-SPM offers a principled and interpretable framework for decoding complex regulatory patterns in SPPI networks; the publicly available SIGNOR data and GO annotations enable further exploration.

Abstract

Accurately predicting complex protein-protein interactions (PPIs) is crucial for decoding biological processes, from cellular functioning to disease mechanisms. However, experimental methods for determining PPIs are expensive and time-consuming, so attention has recently turned to machine learning approaches. Moreover, little effort has been devoted to analyzing signed PPI (SPPI) networks, which capture both activating (positive) and inhibitory (negative) interactions. To represent biological relationships accurately, we present the Signed Two-Space Proximity Model (S2-SPM) for signed PPI networks, which explicitly incorporates both types of interactions, reflecting the complex regulatory mechanisms within biological systems. This is achieved by leveraging two independent latent spaces to differentiate between positive and negative interactions while representing protein similarity through proximity in these spaces. Our approach also enables the identification of archetypes representing extreme protein profiles. S2-SPM's superior performance in predicting the presence and sign of interactions in SPPI networks is demonstrated in link prediction tasks against relevant baseline methods. Additionally, the biological relevance of the identified archetypes is confirmed by an enrichment analysis of Gene Ontology (GO) terms, which reveals that distinct biological tasks are associated with the archetypal groups formed by both interaction types. The enrichment study is further validated through statistical significance testing and sensitivity analysis, providing insights into the functional roles of different interaction types. Finally, the robustness and consistency of the extracted archetype structures are confirmed using the Bayesian Normalized Mutual Information (BNMI) metric, supporting the model's reliability in capturing meaningful SPPI patterns.

Paper Structure

This paper contains 5 sections, 8 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: Overview of the proposed Signed Two-Space Proximity Model (S2-SPM). Given a signed protein-protein interaction network as input, the model assigns each protein two latent vectors, $\bm{z}_i$ and $\bm{w}_i$, for its positive and negative interactions; these project the protein onto the two archetypal matrices/polytopes $\bm{A}_{(+)}$ and $\bm{A}_{(-)}$, respectively. The resulting embeddings are used to compute the Skellam rates, which are optimized under the Skellam log-likelihood to reconstruct the original signed protein-protein graph.
  • Figure 2: Homo sapiens, S2-SPM ($K=8$): Inferred simplex visualizations and ordered adjacency matrices for $K=8$ archetypes in the positive space, panels (a) and (b), and the negative space, panels (c) and (d). Panels (a) and (c) show the Positive/Negative Space Circular Plot (PSCP/NSCP), with blue/red lines marking positive/negative edges between proteins. Panels (b) and (d) show the Ordered Positive/Negative Edges Adjacency (OrA) matrices, sorted by the memberships $\bm{z}_i$/$\bm{w}_i$ in terms of maximum simplex corner responsibility.
  • Figure 3: Bayesian Normalized Mutual Information (BNMI): Robustness of the solution and structure characterization of S2-SPM as a function of the number of dimensions/archetypes, across five reruns and three networks. For each dataset, the lines labeled (RANDOM) give the BNMI value expected by chance for the given number of archetypes/dimensions.
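
The pipeline summarized in the Figure 1 caption (simplex memberships, projection onto archetypes, Skellam rates, Skellam log-likelihood) can be illustrated with a minimal sketch. It is not the authors' implementation. The Skellam probability mass function is standard, but the rate form exp(bias_i + bias_j - distance), the bias terms `gamma` and `delta`, and all function names are assumptions made here for illustration; this summary does not state the paper's exact rate definition. The Bessel function is evaluated with a truncated power series so that the example needs only the standard library.

```python
import math

def log_bessel_i(order, x, terms=80):
    """Log of the modified Bessel function I_order(x), integer order >= 0, x > 0.
    Uses the series I_k(x) = sum_m (x/2)^(2m+k) / (m! (m+k)!), truncated."""
    logs = [(2 * m + order) * math.log(x / 2.0)
            - math.lgamma(m + 1) - math.lgamma(m + order + 1)
            for m in range(terms)]
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(v - top) for v in logs))

def skellam_log_pmf(y, lam_pos, lam_neg):
    """log P(y) for y = N1 - N2 with N1 ~ Poisson(lam_pos), N2 ~ Poisson(lam_neg)."""
    return (-(lam_pos + lam_neg)
            + 0.5 * y * (math.log(lam_pos) - math.log(lam_neg))
            + log_bessel_i(abs(y), 2.0 * math.sqrt(lam_pos * lam_neg)))

def embed(archetypes, membership):
    """Convex combination of archetype corners; membership lies on the simplex."""
    dim = len(archetypes[0])
    return [sum(m * a[d] for m, a in zip(membership, archetypes))
            for d in range(dim)]

def rate(bias_i, bias_j, pos_i, pos_j):
    # ASSUMPTION (not from the source): proximity-based rate,
    # larger when the two proteins are close in the given space.
    return math.exp(bias_i + bias_j - math.dist(pos_i, pos_j))

def log_likelihood(signs, n, A_pos, A_neg, Z, W, gamma, delta):
    """Sum of Skellam log-probabilities over all unordered protein pairs.
    signs maps (i, j) with i < j to +1 or -1; missing pairs count as 0."""
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            y = signs.get((i, j), 0)
            lam_pos = rate(gamma[i], gamma[j],
                           embed(A_pos, Z[i]), embed(A_pos, Z[j]))
            lam_neg = rate(delta[i], delta[j],
                           embed(A_neg, W[i]), embed(A_neg, W[j]))
            total += skellam_log_pmf(y, lam_pos, lam_neg)
    return total

# Toy example (hypothetical data): 3 proteins, K = 2 archetypes per space.
A_pos = [[0.0, 0.0], [2.0, 0.0]]          # corners of the positive polytope
A_neg = [[0.0, 0.0], [0.0, 2.0]]          # corners of the negative polytope
Z = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # positive-space memberships z_i
W = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]  # negative-space memberships w_i
signs = {(0, 1): +1, (0, 2): -1}          # one activating, one inhibiting edge
zeros = [0.0, 0.0, 0.0]

ll_fit = log_likelihood(signs, 3, A_pos, A_neg, Z, W, zeros, zeros)
ll_swapped = log_likelihood(signs, 3, A_pos, A_neg, W, Z, zeros, zeros)
print(ll_fit, ll_swapped)
```

In this toy setup, proteins 0 and 1 sit close together in the positive space and proteins 0 and 2 sit close together in the negative space, matching the signs of their edges. Swapping the two membership sets places each pair close in the wrong space, which lowers the log-likelihood. This is the intuition behind keeping the two spaces separate.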

Theorems & Definitions (1)

  • Definition 1