SynLeaF: A Dual-Stage Multimodal Fusion Framework for Synthetic Lethality Prediction Across Pan- and Single-Cancer Contexts

Zheming Xing, Siyuan Zhou, Ruinan Wang, Rui Han, Shiming Zhang, Shiqu Chen, Yurui Huang, Jiahao Ma, Yifan Chen, Xuan Wang, Yadong Wang, Junyi Li

Abstract

Accurate prediction of synthetic lethality (SL) is important for guiding the development of cancer drugs and therapies. SL prediction faces significant challenges in the effective fusion of heterogeneous multi-source data. Existing multimodal methods often suffer from "modality laziness" due to disparate convergence speeds, which hinders the exploitation of complementary information. This is also one reason why most existing SL prediction models cannot perform well on both pan-cancer and single-cancer SL pair prediction. In this study, we propose SynLeaF, a dual-stage multimodal fusion framework for SL prediction across pan- and single-cancer contexts. The framework employs a VAE-based cross-encoder with a product of experts mechanism to fuse four omics data types (gene expression, mutation, methylation, and CNV), while simultaneously utilizing a relational graph convolutional network to capture structured gene representations from biomedical knowledge graphs. To mitigate modality laziness, SynLeaF introduces a dual-stage training mechanism employing feature-level knowledge distillation with adaptive unimodal teacher and ensemble strategies. In extensive experiments across eight specific cancer types and a pan-cancer dataset, SynLeaF achieves superior performance in 17 out of 19 scenarios. Ablation studies and gradient analyses further validate the critical contributions of the proposed fusion and distillation mechanisms to model robustness and generalization. To facilitate community use, a web server is available at https://synleaf.bioinformatics-lilab.cn.
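
As a rough illustration of the product of experts fusion mentioned in the abstract, the sketch below combines several diagonal-Gaussian posteriors (one per omics type) into a single Gaussian by summing precisions. The abstract does not spell out the exact formulation used in SynLeaF, so the Gaussian form, the optional standard normal prior expert, and the name `product_of_experts` are assumptions made for this example.

```python
import math

def product_of_experts(means, logvars, include_prior=True):
    """Fuse diagonal-Gaussian experts into one Gaussian (illustrative sketch).

    means, logvars: one list per expert, all of the same length.
    Returns (fused_mean, fused_logvar) as lists.
    Assumption: each omics encoder outputs a mean and log-variance,
    as in a standard VAE; this is not taken from the paper's code.
    """
    dim = len(means[0])
    fused_mean, fused_logvar = [], []
    for j in range(dim):
        # An N(0, 1) prior expert contributes precision 1 and mean 0.
        precision_sum = 1.0 if include_prior else 0.0
        weighted_mean_sum = 0.0
        for mu, lv in zip(means, logvars):
            precision = math.exp(-lv[j])  # precision = 1 / variance
            precision_sum += precision
            weighted_mean_sum += precision * mu[j]
        # Fused mean is the precision-weighted average of expert means.
        fused_mean.append(weighted_mean_sum / precision_sum)
        # Fused variance is the inverse of the summed precisions.
        fused_logvar.append(-math.log(precision_sum))
    return fused_mean, fused_logvar

# Hypothetical usage: two experts with unit variance, no prior expert.
mean, logvar = product_of_experts([[1.0], [3.0]], [[0.0], [0.0]],
                                  include_prior=False)
```

Because precisions add, a confident expert (small variance) pulls the fused mean toward its own estimate, while an uncertain expert contributes little.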

Paper Structure

This paper contains 24 sections, 12 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Overview of the SynLeaF Framework. SynLeaF constitutes a dual-stage multimodal integration architecture designed for synthetic lethality prediction, taking cancer-specific omics profiles and biomedical knowledge graphs as inputs. The Omics Encoder uses a cross-encoder architecture based on a Variational Autoencoder (VAE) and performs early fusion on four types of omics data: copy number variation (cnv), gene expression (exp), DNA methylation (myl), and mutation (mut), through a Product of Experts (PoE) mechanism. The Knowledge Graph Encoder utilizes a Relational Graph Convolutional Network (RGCN) to extract structural features associated with genes within the biological network. With these encoders, SynLeaF first independently pre-trains two unimodal models and then constructs two base estimators respectively under two complementary fusion strategies in further training. For both the UMT and UME strategies, the parameters of the pre-trained unimodal encoders are strictly frozen. Specifically, under the UMT strategy, SynLeaF treats the pre-trained unimodal encoders as teachers, guiding the training of a multimodal student model through feature-level knowledge distillation. Under the UME strategy, SynLeaF directly integrates the prediction results ($p_o$ and $p_k$) from the two pre-trained unimodal models. Here, $p_o$ and $p_k$ denote the predicted probabilities from the Omics and Knowledge Graph models, respectively. Finally, SynLeaF adaptively selects the optimal strategy between UMT and UME according to observed validation efficacy.
  • Figure 2: Radar chart comparing the performance of SynLeaF and unimodal baseline variants. This chart shows the AUC performance comparison of SynLeaF against the two unimodal baseline variants, Only Omics and Only KG, on the pan-cancer and eight cancer-specific datasets, under the two data splitting strategies of CV1 and CV2. The performance curve of SynLeaF forms an "envelope effect" over the unimodal models on almost all datasets, demonstrating the consistent advantage of multimodal fusion.
  • Figure 3: Performance comparison of SynLeaF baseline variants on two cancer datasets. This figure shows the AUC scores for the unimodal baseline variants (Only Omics, Only KG) and the baseline variants respectively employing two multimodal fusion strategies (UMT, UME) on the CESC and COAD cancer datasets under the CV1 split. The height of the bars corresponds to the mean AUC obtained via 5-fold cross-validation, while the standard deviations are denoted by the error bars. The star indicates the optimal fusion strategy, which is finally adopted in the SynLeaF adaptive selection mechanism on that dataset.
  • Figure 4: Sensitivity analysis of the $\lambda_{\text{distill}}$ parameter in the UMT module. This figure shows how the performance (AUC, AUPR, F1-Score) of the UMT fusion strategy changes with the knowledge distillation weight $\lambda_{\text{distill}}$ on all single-cancer and pan-cancer datasets. A value of $\lambda_{\text{distill}}=0$ corresponds to Naïve multimodal training without distillation regularization.
  • Figure 5: Gradient dynamics analysis on the CESC (CV1, Fold 2) dataset. (a) Overall test performance comparison: test AUC curves for UMT (solid line) and the Naïve no-distillation baseline (dashed line). The red dots mark the checkpoints selected based on the validation set. (b) Improvements in unimodal scenarios: change in test AUC for the Omics and KG modalities under the UMT and Naïve no-distillation strategies (denoted as Omics/KG ($\lambda_{\text{distill}}=50$) and Omics/KG ($\lambda_{\text{distill}}=0$), respectively). The vertical line marks the UMT checkpoint position. (c) Gradient norm comparison: change in the gradient norms of the two modality encoders over training epochs under different distillation weights $\lambda_{\text{distill}}$. (d) Gradient norm ratio: trend of the gradient balance between the two modalities. Here, the gradient norm ratio is defined as the $L_2$ norm of the gradients with respect to the Omics encoder's parameters divided by that of the KG encoder's parameters.
  • ...and 1 more figure
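
The figure captions above describe four mechanisms that can be sketched in a few lines: the UMT objective with a distillation weight $\lambda_{\text{distill}}$ (Figures 1 and 4), the UME combination of $p_o$ and $p_k$ (Figure 1), the adaptive choice between UMT and UME on validation performance (Figures 1 and 3), and the gradient norm ratio (Figure 5d). The following is a minimal sketch only. The binary cross-entropy task loss, the mean squared error feature distance, the equal-weight averaging in `ume_predict`, and all function names are assumptions for illustration; the captions do not specify these details.

```python
import math

def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def bce(p, y, eps=1e-12):
    """Binary cross-entropy for one predicted probability p and label y."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def umt_loss(p_student, label, student_feats, teacher_feats, lambda_distill):
    """Task loss plus weighted feature-level distillation (assumed form).

    student_feats / teacher_feats: dicts mapping a modality name
    (e.g. "omics", "kg") to a feature vector. Teacher features come from
    the frozen pre-trained unimodal encoders.
    lambda_distill = 0 reduces to naive multimodal training.
    """
    distill = sum(mse(student_feats[m], teacher_feats[m]) for m in teacher_feats)
    return bce(p_student, label) + lambda_distill * distill

def ume_predict(p_o, p_k, w_o=0.5):
    """Combine Omics and KG probabilities (equal weights are an assumption)."""
    return w_o * p_o + (1.0 - w_o) * p_k

def select_strategy(val_scores):
    """Pick the strategy with the best validation score, e.g. AUC."""
    return max(val_scores, key=val_scores.get)

def grad_norm_ratio(omics_grads, kg_grads):
    """L2 norm of Omics-encoder gradients divided by that of KG-encoder gradients."""
    def l2(grads):
        return math.sqrt(sum(g * g for g in grads))
    return l2(omics_grads) / l2(kg_grads)

# Hypothetical usage with made-up numbers.
loss = umt_loss(0.7, 1,
                {"omics": [0.1, 0.2], "kg": [0.3, 0.4]},
                {"omics": [0.1, 0.0], "kg": [0.3, 0.4]},
                lambda_distill=50)
chosen = select_strategy({"UMT": 0.91, "UME": 0.93})
```

A gradient norm ratio far from 1 would indicate that one encoder receives much larger updates than the other, which is the imbalance that panel (d) of Figure 5 tracks over training.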