Community Detection in Multimodal Data: A Similarity Network Perspective
Aidan Marnane, T. Ian Simpson
TL;DR
The paper addresses how to construct reliable multi-modal similarity networks for community detection in biomedical data by introducing a synthetic data framework that systematically varies inter-modality consistency and data distributions. It evaluates five integration methods (Concatenated Features, Mean Similarity, Extreme Mean, SNF, and NEMO) across controlled scenarios and partial data, using network metrics and clustering performance to reveal each method's strengths and weaknesses. A key finding is that SNF and NEMO do not universally outperform simpler approaches such as Mean Similarity (Mean $S_i$) or Concatenated Features (Concatenated $X_i$), especially in merged or near-merged cluster settings, while NEMO is notably robust to partial modalities. The work provides practical guidance for method selection in multi-modal clustering and lays groundwork for extending similarity integration to more realistic, heterogeneous biomedical datasets with incomplete data.
Abstract
Similarity network construction is a fundamental step in many approaches to community detection in biomedical analysis. It is utilised both in the creation of network structures from non-relational data and as a processing step in clustering pipelines. Any network analysis approach depends on the quality of the underlying network. With the rising popularity of network learning and network-based clustering, constructing the network correctly is vital. Yet the underlying mechanisms of similarity network construction, particularly the implications of the choice of approach for multi-modal integration, remain poorly explored. By introducing differences in embedded cluster information and noise levels across modalities, we assess the performance of popular similarity integration techniques such as Similarity Network Fusion (SNF) and NEighborhood based Multi-Omics clustering (NEMO). Notably, SNF and NEMO fail to outperform simpler techniques such as mean similarity aggregation when incorporating modalities with inconsistently embedded clusters. We also demonstrate how integration methods can be used to incorporate partial modalities, that is, datasets where not all individuals have a full set of measurements in all modalities. SNF shows significant sensitivity to incomplete modalities, while NEMO and mean aggregation are more resilient.
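
To make the idea of mean similarity aggregation with partial modalities concrete, here is a minimal Python sketch. It is not the authors' implementation: the Gaussian kernel, the `sigma` parameter, the encoding of missing individuals as NaN rows, and the function names `rbf_similarity` and `mean_similarity` are all assumptions made for illustration. Each pairwise similarity is averaged only over the modalities in which both individuals were measured.

```python
import numpy as np

def rbf_similarity(X, sigma=1.0):
    """Gaussian similarity between rows of X (assumed kernel choice).

    Rows that contain NaN (unmeasured individuals) produce NaN entries.
    """
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def mean_similarity(similarities):
    """Element-wise mean of similarity matrices, skipping unobserved pairs."""
    stack = np.stack(similarities)               # shape (modalities, n, n)
    observed = ~np.isnan(stack)
    counts = observed.sum(axis=0)                # modalities observing each pair
    total = np.where(observed, stack, 0.0).sum(axis=0)
    # Pairs observed in no modality are left as NaN.
    return np.divide(total, counts,
                     out=np.full(total.shape, np.nan),
                     where=counts > 0)

# Hypothetical toy data: 4 individuals, 2 modalities.
rng = np.random.default_rng(0)
X1 = rng.normal(size=(4, 3))
X2 = rng.normal(size=(4, 2))
X2[3] = np.nan                                   # individual 3 lacks modality 2

S1 = rbf_similarity(X1)
S2 = rbf_similarity(X2)
S = mean_similarity([S1, S2])
```

In this toy case, every similarity involving individual 3 falls back to the first modality alone, while all other pairs are averaged over both modalities. Whether this fallback matches the behaviour evaluated in the paper is an assumption of the sketch.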
