Advanced Unsupervised Learning: A Comprehensive Overview of Multi-View Clustering Techniques

Abdelmalik Moujahid, Fadi Dornaika

TL;DR

This survey systematically catalogs multi-view clustering methods, organizing them into taxonomy-driven categories (co-training, co-regularization, kernel-based, subspace, deep learning, graph-based, and anchor-based) and addressing challenges such as heterogeneity and incomplete data. It covers classical and graph-based MVC, methods that handle missing views, and a formal review of representative approaches with unified objective frameworks, including MVCSK, CI-GMVC, CNESE, HMvC, MSGL, FPMVS-CAG, OMSC, AMVSCGL, SMSC, and MCGLSR. The paper also provides dataset explorations and practical use cases in healthcare, multimedia, and social networks, concluding with future directions toward deep learning integration, adaptive weighting, scalability, and interpretability. Overall, it serves as a comprehensive, practitioner-friendly roadmap for understanding, applying, and advancing multi-view clustering across diverse domains.

Abstract

Machine learning techniques face numerous challenges in achieving optimal performance. These include computational constraints, the limitations of single-view learning algorithms, and the complexity of processing large datasets drawn from different domains, sources, or views. In this context, multi-view clustering (MVC), a class of unsupervised multi-view learning, emerges as a powerful approach to overcoming these challenges. MVC compensates for the shortcomings of single-view methods, providing a richer data representation and effective solutions for a variety of unsupervised learning tasks. Compared with traditional single-view approaches, the semantically rich nature of multi-view data increases its practical utility despite its inherent complexity. This survey makes a threefold contribution: (1) a systematic categorization of multi-view clustering methods into well-defined groups, including co-training, co-regularization, subspace, deep learning, kernel-based, anchor-based, and graph-based strategies; (2) an in-depth analysis of their respective strengths, weaknesses, and practical challenges, such as scalability and incomplete data; and (3) a forward-looking discussion of emerging trends, interdisciplinary applications, and future directions in MVC research. The study reviews over 140 foundational and recent publications, develops comparative insights on integration strategies such as early fusion, late fusion, and joint learning, and investigates practical use cases in healthcare, multimedia, and social network analysis. By integrating these efforts, this work aims to fill existing gaps in MVC research and provide actionable insights for the advancement of the field.

Paper Structure

This paper contains 28 sections, 11 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: A graphic illustration of the co-training scheme with several views. The matrices $\{\mathbf{Z}^1,\ldots,\mathbf{Z}^m\}$ are the data matrices of the individual views, and $\{\mathbf{L}^1,\ldots,\mathbf{L}^m\}$ are the co-trained models. In this scheme, the information obtained from the individual views is iteratively refined and exchanged, which promotes joint learning across the views.
  • Figure 2: The flowchart of Multi-view Spectral Clustering (Chen2022). First, similarity matrices of the individual views ($\mathbf{Z}$) are created using the K-nearest neighbor algorithm. Then, all views are combined into a unified matrix $\mathbf{P}$ using a weighted fusion operation. This unified matrix $\mathbf{P}$ is used to update the similarity matrix of each view, so the algorithm iteratively refines the similarity matrices obtained in the previous step. It integrates the spectral clustering algorithm and the symmetric non-negative matrix factorization algorithm to generate the non-negative embedding matrix $\mathbf{M}$, which leads directly to the clustering results.
  • Figure 3: The flowchart of kernel-based multi-view spectral clustering. For multi-view data, $(\mathbf{Z}^1,\mathbf{Z}^2,\ldots,\mathbf{Z}^m)$ denotes the data matrices of the $m$ views. An individual kernel is constructed for each view, giving $(\mathbf{K}^1,\mathbf{K}^2,\ldots,\mathbf{K}^m)$, and these are combined into a unified kernel $\mathbf{K}$. A specific clustering method is then selected to cluster the data based on $\mathbf{K}$.
  • Figure 4: The flowchart of graph-based multi-view clustering, adapted from Wang-Hao2020. In this scheme, the data matrix of each view is converted into a graph matrix, and a fusion method is then applied across all views to create a unified graph.
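
The graph-based scheme in Figure 4, and the first two steps of Figure 2, can be illustrated with a minimal sketch: build one K-nearest-neighbor graph per view, then fuse the per-view graphs into a unified graph by a weighted sum. This is only an illustration under stated assumptions, not the method of any surveyed paper. The function names `knn_graph` and `fuse_graphs`, the binary edge weights, the uniform default view weights, and the toy data are all hypothetical choices made for this example.

```python
import math


def knn_graph(view, k):
    """Build a symmetric 0/1 k-nearest-neighbor graph for one view.

    `view` is a list of feature vectors, one per sample (assumed layout).
    """
    n = len(view)
    graph = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Rank all other samples by Euclidean distance to sample i.
        neighbors = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(view[i], view[j]),
        )
        for j in neighbors[:k]:
            graph[i][j] = 1.0
            graph[j][i] = 1.0  # symmetrize the graph
    return graph


def fuse_graphs(graphs, weights=None):
    """Fuse per-view graphs into one graph by a weighted sum.

    Uniform weights are an assumption of this sketch; the surveyed
    methods typically learn or adapt the view weights.
    """
    m = len(graphs)
    if weights is None:
        weights = [1.0 / m] * m
    n = len(graphs[0])
    return [
        [sum(w * g[i][j] for w, g in zip(weights, graphs)) for j in range(n)]
        for i in range(n)
    ]


# Toy data: four samples observed in two views (hypothetical values).
view_1 = [[0.0], [0.1], [5.0], [5.1]]
view_2 = [[1.0], [1.2], [1.3], [9.0]]

graphs = [knn_graph(view_1, k=1), knn_graph(view_2, k=1)]
fused = fuse_graphs(graphs)
```

In the fused graph, an edge supported by both views keeps full weight, while an edge present in only one view is down-weighted. A clustering step (for example, spectral clustering on `fused`) would follow in a full pipeline and is omitted here.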