How to characterize imprecision in multi-view clustering?

Jinyi Xu; Zuowei Zhang; Ze Lin; Yixiang Chen; Zhe Liu; Weiping Ding

TL;DR

This work tackles imprecision in multi-view clustering by extending credal partitions to the multi-view setting, so that meta-clusters can capture the uncertainty of objects in overlapping regions. It develops MvLRECM, which jointly optimizes masses across views under entropy-weighted fusion and low-rank regularization; the objective $J_{MvLRECM}$ includes a nuclear-norm term $\|\boldsymbol{Z}_i\|_*$ and a low-rank surrogate $\boldsymbol{M}_i \approx \boldsymbol{Z}_i$. Optimization proceeds via alternating updates: the centers $\boldsymbol{V}$ are obtained from the linear system $\boldsymbol{H}^q \boldsymbol{V}^q = \boldsymbol{B}^q$, the weights $\boldsymbol{w}$ from a Lagrangian, the masses $\boldsymbol{M}$ from closed-form expressions, and the low-rank proxy $\boldsymbol{Z}$ through nuclear-norm minimization. Empirical results on toy data and six real-world UCI datasets show that MvLRECM improves clustering quality (ACC, Purity, F-score, RI) and reduces imprecision (IR) relative to state-of-the-art baselines, and they illustrate both the benefits and the limitations of credal, uncertainty-aware multi-view clustering.
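The nuclear-norm step can be illustrated with singular value thresholding (SVT), the standard proximal operator of the nuclear norm. The sketch below assumes the $\boldsymbol{Z}$ subproblem has the form $\min_{\boldsymbol{Z}} \tau \|\boldsymbol{Z}\|_* + \tfrac{1}{2}\|\boldsymbol{Z} - \boldsymbol{M}\|_F^2$; the exact subproblem used by MvLRECM is not given in this summary, and the function name `svt`, the threshold `tau`, and the toy matrix are illustrative assumptions.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: argmin_Z tau*||Z||_* + 0.5*||Z - M||_F^2."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)  # soft-threshold the singular values
    return (U * s_shrunk) @ Vt           # rebuild the matrix from the shrunk spectrum

# Hypothetical mass matrix: 6 objects x 4 focal sets, close to rank 1 plus small noise.
rng = np.random.default_rng(0)
M = np.outer(rng.random(6), rng.random(4)) + 0.01 * rng.standard_normal((6, 4))

tau = 0.1  # assumed threshold, not a value from the paper
Z = svt(M, tau)

print("rank of M:", np.linalg.matrix_rank(M))
print("rank of Z:", np.linalg.matrix_rank(Z))
```

Shrinking the singular values removes the small, noise-driven components, so `Z` is a lower-rank approximation of `M`; a larger `tau` gives a lower rank at the cost of a looser fit.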

Abstract

Clustering multi-view data remains challenging because existing methods can only assign an object to a specific (singleton) cluster when combining information from different views. As a result, they fail to characterize the imprecision of objects in the overlapping regions of different clusters, which leads to a high risk of errors. In this paper, we therefore address the question: how to characterize imprecision in multi-view clustering? To this end, we propose a multi-view low-rank evidential c-means based on entropy constraint (MvLRECM). MvLRECM can be regarded as a multi-view version of evidential c-means, which is based on the theory of belief functions. In MvLRECM, each object is allowed to belong to different clusters with various degrees of support (masses of belief), which characterizes uncertainty in decision-making. Moreover, if an object lies in the overlapping region of several singleton clusters, it can be assigned to a meta-cluster, defined as the union of these singleton clusters, to characterize the local imprecision in the result. In addition, entropy-weighting and low-rank constraints are employed to reduce imprecision and improve accuracy. Comparisons with state-of-the-art methods on several toy and real UCI datasets demonstrate the effectiveness of MvLRECM.
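To make the notions of masses and meta-clusters concrete, here is a minimal sketch of a credal partition over three clusters. The cluster names, the object names, and all mass values are invented for illustration and do not come from the paper; the assignment rule (pick the focal set with the largest mass) is one simple choice, assumed here.

```python
from itertools import combinations

clusters = ("w1", "w2", "w3")

def power_set(items):
    """All subsets of the frame, from the empty set up to the full set."""
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

focal_sets = power_set(clusters)  # 2**3 = 8 candidate focal sets

# Hypothetical masses of belief; the masses of each object sum to 1.
masses = {
    "x_core": {frozenset({"w1"}): 0.9, frozenset({"w1", "w2"}): 0.1},
    "x_overlap": {frozenset({"w1"}): 0.2, frozenset({"w2"}): 0.2,
                  frozenset({"w1", "w2"}): 0.6},
}

def assign(m):
    """Assign an object to the focal set (singleton or meta-cluster) with the largest mass."""
    return max(m, key=m.get)

def plausibility(m, cluster):
    """Total mass of every focal set that contains the given singleton cluster."""
    return sum(v for s, v in m.items() if cluster in s)

for name, m in masses.items():
    print(name, "->", sorted(assign(m)))
```

Here `x_core` is committed to the singleton cluster `w1`, while `x_overlap` is assigned to the meta-cluster `{w1, w2}`: the result states that the object belongs to one of the two clusters without forcing a possibly wrong singleton decision.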

Paper Structure (17 sections, 47 equations, 4 figures, 3 tables, 1 algorithm)

Introduction
Background
Basics of belief functions
Low-rank technique
MvLRECM
Model
Optimization
Update V
Update w
Update M
Update Z
Experiments
Metrics Study
Running example on toy dataset
Experiments on real-world dataset
...and 2 more sections

Figures (4)

Figure 1: Illustration of imprecision in clustering. The red objects in the overlapping regions are indistinguishable.
Figure 2: Results on the 3DBall dataset: (a) the original 3D distribution, (b) the original xy-view, (c) the mass distribution of MvLRECM ($\alpha=2$), (d),(g) MvLRECM ($\alpha=1$), (e),(h) MvLRECM ($\alpha=2$), (f),(i) MvLRECM ($\alpha=3$).
Figure 3: Average clustering performance on the real-world datasets.
Figure 4: How the parameters $\theta$ and $\eta$ affect the performance of MvLRECM on the Iris dataset: (a) ACC, (b) NMI, (c) Purity, (d) F-score, (e) Precision, (f) Recall, (g) RI, (h) IR, where the $x$- and $y$-axes are $\theta$ and $\eta$, respectively.
