Decoupled Contrastive Multi-View Clustering with High-Order Random Walks

Yiding Lu, Yijie Lin, Mouxing Yang, Dezhong Peng, Peng Hu, Xi Peng

TL;DR

DIVIDE tackles false negatives (FNs) and false positives (FPs) in robust contrastive multi-view clustering (MvC) by combining a global, high-order affinity mechanism with a decoupled cross-view learning architecture. It uses multi-step random walks on an affinity graph to identify in-neighborhood negatives and out-of-neighborhood positives, rectifying the contrastive targets with $\mathbf{T}^{(v)} = \alpha \mathbf{I}_n + (1-\alpha)(\mathbf{M}^{(v)})^{t}$. The decoupled framework preserves view-specific information with view-specific Siamese encoders and cross-view decoders while performing intra- and inter-view learning in separate embedding spaces. Extensive experiments on four benchmarks in complete and incomplete MvC settings demonstrate state-of-the-art performance and robustness to missing views, validating the effectiveness of FN rectification and decoupled learning for real-world multi-view clustering.
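
As a minimal sketch of the target rectification above, the following NumPy snippet computes $\mathbf{T} = \alpha \mathbf{I}_n + (1-\alpha)\mathbf{M}^{t}$ for a single view. It assumes $\mathbf{M}$ is already a row-stochastic transition matrix; the function name `rectified_target`, the toy matrix, and the values of `alpha` and `t` are illustrative choices, not settings taken from the paper.

```python
import numpy as np

def rectified_target(M, alpha=0.5, t=2):
    """Return T = alpha * I + (1 - alpha) * M^t for a row-stochastic M.

    `alpha` and `t` are hypothetical defaults chosen for illustration.
    """
    n = M.shape[0]
    # t-step transition probabilities of the random walk
    M_t = np.linalg.matrix_power(M, t)
    # Mix the identity (the anchor's own positive) with the t-step walk
    return alpha * np.eye(n) + (1.0 - alpha) * M_t

# Toy 3-sample transition matrix; every row sums to 1.
M = np.array([[0.6, 0.3, 0.1],
              [0.3, 0.6, 0.1],
              [0.1, 0.1, 0.8]])
T = rectified_target(M, alpha=0.5, t=2)
```

Every power of a row-stochastic matrix is again row-stochastic, so each row of `T` still sums to 1 and can serve as a soft target distribution over the batch.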

Abstract

Recently, several robust contrastive multi-view clustering (MvC) methods have been proposed that construct data pairs from neighborhoods to alleviate the false negative issue, i.e., some intra-cluster samples are wrongly treated as negative pairs. Although these methods achieve promising performance, the false negative issue is still far from solved, and a false positive issue emerges because all in-neighborhood and out-of-neighborhood samples are simply treated as positive and negative, respectively. To address both issues, we propose a novel robust method, dubbed decoupled contrastive multi-view clustering with high-order random walks (DIVIDE). In brief, DIVIDE leverages random walks to progressively identify data pairs in a global rather than local manner. As a result, DIVIDE can identify in-neighborhood negatives and out-of-neighborhood positives. Moreover, DIVIDE adopts a novel MvC architecture that performs inter- and intra-view contrastive learning in different embedding spaces, thus boosting clustering performance and providing robustness against missing views. To verify the efficacy of DIVIDE, we carry out extensive experiments on four benchmark datasets, comparing against nine state-of-the-art MvC methods in both complete and incomplete MvC settings.

Paper Structure (28 sections, 9 equations, 11 figures, 7 tables)

Figures (11)

  • Figure 1: Our observation and basic idea. As illustrated, existing robust contrastive MvC methods construct data pairs using $\epsilon$- or $k$-nearest-neighbor approaches, which treat all in-neighborhood samples as positive and all out-of-neighborhood samples as negative, resulting in false positives (FPs) and false negatives (FNs). In contrast, our method identifies data pairs in a global rather than local manner by performing multi-step random walks on an affinity graph. Thanks to this global view, our method, on the one hand, treats some in-neighborhood samples as negative if they have a lower affinity score as high-order neighbors, thus avoiding the construction of FPs to some extent. On the other hand, it treats some out-of-neighborhood samples as positive, thus partly avoiding the construction of FNs.
  • Figure 2: Overview of DIVIDE. For clarity, we use $V=2$ as an example. DIVIDE consists of two modules: a decoupled contrastive learning framework (left) and random-walk-based FN correction (right). Left: the data of each view are first passed through the view-specific Siamese encoders to obtain the dual embeddings $\mathbf{z}^{(v)}_{q}$ and $\mathbf{z}^{(v)}_{k}$. The cross-view decoder then projects $\mathbf{z}^{(v)}_{q}$ into a new subspace. Finally, the decoupled contrastive losses are applied in the two subspaces to learn cross-view consistency and preserve view-specific information. Right: the affinity graph is constructed from each data batch, and the corresponding adjacency matrix is normalized to give the initial transition matrix. DIVIDE then performs a random walk for each anchor, choosing a neighbor to move to according to the affinity. By running $t$-step random walks for all anchors, DIVIDE progressively identifies the corresponding high-order neighbors (i.e., false negatives) using a probability matrix formulated as the $t$-step transition matrix $\mathbf{M}^{t}$. Finally, DIVIDE uses $\mathbf{M}^{t}$ to rectify the targets of the decoupled contrastive losses so that the false negative issue can be addressed.
  • Figure 3: Clustering performance under different missing rates on Scene-15. The colored regions denote the standard variance over five random runs.
  • Figure 4: Scene-15
  • Figure 5: Caltech-101
  • ...and 6 more figures
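
The random-walk step described in the Figure 2 caption can be sketched as follows. The caption only states that the adjacency matrix of a batch-level affinity graph is normalized into a transition matrix; the Gaussian kernel, the bandwidth `sigma`, the kept self-loops, and the helper name `transition_matrix` are all assumptions made for this illustration, as is the toy one-dimensional batch.

```python
import numpy as np

def transition_matrix(Z, sigma=0.5):
    """Row-normalize an affinity graph built from batch embeddings Z.

    The Gaussian kernel and `sigma` are assumed here; the source only
    says that the adjacency matrix is normalized.
    """
    # Pairwise squared Euclidean distances, shape (n, n)
    sq_dist = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    # Dense affinity graph (self-loops kept on the diagonal)
    A = np.exp(-sq_dist / (2.0 * sigma ** 2))
    # Each row becomes a probability distribution over next hops
    return A / A.sum(axis=1, keepdims=True)

# Toy batch: samples 0, 1, 2 form a chain; sample 3 lies far away.
Z = np.array([[0.0], [1.0], [2.0], [10.0]])
M = transition_matrix(Z)             # one-step transition matrix
M_t = np.linalg.matrix_power(M, 3)   # three-step transition matrix

# Sample 2 is nearly unreachable from sample 0 in one step, but becomes
# reachable through sample 1 after several steps.
one_step, multi_step = M[0, 2], M_t[0, 2]
```

In this toy batch, `multi_step` is larger than `one_step` because probability mass flows from sample 0 to sample 2 through sample 1, while the distant sample 3 stays essentially unreachable. This mirrors how a multi-step walk can surface out-of-neighborhood positives that a purely local neighborhood rule would label as negatives.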

Theorems & Definitions (1)

  • Definition 1: Transition Matrix