Learnable Similarity and Dissimilarity Guided Symmetric Non-Negative Matrix Factorization

Wenlong Lyu, Yuheng Jia

TL;DR

This paper tackles the sensitivity of SymNMF to the choice of $k$ in $k$-NN graph construction by learning a weighted, low-dimensional combination of $k$-NN slices to form a data-driven similarity $S(w)$ and introducing a dual dissimilarity $D(p)$ to enhance discriminability. It also introduces a novel orthogonality regularization $\mathcal{R}(V)$ with column-wise updates and convergence guarantees under the PHALS framework, enabling a provably convergent alternating optimization. The approach achieves superior clustering performance across eight datasets compared to both fixed and adaptive similarity methods, and provides clear insights into the learned coefficients $w$ and $p$, which adaptively select reliable neighbors and farthest relations. The work offers practical significance by reducing the search space to $n-1$ dimensions, improving robustness to misleading neighbor relations, and delivering an efficient algorithm with convergence guarantees for symmetric NMF-based clustering tasks.

Abstract

Symmetric nonnegative matrix factorization (SymNMF) is a powerful tool for clustering, which typically uses the $k$-nearest neighbor ($k$-NN) method to construct the similarity matrix. However, $k$-NN may mislead clustering since the neighbors may belong to different clusters, and its reliability generally decreases as $k$ grows. In this paper, we construct the similarity matrix as a weighted $k$-NN graph with learnable weights that reflect the reliability of each $k$-th NN. This approach reduces the search space of similarity matrix learning to $n - 1$ dimensions, as opposed to the $\mathcal{O}(n^2)$ dimensions of existing methods, where $n$ is the number of samples. Moreover, to obtain a discriminative similarity matrix, we introduce a dissimilarity matrix whose structure is dual to that of the similarity matrix, and propose a new form of orthogonality regularization, with discussions of its geometric interpretation and numerical stability. An efficient alternating optimization algorithm is designed to solve the proposed model, with a theoretical guarantee that the variables converge to a stationary point satisfying the KKT conditions. The advantage of the proposed model is demonstrated by comparison with nine state-of-the-art clustering methods on eight datasets. The code is available at \url{https://github.com/lwl-learning/LSDGSymNMF}.
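The idea of a weighted $k$-NN graph can be illustrated with a small sketch. It is not the authors' implementation (see the repository above for that). It assumes each slice $A^{(k)}$ is a binary indicator of "sample $j$ is the $k$-th nearest neighbor of sample $i$" under Euclidean distance, and that the combined matrix is symmetrized by averaging with its transpose. The function names `knn_slices` and `weighted_similarity` are hypothetical, and the weights are supplied by hand here rather than learned.

```python
import numpy as np

def knn_slices(X):
    """Build the n-1 slices; A[k-1][i, j] = 1 iff j is the k-th nearest neighbor of i.

    Assumption: binary slices under Euclidean distance (illustrative only).
    """
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # pairwise squared distances
    np.fill_diagonal(d2, -np.inf)                      # each sample ranks itself first
    order = np.argsort(d2, axis=1)                     # order[i, 0] == i
    A = np.zeros((n - 1, n, n))
    rows = np.arange(n)
    for k in range(1, n):
        A[k - 1, rows, order[:, k]] = 1.0
    return A

def weighted_similarity(A, w):
    """Combine the slices as S(w) = sum_k w_k A^(k), then symmetrize (assumed)."""
    S = np.tensordot(w, A, axes=1)
    return (S + S.T) / 2.0

# Usage: 6 random 2-D points, with weight placed only on the two nearest neighbors.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
A = knn_slices(X)
w = np.array([1.0, 0.5, 0.0, 0.0, 0.0])  # n - 1 = 5 weights, one per slice
S = weighted_similarity(A, w)
```

Only the $n - 1$ entries of `w` are free parameters in this construction, which is the reduction in search space that the abstract describes.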

Paper Structure

This paper contains 22 sections, 6 theorems, 54 equations, 6 figures, 5 tables, 1 algorithm.

Key Result

Proposition 1

Let $V_{-j} = \left[v_1, \cdots, v_{j-1}, v_{j+1}, \cdots, v_{r} \right]$, let $I_n$ denote the identity matrix of size $n$, and let $V^{\dag} \in \mathbb{R}^{r \times n}$ denote the Moore-Penrose pseudo-inverse of $V$. Assuming that $V \in \mathbb{R}^{n \times r}$ and $\mathrm{rank}(V) = r$, then ...

Figures (6)

  • Figure 1: (a) Correct rate of each $k$-th NN slice $A^{(k)}$. (b) Clustering ACC of standard SymNMF (Kuang2015SymNMF) on the ORL dataset with respect to $k$, where the kernel function $\kappa(x_i, x_j)$ is defined by the self-tuning method (ZelnikManor2004SelfTuningSC).
  • Figure 2: The learned $w^{\ast}$ in \ref{eq:toy_model} (left $y$-axis) and the correct rate (right $y$-axis). It can be seen that $w^{\ast}$ is sparse and consistent with the correct rate.
  • Figure 3: The geometric meaning of $\mathcal{R}(v_3)$, which can be seen as the squared distance between $v_3$ and the plane spanned by $\{v_1, v_2 \}$.
  • Figure 4: Correct rate and the learned $w$ and $p$ for each dataset. The $x$-axis represents the $k$-th nearest neighbors on a logarithmic scale. The upper and lower parts of the left $y$-axis represent the coordinates of $w$ and $p$ respectively, and the right $y$-axis is the coordinate of the correct rate. For a better view, $w$ and $p$ are normalized to the range $[0, 1]$.
  • Figure 5: Average ACC of the proposed model for different values of $\alpha$, $\beta$ and $\mu$. All panels are 4-D plots, where the fourth dimension is indicated by color with the corresponding color bar.
  • ...and 1 more figure
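
The geometric quantity in the Figure 3 caption, the squared distance from one column to the span of the others, can be computed directly by orthogonal projection. The sketch below shows only that quantity; the paper's full definition of $\mathcal{R}(V)$ is not reproduced in this summary, so the helper `sq_dist_to_span` is a hypothetical name and should not be read as the authors' regularizer. It projects $v_j$ onto the column space of $V_{-j}$ using the Moore-Penrose pseudo-inverse and measures the residual.

```python
import numpy as np

def sq_dist_to_span(V, j):
    """Squared Euclidean distance from column j of V to the span of the other columns."""
    v = V[:, j]
    V_rest = np.delete(V, j, axis=1)               # V_{-j}: all columns except j
    proj = V_rest @ (np.linalg.pinv(V_rest) @ v)   # orthogonal projection of v onto span(V_{-j})
    residual = v - proj
    return float(residual @ residual)

# Usage: v_1 and v_2 span the xy-plane, and v_3 = (1, 1, 2) has height 2 above it.
V = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 2.0]])
d3 = sq_dist_to_span(V, 2)
```

In this example `d3` equals $2^2 = 4$ up to floating-point rounding. When the columns are mutually orthogonal, the distance reduces to $\|v_j\|^2$, and it drops to zero as $v_j$ approaches the span of the other columns.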

Theorems & Definitions (13)

  • Proposition 1
  • proof
  • Remark 1
  • Lemma 1: Theorem 2, Hou2023APrgressiveHA
  • Remark 2
  • Proposition 2
  • proof
  • Proposition 3
  • proof
  • Theorem 1
  • ...and 3 more