Fast Asymmetric Factorization for Large Scale Multiple Kernel Clustering

Yan Chen, Liang Du, Lei Duan

TL;DR

The paper addresses large-scale multiple kernel clustering by tackling memory and time bottlenecks inherent to dense kernel matrices. It introduces EMKCF, which builds a sparse kernel per base kernel via local regression and learns a shared consensus representation $U$ along with kernel-specific factors $H_r$ using a weighted, orthogonal-constrained factorization. Optimization proceeds via a block-coordinate-descent scheme with an SVD-based update for $H_r$, a closed-form, monotone update for $\boldsymbol{\mu}$, and a multiplicative update for $U$, achieving linear memory usage in $n$ (ignoring sorting). Experiments on seven datasets, including MNISTLarge and EMNIST, show that EMKCF delivers state-of-the-art clustering accuracy and NMI while significantly reducing memory and runtime, demonstrating strong scalability for large-scale MKC.
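The summary mentions an SVD-based update for $H_r$ under an orthogonality constraint but does not state the subproblem itself. A common form of such a step is the orthogonal Procrustes problem: maximize $\operatorname{tr}(H^\top M)$ subject to $H^\top H = I$, whose solution is $H = PQ^\top$ for the thin SVD $M = P\Sigma Q^\top$. The sketch below illustrates only that generic step. The function name `orthogonal_factor_update` and the matrix `M` are hypothetical stand-ins, and whether EMKCF's $H_r$ update reduces to exactly this form is an assumption.

```python
import numpy as np

def orthogonal_factor_update(M):
    """Return H with orthonormal columns that maximizes trace(H.T @ M).

    Generic orthogonal Procrustes step; M is a hypothetical stand-in for
    whatever matrix the paper's H_r subproblem produces.
    """
    # Thin SVD: M = P @ diag(s) @ Qt, with P of shape (n, c) and Qt of shape (c, c).
    P, _, Qt = np.linalg.svd(M, full_matrices=False)
    return P @ Qt

# Toy input: n = 200 samples, c = 5 clusters (illustrative sizes only).
rng = np.random.default_rng(0)
M = rng.standard_normal((200, 5))
H = orthogonal_factor_update(M)
```

The thin SVD of an $n \times c$ matrix costs $O(nc^2)$, so a step of this kind stays linear in $n$ when the number of clusters $c$ is small.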

Abstract

Kernel methods are widely used for nonlinear data clustering, yet their effectiveness depends heavily on choosing suitable kernels and kernel parameters, which are difficult to determine in advance. Multiple Kernel Clustering (MKC) addresses this by fusing information from multiple base kernels. However, both early-fusion and late-fusion methods for large-scale MKC run into memory and time bottlenecks, so both aspects must be optimized simultaneously. To address this issue, we propose Efficient Multiple Kernel Concept Factorization (EMKCF), which constructs a new sparse kernel matrix inspired by local regression to achieve memory efficiency. For time efficiency, EMKCF learns consensus and individual representations by extending orthogonal concept factorization to handle multiple kernels. Experimental results demonstrate the efficiency and effectiveness of EMKCF on benchmark datasets compared to state-of-the-art methods. The proposed method offers a straightforward, scalable, and effective solution for large-scale MKC tasks.
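To make the memory argument concrete, the sketch below shows why a sparse kernel needs storage linear in $n$. It is not the paper's local-regression construction, which the abstract does not spell out. It swaps in a plain top-$k$ sparsification of an RBF kernel. The function name `topk_sparse_rbf` and the parameters `k`, `gamma`, and `block` are hypothetical.

```python
import numpy as np

def topk_sparse_rbf(X, k=10, gamma=1.0, block=256):
    """Keep the k largest RBF-kernel entries per row, computed block by block.

    Returns (idx, val), each of shape (n, k): column indices and kernel values.
    Only a (block x n) slab of the dense kernel exists at any one time, so
    the stored result needs O(n * k) memory instead of O(n^2).
    """
    n = X.shape[0]
    sq = np.sum(X * X, axis=1)  # squared norms, reused for every block
    idx = np.empty((n, k), dtype=np.int64)
    val = np.empty((n, k), dtype=X.dtype)
    for start in range(0, n, block):
        stop = min(start + block, n)
        # Squared Euclidean distances from this block of rows to all points.
        d2 = sq[start:stop, None] + sq[None, :] - 2.0 * X[start:stop] @ X.T
        np.maximum(d2, 0.0, out=d2)  # clamp tiny negatives from round-off
        slab = np.exp(-gamma * d2)
        # Unordered indices of the k largest entries in each row.
        top = np.argpartition(slab, -k, axis=1)[:, -k:]
        idx[start:stop] = top
        val[start:stop] = np.take_along_axis(slab, top, axis=1)
    return idx, val

# Toy data: 1000 points in 8 dimensions (illustrative sizes only).
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 8))
idx, val = topk_sparse_rbf(X, k=10)
```

With $n = 1000$ and $k = 10$ the stored kernel holds 10,000 values rather than the 1,000,000 of the dense matrix, and the gap widens linearly as $n$ grows.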

Paper Structure (16 sections, 11 equations, 1 figure, 3 tables, 1 algorithm)