Robust Data Clustering with Outliers via Transformed Tensor Low-Rank Representation
Tong Wu
TL;DR
This work introduces OR-TLRR, a robust tensor low-rank representation method for clustering 3-way tensor data in the presence of outliers. The method formulates a convex optimization problem with a tensor nuclear norm and an $\ell_{2,1}$-norm penalty and solves it under a transform-based t-SVD framework, so that it simultaneously recovers the clean row space and detects column-sparse outliers; an extension, OR-TLRR-EWZF, handles incomplete data. The authors prove exact recovery guarantees under mild incoherence and unambiguity conditions and demonstrate strong empirical performance on synthetic and real datasets, including scenarios with missing data. The approach highlights the benefits of tensor-centric, transform-aware representations for robust subspace clustering and outlier detection on image and video data, and the code is publicly available.
Abstract
Recently, tensor low-rank representation (TLRR) has become a popular tool for tensor data recovery and clustering, due to its empirical success and theoretical guarantees. However, existing TLRR methods consider Gaussian or gross sparse noise, which inevitably leads to performance degradation when the tensor data are contaminated by outliers or sample-specific corruptions. This paper develops an outlier-robust tensor low-rank representation (OR-TLRR) method that performs outlier detection and tensor data clustering simultaneously, based on the t-SVD framework. For tensor observations with arbitrary outlier corruptions, OR-TLRR has a provable performance guarantee for exactly recovering the row space of the clean data and detecting outliers under mild conditions. Moreover, an extension of OR-TLRR is proposed to handle the case in which parts of the data are missing. Finally, extensive experimental results on synthetic and real data demonstrate the effectiveness of the proposed algorithms. We release our code at https://github.com/twugithub/2024-AISTATS-ORTLRR.
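
To make the column-sparse outlier model concrete, the following is a minimal sketch of the $\ell_{2,1}$ norm of a 3-way tensor and of the standard slice-wise shrinkage operator associated with it. It is an illustration under stated assumptions, not the released implementation: the function names (`lateral_slice_norms`, `l21_norm`, `shrink_lateral_slices`), the nested-list layout `tensor[i][j][k]` with `j` indexing samples, and the toy values are all hypothetical, and the sketch does not include the tensor nuclear norm or the transform-based t-SVD.

```python
import math

def lateral_slice_norms(tensor):
    """Frobenius norm of each lateral slice tensor[:, j, :] (one slice per sample)."""
    n2 = len(tensor[0])
    norms = []
    for j in range(n2):
        sq = sum(v * v for row in tensor for v in row[j])
        norms.append(math.sqrt(sq))
    return norms

def l21_norm(tensor):
    """Sum of the lateral-slice Frobenius norms (the tensor l_{2,1} norm)."""
    return sum(lateral_slice_norms(tensor))

def shrink_lateral_slices(tensor, tau):
    """Proximal operator of tau * l_{2,1}: scale slice j by max(0, 1 - tau / norm_j)."""
    norms = lateral_slice_norms(tensor)
    scales = [max(0.0, 1.0 - tau / n) if n > 0 else 0.0 for n in norms]
    return [[[v * scales[j] for v in row[j]] for j in range(len(row))]
            for row in tensor]

# Toy 2 x 3 x 2 residual tensor (assumed values): only sample j = 1 is large.
E = [
    [[0.1, 0.0], [3.0, 4.0], [0.0, 0.1]],
    [[0.0, 0.1], [0.0, 0.0], [0.1, 0.0]],
]
shrunk = shrink_lateral_slices(E, tau=1.0)
# Samples whose slice survives the shrinkage are flagged as outlier candidates.
flagged = [j for j, n in enumerate(lateral_slice_norms(shrunk)) if n > 0]
print(flagged)
```

Because the shrinkage acts on whole lateral slices rather than on individual entries, small residuals spread across samples are zeroed out while a sample with a large residual is kept intact up to scaling, which is the behavior a column-sparse penalty is meant to encourage.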
