Robust Data Clustering with Outliers via Transformed Tensor Low-Rank Representation

Tong Wu

TL;DR

This work introduces OR-TLRR, a robust tensor low-rank representation method for clustering 3-way tensor data in the presence of outliers. By formulating a convex optimization with a tensor nuclear norm and an $\ell_{2,1}$-norm penalty, and solving it under a transform-based t-SVD framework, the method simultaneously recovers the clean row space and detects column-sparse outliers; it also extends to incomplete data via OR-TLRR-EWZF. The authors prove exact recovery guarantees under mild incoherence and unambiguity conditions and demonstrate strong empirical performance on synthetic and real datasets, including scenarios with missing data. The approach highlights the benefits of tensor-centric, transform-aware representations for robust subspace clustering and outlier detection with practical applicability to image and video data, aided by publicly available code.

Abstract

Recently, tensor low-rank representation (TLRR) has become a popular tool for tensor data recovery and clustering, due to its empirical success and theoretical guarantees. However, existing TLRR methods consider only Gaussian or gross sparse noise, which inevitably leads to performance degradation when the tensor data are contaminated by outliers or sample-specific corruptions. This paper develops an outlier-robust tensor low-rank representation (OR-TLRR) method that performs outlier detection and tensor data clustering simultaneously, based on the t-SVD framework. For tensor observations with arbitrary outlier corruptions, OR-TLRR has a provable performance guarantee for exactly recovering the row space of the clean data and detecting outliers under mild conditions. Moreover, an extension of OR-TLRR is proposed to handle the case when parts of the data are missing. Finally, extensive experimental results on synthetic and real data demonstrate the effectiveness of the proposed algorithms. We release our code at https://github.com/twugithub/2024-AISTATS-ORTLRR.
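To make the role of the $\ell_{2,1}$-norm penalty concrete, the following minimal sketch shows how such a penalty promotes column-sparse (sample-wise) error terms. It is not taken from the released code. It assumes that each sample is a lateral slice `E[:, j, :]` of the error tensor, and the function names `l21_norm`, `prox_l21`, and `detect_outliers` are hypothetical. The proximal step is the standard group soft-thresholding operator, which may differ from the exact update used in the paper's solver.

```python
import numpy as np

def slice_norms(E):
    # Frobenius norm of every lateral slice E[:, j, :] (assumed: one slice per sample).
    return np.sqrt((np.abs(E) ** 2).sum(axis=(0, 2)))

def l21_norm(E):
    # Tensor l_{2,1} norm: sum over samples of the lateral-slice Frobenius norms.
    return float(slice_norms(E).sum())

def prox_l21(Q, tau):
    # Closed-form minimizer of tau * ||E||_{2,1} + 0.5 * ||E - Q||_F^2.
    # Each lateral slice is shrunk toward zero; slices with norm <= tau vanish.
    norms = slice_norms(Q)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return Q * scale[None, :, None]

def detect_outliers(E, tol=1e-8):
    # Samples whose error slice is nonzero are flagged as outliers.
    return np.flatnonzero(slice_norms(E) > tol)

# Toy example: six samples, two of which (indices 2 and 5) carry large errors.
Q = 0.01 * np.ones((4, 6, 3))
Q[:, [2, 5], :] = 5.0
E = prox_l21(Q, tau=1.0)
outliers = detect_outliers(E)
```

In this toy example the small slices are set exactly to zero while the two large slices survive with slightly reduced norm, which is the behavior that lets a column-sparse error term isolate corrupted samples.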

Paper Structure

This paper contains 36 sections, 11 theorems, 85 equations, 1 figure, 12 tables, and 2 algorithms.

Key Result

Theorem 1

If eqn:linearrep holds, then there exist two tensors $\boldsymbol{\mathcal{A}} \in \mathbb{R}^{n_1 \times p \times n_3}$ and $\boldsymbol{\mathcal{Z}} \in \mathbb{C}^{p \times n_2 \times n_3}$ such that …, where $\boldsymbol{\mathcal{A}}$ can be constructed from $\mathbf{A}$ by setting $\boldsymbol{\mathcal{A}}_{(j)} = \mathtt{ivec}(\mathbf{a}_j)$ and $\boldsymbol{\mathcal{Z}}$ can be computed by …

Figures (1)

  • Figure 1: Sample images from the datasets.

Theorems & Definitions (28)

  • Definition 1: t-product [KernfeldKA.LAA2015]
  • Definition 2: t-SVD [SongNZ.NLAA2020]
  • Definition 3: Tensor tubal rank [SongNZ.NLAA2020]
  • Definition 4: Tensor nuclear norm [SongNZ.NLAA2020]
  • Definition 5: Tensor subspace [ZhouLFLY.PAMI2021]
  • Definition 6: Tensor column space [ZhouF.CVPR2017]
  • Theorem 1
  • Lemma 1
  • Lemma 2
  • Theorem 2
  • ...and 18 more
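
The first few definitions listed above (t-product, tubal rank, tensor nuclear norm) can be illustrated with a short sketch. It is not the paper's implementation. It assumes the transform is a unitary matrix applied along the third mode and uses the unitary DFT purely as an example. The helper names are hypothetical, and the nuclear-norm normalization (a plain sum over transformed frontal slices) is an assumption that may differ from the paper's convention.

```python
import numpy as np

def unitary_dft(n3):
    # Unitary DFT matrix; an illustrative transform choice, not necessarily the paper's.
    return np.fft.fft(np.eye(n3), norm="ortho")

def transform(X, L):
    # Apply L along the third mode: Xbar[:, :, k] = sum_t L[k, t] * X[:, :, t].
    return np.einsum("kt,ijt->ijk", L, X)

def inverse_transform(Xbar, L):
    # Undo the transform with the conjugate transpose (valid because L is unitary).
    return np.einsum("kt,ijt->ijk", L.conj().T, Xbar)

def t_product(A, B, L):
    # Transform-based t-product: multiply frontal slices in the transform domain.
    Abar, Bbar = transform(A, L), transform(B, L)
    Cbar = np.einsum("ipk,pjk->ijk", Abar, Bbar)
    return inverse_transform(Cbar, L)

def tubal_rank_and_tnn(X, L, tol=1e-10):
    # Singular values of every transformed frontal slice.
    Xbar = transform(X, L)
    svals = [np.linalg.svd(Xbar[:, :, k], compute_uv=False)
             for k in range(X.shape[2])]
    # Tubal rank: largest slice-wise rank in the transform domain.
    rank = max(int((s > tol).sum()) for s in svals)
    # Tensor nuclear norm (assumed normalization): sum of all slice singular values.
    tnn = float(sum(s.sum() for s in svals))
    return rank, tnn

# Toy example: the t-product of a 5x2x3 and a 2x6x3 tensor has tubal rank at most 2.
rng = np.random.default_rng(0)
L = unitary_dft(3)
A = rng.standard_normal((5, 2, 3))
B = rng.standard_normal((2, 6, 3))
X = t_product(A, B, L)
rank, tnn = tubal_rank_and_tnn(X.real, L)
```

Working slice by slice in the transform domain is what makes these tensor quantities computable with ordinary matrix SVDs, which is the practical appeal of the t-SVD framework.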