
Robust Principal Component Analysis via Discriminant Sample Weight Learning

Yingzhuo Deng, Ke Hu, Bo Li, Yao Zhang

TL;DR

This paper tackles the sensitivity of PCA to outliers by introducing RPCA-DSWL, a framework that jointly learns sample weights, the data mean, and the PCA projection. It differentiates outliers via three category-specific weights corresponding to the principal subspace, its orthogonal complement, and their intersection, which are merged into a single robust weighting $w_i$. The mean is estimated as the weighted center $m = Xw$ and the projection via the top eigenvectors of $XHX^T$ with $H = \mathrm{diag}(w) - ww^T$, enabling an efficient coordinate-descent optimization with overall complexity $O(n d^2)$. Empirical results on toy data, 10 UCI datasets, and face images show that RPCA-DSWL yields more discriminative weights, more accurate means and principal components, and improved reconstruction and classification performance under contamination. This approach advances robust PCA by explicitly modeling and exploiting discriminant outlier structure across multiple subspaces.
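The mean and projection update described above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function names, the column-per-sample layout of $X$, and the normalization of $w$ to sum to one are assumptions made here. Under that normalization, $XHX^T = \sum_i w_i (x_i - m)(x_i - m)^T$, so the $d \times d$ matrix can be formed without building the $n \times n$ matrix $H$.

```python
import numpy as np

def weighted_scatter(X, w):
    """Return (m, S) with m = X w and S = X H X^T, H = diag(w) - w w^T.

    X is (d, n) with samples as columns (assumed layout);
    w is (n,) and is assumed to sum to 1.
    """
    m = X @ w                              # weighted center m = Xw
    # X diag(w) X^T - (Xw)(Xw)^T equals X H X^T without forming H explicitly
    S = (X * w) @ X.T - np.outer(m, m)
    return m, S

def weighted_mean_and_projection(X, w, k):
    """Hypothetical helper: weighted mean and top-k eigenvectors of X H X^T."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                        # assumption: normalize weights
    m, S = weighted_scatter(X, w)
    _, vecs = np.linalg.eigh(S)            # eigenvalues in ascending order
    U = vecs[:, ::-1][:, :k]               # keep the k leading eigenvectors
    return m, U
```

A sample with weight zero drops out of both $m$ and $XHX^T$, which is how small weights limit the influence of outliers in this step.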

Abstract

Principal component analysis (PCA) is a classical feature extraction method, but it can be adversely affected by outliers, which leads to an inaccurate projection matrix. This paper proposes a robust method that estimates both the data mean and the PCA projection matrix by learning discriminant sample weights from data containing outliers. Each sample in the dataset is assigned a weight, and the proposed algorithm alternates between learning the weights and learning the mean and the projection matrix. Specifically, when the mean and the projection matrix are available, a weight for each sample is learned hierarchically, via fine-grained analysis of outliers, so that outliers receive small weights while normal samples receive large weights. With the learned weights available, a weighted optimization problem is solved to estimate both the data mean and the projection matrix. Because the learned weights discriminate outliers from normal samples, the small weights assigned to outliers mitigate their adverse influence. Experiments on toy data, UCI datasets, and face datasets demonstrate the effectiveness of the proposed method in estimating the mean and the projection matrix from data containing outliers.
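The alternating structure described in the abstract can be sketched as a short loop. This is only an illustration of the control flow under stated assumptions: the abstract does not give the hierarchical weight rule, so the weight update below is a generic Cauchy-type reweighting on the reconstruction residual, swapped in as a stand-in and not the paper's rule. The function name, the iteration count, and the column-per-sample layout are likewise hypothetical.

```python
import numpy as np

def alternating_robust_pca(X, k, n_iter=20, eps=1e-12):
    """Alternate between (mean, projection) and sample weights.

    X is (d, n) with samples as columns (assumed layout).
    The weight rule is a stand-in, NOT the paper's hierarchical rule.
    """
    d, n = X.shape
    w = np.full(n, 1.0 / n)                  # start from uniform weights
    for _ in range(n_iter):
        # Step 1: weighted mean and projection for the current weights
        m = X @ w
        Xc = X - m[:, None]
        S = (Xc * w) @ Xc.T                  # weighted scatter matrix
        _, vecs = np.linalg.eigh(S)          # ascending eigenvalues
        U = vecs[:, ::-1][:, :k]
        # Step 2 (stand-in): down-weight samples with large residuals
        resid = Xc - U @ (U.T @ Xc)
        r = np.linalg.norm(resid, axis=0)
        s = np.median(r) + eps               # robust residual scale
        w = 1.0 / (1.0 + (r / s) ** 2)
        w /= w.sum()                         # keep weights summing to 1
    return w, m, U
```

Because the stand-in rule looks only at the residual outside the current subspace, it cannot separate outliers that lie inside the principal subspace; the paper's three weight categories are aimed at exactly that distinction.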

Paper Structure (11 sections, 41 equations, 8 figures, 2 tables, 1 algorithm)


Figures (8)

  • Figure 1: Illustration of three categories of outliers. $\textup{Point1}$, $\textup{Point2}$ and $\textup{Point3}$ represent the first, the second and the third category of outliers, respectively. PCS represents Principal Component Subspace. OCS represents Orthogonal Complement Subspace, which is orthogonal to PCS.
  • Figure 2: The principal components extracted by six different algorithms from artificial toy datasets containing different types of outliers. $\color{gray}\circ$ represents a normal sample. $\color{gray}\times$ represents an outlier. $\color{black}\star$ represents the mean given by the RPCA-OM algorithm. $\color{red}\star$ represents the mean given by the RPCA-DSWL algorithm.
  • Figure 3: From left to right, sample weight distributions produced by $L_{2,p}$-PCA, RPCA-OM, RPCA-DI, and RPCA-DSWL, respectively. For visualization purposes, the vertical axis is on a logarithmic scale.
  • Figure 4: Visual representation of sample weights produced by $L_{2,p}$-PCA, RPCA-OM, RPCA-DI, RPCA-DSWL (from left to right).
  • Figure 5: Sampled images from different datasets. From top to bottom, the datasets sampled are Yale, Extended Yale B, ORL, and Umist, respectively. Images in the red box are corrupted with random black and white blocks and serve as manually created outlier images.
  • ...and 3 more figures