Multi-Subspace Matrix Recovery from Permuted Data
Liangqi Xie, Jicong Fan
TL;DR
This work addresses the problem of recovering a multi-subspace data matrix in which some columns have had their entries permuted. It introduces PMSDR, a four-stage pipeline that combines outlier detection, subspace reconstruction, outlier classification, and matrix recovery. It provides theoretical guarantees for the outlier classification step and demonstrates strong empirical performance on synthetic data, face images, motion sequences, and data re-identification scenarios, outperforming state-of-the-art single-subspace approaches and augmented baselines. By handling multi-subspace structure and partial permutations, PMSDR broadens applicability to data cleaning, data integration, and privacy-related data restoration in high-dimensional settings. The framework can incorporate alternative subspace methods and recovery strategies, and it could potentially be extended to non-linear or fully shuffled regimes or combined with preprocessing steps such as MRUC.
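The data model described here can be simulated in a few lines. The following Python (NumPy) sketch is illustrative only and is not the PMSDR implementation. The helper names `make_permuted_union_of_subspaces` and `subspace_residuals`, the sizes, and the choice to corrupt a column by cyclically shifting a random subset of its entries are all hypothetical. The sketch also uses the ground-truth subspace bases as an oracle, which a real pipeline does not have. Its only purpose is to show why partially permuted columns are detectable as outliers: they generally no longer lie in their subspace.

```python
import numpy as np

def make_permuted_union_of_subspaces(m=20, r=3, k=2, n_per=30, n_perm=10,
                                     frac=0.3, seed=0):
    """Hypothetical generator: k random r-dim subspaces of R^m, n_per columns each.
    n_perm random columns get a fraction `frac` of their entries shuffled."""
    rng = np.random.default_rng(seed)
    # Orthonormal basis (m x r) for each subspace.
    bases = [np.linalg.qr(rng.standard_normal((m, r)))[0] for _ in range(k)]
    X = np.hstack([U @ rng.standard_normal((r, n_per)) for U in bases])
    labels = np.repeat(np.arange(k), n_per)
    Y = X.copy()
    perm_cols = rng.choice(X.shape[1], size=n_perm, replace=False)
    n_shuffle = max(2, int(frac * m))
    for j in perm_cols:
        idx = rng.choice(m, size=n_shuffle, replace=False)
        # Cyclic shift among the selected positions: every selected entry moves.
        Y[idx, j] = X[np.roll(idx, 1), j]
    return X, Y, bases, labels, perm_cols

def subspace_residuals(Y, bases, labels):
    """Relative distance of each column of Y to its own (oracle) subspace."""
    res = np.empty(Y.shape[1])
    for j in range(Y.shape[1]):
        U = bases[labels[j]]
        y = Y[:, j]
        res[j] = np.linalg.norm(y - U @ (U.T @ y)) / np.linalg.norm(y)
    return res

if __name__ == "__main__":
    X, Y, bases, labels, perm_cols = make_permuted_union_of_subspaces()
    res = subspace_residuals(Y, bases, labels)
    print("flagged:", sorted(np.flatnonzero(res > 1e-6).tolist()))
    print("permuted:", sorted(perm_cols.tolist()))
```

Because the bases are unknown in practice, a real pipeline must identify outliers and reconstruct the subspaces from the data itself rather than rely on an oracle as this sketch does.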
Abstract
This paper aims to recover a multi-subspace matrix from permuted data: given a matrix whose columns are drawn from a union of low-dimensional subspaces, and some of whose columns are corrupted by permutations of their entries, recover the original matrix. The task has numerous practical applications, such as data cleaning, data integration, and de-anonymization. It nonetheless remains challenging, and existing techniques such as robust principal component analysis cannot address it well because of the presence of multiple subspaces and the permutations of the entries within vectors. To address this challenge, we develop a novel four-stage pipeline consisting of outlier identification, subspace reconstruction, outlier classification, and unlabeled sensing for permuted-vector recovery. In particular, we provide theoretical guarantees for the outlier classification step, ensuring reliable multi-subspace matrix recovery. We compare our pipeline with state-of-the-art competitors on multiple benchmarks, where it shows superior performance.
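The final stage recovers each permuted vector given its subspace. This is an instance of the unlabeled sensing problem: find x = Uc from an observation y that equals x with its entries reordered. The Python sketch below solves a tiny instance by exhaustive search over all m! orderings. It fits each candidate to span(U) by least squares and keeps the best fit. This brute-force approach is a swapped-in illustration, not the method used in the paper, and it is only feasible for very small m. The function name `unlabeled_sensing_bruteforce` and the sizes are hypothetical. The example uses m = 6 ≥ 2r with r = 2 and a random basis, a regime in which the unlabeled sensing literature shows unique recovery for generic bases.

```python
import itertools
import numpy as np

def unlabeled_sensing_bruteforce(y, U):
    """Toy unlabeled sensing: find x in span(U) such that y = x[p] for some
    permutation p. Exhaustive search over all m! orderings (tiny m only)."""
    m = y.shape[0]
    best_res, best_x = np.inf, None
    for order in itertools.permutations(range(m)):
        cand = np.empty(m)
        # If y = x[order], then x[order] = y undoes the hypothesised shuffle.
        cand[list(order)] = y
        c, *_ = np.linalg.lstsq(U, cand, rcond=None)
        res = np.linalg.norm(cand - U @ c)  # distance of candidate to span(U)
        if res < best_res:
            best_res, best_x = res, U @ c
    return best_x, best_res

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    U = rng.standard_normal((6, 2))     # hypothetical known basis, m=6, r=2
    x = U @ rng.standard_normal(2)      # ground-truth vector in span(U)
    y = x[rng.permutation(6)]           # observed vector with shuffled entries
    x_hat, res = unlabeled_sensing_bruteforce(y, U)
    print("recovered:", np.allclose(x_hat, x), "residual:", res)
```

The factorial cost is why this sketch only conveys the structure of the problem rather than serving as a practical solver.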
