CONDEN-FI: Consistency and Diversity Learning-based Multi-View Unsupervised Feature and Instance Co-Selection
Yanyong Huang, Yuxin Cai, Dongjie Wang, Xiuwen Yi, Tianrui Li
TL;DR
CONDEN-FI tackles the problem of unsupervised joint selection of features and instances across multi-view unlabeled data by learning inter-view consistent and view-specific representations to reconstruct the original data in a reduced space. It models the reconstruction via $X^{(v)} \approx X^{(v)} (B + B^{(v)})$ with $B$ capturing shared information and $B^{(v)}$ capturing view-specific information, and imposes an adaptive view-consensus similarity graph $S$ with weights $\eta^{(v)}$ and $\gamma^{(v)}$ to promote diversity. An alternating optimization algorithm updates $W^{(v)}$, $B$, $B^{(v)}$, $S$, and $\Psi^{(v)}$ with closed-form updates, and a novel MvIS measure guides instance ranking. Experiments on eight real-world multi-view datasets show CONDEN-FI consistently outperforms state-of-the-art single-view and multi-view co-selection baselines in ACC and F1, validating the value of jointly leveraging common and view-specific information and adaptive graph-based diversity. The approach provides a scalable preprocessing framework for downstream tasks and suggests avenues for extensions to supervised or semi-supervised settings.
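To make the reconstruction model concrete, the following is a minimal sketch of the residual $\|X^{(v)} - X^{(v)}(B + B^{(v)})\|_F^2$ summed over views. It assumes each view $X^{(v)}$ is stored as a features-by-samples array, so that $B$ and $B^{(v)}$ are $n \times n$; this layout is an assumption, not something stated in the summary above. The function names are hypothetical, and `row_norm_scores` is only an illustrative stand-in for ranking instances by row norm. It is not the MvIS measure, whose definition is not given here.

```python
import numpy as np

def reconstruction_error(views, B, B_specific):
    """Sum over views of ||X_v - X_v (B + B_v)||_F^2.

    Assumes each X_v has shape (d_v, n) and B, B_v have shape (n, n).
    """
    total = 0.0
    for X_v, B_v in zip(views, B_specific):
        residual = X_v - X_v @ (B + B_v)
        total += np.linalg.norm(residual, "fro") ** 2
    return total

def row_norm_scores(B, B_specific):
    """Hypothetical instance score: row l2 norms of B + B_v, summed over views."""
    return sum(np.linalg.norm(B + B_v, axis=1) for B_v in B_specific)

# Toy data: two views with 4 and 3 features over the same 6 samples.
rng = np.random.default_rng(0)
n = 6
views = [rng.standard_normal((4, n)), rng.standard_normal((3, n))]

# Trivial case: B = I and B_v = 0 reproduce every view without error.
B = np.eye(n)
B_specific = [np.zeros((n, n)) for _ in views]

err = reconstruction_error(views, B, B_specific)
scores = row_norm_scores(B, B_specific)
```

The identity solution shown here is exactly the degenerate case that a practical objective has to rule out, typically with sparsity-inducing regularization on the representation matrices, so that only a few rows carry the reconstruction and can be read as selected instances.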
Abstract
The objective of multi-view unsupervised feature and instance co-selection is to simultaneously identify the most representative features and samples from multi-view unlabeled data, which helps mitigate the curse of dimensionality and reduce the number of instances, improving the performance of downstream tasks. However, existing methods treat feature selection and instance selection as two separate processes, failing to leverage the potential interactions between the feature and instance spaces. Additionally, previous co-selection methods for multi-view data require concatenating different views, which overlooks the consistent information among them. In this paper, we propose a CONsistency and DivErsity learNing-based multi-view unsupervised Feature and Instance co-selection method (CONDEN-FI) to address these issues. Specifically, CONDEN-FI reconstructs multi-view data from both the sample and feature spaces to learn representations that are consistent across views and specific to each view, enabling the simultaneous selection of the most important features and instances. Moreover, CONDEN-FI adaptively learns a view-consensus similarity graph to help select both dissimilar and similar samples in the reconstructed data space, leading to a more diverse selection of instances. An efficient algorithm is developed to solve the resulting optimization problem, and comprehensive experimental results on real-world datasets demonstrate that CONDEN-FI is effective compared to state-of-the-art methods.
