Unsupervised Feature Selection Through Group Discovery
Shira Lifshitz, Ofir Lindenbaum, Gal Mishne, Ron Meir, Hadas Benisty
TL;DR
GroupFS tackles unsupervised feature selection when signals aggregate in latent feature groups. It jointly learns group structure and selects informative groups through an end-to-end differentiable framework that enforces Laplacian smoothness on both sample and feature graphs and applies a group-sparsity regularizer. By leveraging a Gumbel-Softmax-derived assignment and stochastic gates, GroupFS discovers latent groups without supervision and achieves competitive or superior clustering accuracy across nine benchmarks, with interpretable, domain-aligned groupings. Limitations include reliance on Euclidean distances for graph construction and a single global notion of group importance; future work includes manifold-aware distances and time- or condition-adaptive grouping.
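To make the two sampling ingredients named above concrete, here is a minimal standard-library sketch of a Gumbel-Softmax relaxation and a clipped-Gaussian stochastic gate. It is an illustration under stated assumptions, not the authors' implementation: the function names, the temperature and noise scale, the gate means, and the rule that combines group gates into per-feature gates are all hypothetical choices made for this example.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Draw one relaxed one-hot sample from a categorical with the given logits."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise is -log(-log(U)) with U ~ Uniform(0, 1).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    top = max(noisy)
    exps = [math.exp(v - top) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def stochastic_gate(mu, sigma, rng):
    """Relaxed Bernoulli gate (assumed form): clip(mu + sigma * eps, 0, 1), eps ~ N(0, 1)."""
    return min(1.0, max(0.0, mu + sigma * rng.gauss(0.0, 1.0)))


rng = random.Random(0)
n_features, n_groups = 6, 3

# Hypothetical learnable logits: one row per feature, one column per group.
logits = [[rng.gauss(0.0, 1.0) for _ in range(n_groups)] for _ in range(n_features)]

# Soft feature-to-group assignments; each row is a point on the probability simplex.
assignments = [gumbel_softmax(row, 0.5, rng) for row in logits]

# One gate per group; the means 0.9, 0.5, 0.1 are arbitrary illustrative values.
group_gates = [stochastic_gate(mu, 0.5, rng) for mu in (0.9, 0.5, 0.1)]

# Assumed combination rule: a feature inherits the gates of the groups it belongs to.
feature_gates = [
    sum(a * g for a, g in zip(row, group_gates)) for row in assignments
]
```

Lower temperatures push each assignment row toward a one-hot vector, while the clipping in the gate lets a group be switched fully on or off. In a differentiable framework both samplers would be written with an autodiff library so that gradients reach the logits and gate means.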
Abstract
Unsupervised feature selection (FS) is essential for high-dimensional learning tasks where labels are not available. It helps reduce noise, improve generalization, and enhance interpretability. However, most existing unsupervised FS methods evaluate features in isolation, even though informative signals often emerge from groups of related features. For example, adjacent pixels, functionally connected brain regions, or correlated financial indicators tend to act together, making independent evaluation suboptimal. Although some methods attempt to capture group structure, they typically rely on predefined partitions or label supervision, limiting their applicability. We propose GroupFS, an end-to-end, fully differentiable framework that jointly discovers latent feature groups and selects the most informative groups among them, without relying on fixed a priori groups or label supervision. GroupFS enforces Laplacian smoothness on both feature and sample graphs and applies a group sparsity regularizer to learn a compact, structured representation. Across nine benchmarks spanning images, tabular data, and biological datasets, GroupFS consistently outperforms state-of-the-art unsupervised FS methods in clustering and selects groups of features that align with meaningful patterns.
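As a rough illustration of what Laplacian smoothness on a sample graph measures, the sketch below builds a Gaussian-kernel affinity graph from Euclidean distances and evaluates the quadratic form f^T L f, with L = D - W, for two candidate features. The toy data, the bandwidth, and the function names are assumptions made for this example only; the abstract does not specify the graph construction or normalization used by GroupFS.

```python
import math


def gaussian_affinity(points, bandwidth):
    """Dense affinity matrix W_ij = exp(-||x_i - x_j||^2 / (2 * bandwidth^2)), zero diagonal."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-d2 / (2.0 * bandwidth ** 2))
    return W


def laplacian_smoothness(signal, W):
    """Return f^T L f for L = D - W, computed as 0.5 * sum_ij W_ij * (f_i - f_j)^2."""
    n = len(signal)
    return 0.5 * sum(
        W[i][j] * (signal[i] - signal[j]) ** 2 for i in range(n) for j in range(n)
    )


# Toy samples forming two well-separated clusters on a line.
samples = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
W = gaussian_affinity(samples, bandwidth=1.0)

# A feature that follows the cluster structure versus one that ignores it.
cluster_feature = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
noise_feature = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]

smooth_score = laplacian_smoothness(cluster_feature, W)
noisy_score = laplacian_smoothness(noise_feature, W)
```

A feature that varies slowly across strongly connected samples yields a small value of the quadratic form, so `smooth_score` comes out far below `noisy_score` here. The same quantity can be computed on a graph over features, which is the sense in which smoothness applies to both graphs.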
