Sparse and geometry-aware generalisation of the mutual information for joint discriminative clustering and feature selection

Louis Ohl; Pierre-Alexandre Mattei; Charles Bouveyron; Mickaël Leclercq; Arnaud Droit; Frédéric Precioso

TL;DR

Sparse GEMINI tackles the problem of joint feature selection and discriminative clustering in high-dimensional data by coupling the geometry-aware GEMINI objective with sparsity-inducing penalties. It supports both linear (logistic) and neural (LassoNet) architectures, enabling end-to-end training through proximal gradients and explicit GEMINI gradients. The method demonstrates competitive clustering performance (ARI) while delivering improved variable selection (VSER/CVR) on synthetic and real datasets, including MNIST variants and a large Prostate-BCR transcriptomics dataset. A public GemClus package provides exact gradient computations for reproducibility and broader adoption of the approach.

Abstract

Feature selection in clustering is a hard task that requires simultaneously discovering relevant clusters and the variables relevant to those clusters. While feature selection algorithms are often model-based, relying on optimised model selection or strong assumptions on the data distribution, we introduce a discriminative clustering model that maximises a geometry-aware generalisation of the mutual information, called GEMINI, with a simple $\ell_1$ penalty: the Sparse GEMINI. This algorithm avoids the burden of combinatorial feature subset exploration and scales easily to high-dimensional data and large numbers of samples, while requiring only the design of a discriminative clustering model. We demonstrate the performance of Sparse GEMINI on synthetic and large-scale datasets. Our results show that Sparse GEMINI is a competitive algorithm that can select relevant subsets of variables with respect to the clustering without using relevance criteria or prior hypotheses.
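
As a rough illustration of how an $\ell_1$ penalty can be handled with a proximal gradient step, the sketch below applies elementwise soft-thresholding after a plain gradient update. It is a minimal NumPy example under stated assumptions, not the paper's or GemClus's implementation: the function names `soft_threshold` and `proximal_step`, the learning rate `lr`, and the penalty weight `lam` are all hypothetical, and `grad` stands in for the gradient of whatever loss is minimised (for instance a negated clustering objective).

```python
import numpy as np

def soft_threshold(w, threshold):
    """Proximal operator of threshold * ||w||_1: shrink each entry towards zero."""
    return np.sign(w) * np.maximum(np.abs(w) - threshold, 0.0)

def proximal_step(w, grad, lr, lam):
    """One proximal gradient step for minimising loss(w) + lam * ||w||_1.

    `grad` is the gradient of the smooth loss at `w` (hypothetical placeholder).
    """
    return soft_threshold(w - lr * grad, lr * lam)

# Toy weights: with a zero gradient, only the shrinkage acts.
w = np.array([0.5, -0.05, 0.0, -1.2])
grad = np.zeros_like(w)
w_new = proximal_step(w, grad, lr=0.1, lam=1.0)

# Entries whose magnitude is below lr * lam become exactly zero.
selected = np.flatnonzero(w_new)
```

Weights that are set exactly to zero no longer contribute to the model, which is what allows a sparsity penalty to act as a feature selector without enumerating feature subsets.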

Paper Structure (50 sections, 66 equations, 12 figures, 7 tables)

Introduction
Related works
The Sparse GEMINI
The GEMINI objective
Sparse models
Unsupervised logistic regression architecture
The LassoNet architecture
Optimisation
Training and model selection
Gradient considerations
Proximal gradients
GEMINI gradients
Implementations
Experiments
Metrics
...and 35 more sections

Figures (12)

Figure 1: Description of the complete Sparse GEMINI model. Through a proximal gradient, clusters learned by GEMINI drop irrelevant features both in a skip connection and an MLP. Setting $M=0$ recovers a sparse unsupervised logistic regression.
Figure 2: Performance of Sparse GEMINI (OvO only) on synthetic datasets over 20 runs, compared against other methods. S denotes a scenario of the first synthetic dataset and D2 denotes the second synthetic dataset. Standard deviations are reported in subscript.
Figure 3: Example of convergence of the norm of the weights of the skip connection for every feature during training for the OvA Wasserstein objective. Green lines are the informative variables, black lines are the noise and red are the correlated variables. (a) In the case of noisy variables, Sparse GEMINI can recover the informative variables. (b) In the presence of redundant variables, Sparse GEMINI eliminates informative variables to keep the redundant ones.
Figure 4: Relative importance of MNIST features after training of Sparse GEMINI with a log-scale color map. The blue features were eliminated at the first steps of $\lambda$, and the red features were eliminated last. On the right: evolution of the GEMINI depending on $\lambda$. $F$ stands for the number of selected features.
Figure 5: Average training curves of Sparse GEMINI on the US Congress dataset over 50 runs. Blue lines are Wasserstein, red lines are MMD.
...and 7 more figures
