A Bayesian Approach to Clustering via the Proper Bayesian Bootstrap: the Bayesian Bagged Clustering (BBC) algorithm
Federico Maria Quetti, Silvia Figini, Elena Ballante
TL;DR
The paper addresses the challenge of robust, interpretable clustering and objective determination of the number of clusters. It introduces the Bayesian Bagged Clustering (BBC) framework, which elicits a prior from an initial partition and applies the proper Bayesian bootstrap to generate bootstrap replicas, followed by ensemble clustering and entropy-based uncertainty measures. Key contributions include a two-step prior-informed resampling procedure, a simplex-based interpretation of cluster memberships, and practical criteria for selecting the optimal number of clusters $K$. Empirical results on the Iris dataset and synthetic data demonstrate enhanced stability, informative uncertainty quantification, and reliable $K$-selection under varying prior informativeness.
Abstract
The paper presents a novel approach to unsupervised clustering. The proposed method uses the proper Bayesian bootstrap to improve the robustness and interpretability of existing clustering models. The approach is organized in two steps: k-means clustering is used for prior elicitation, then the proper Bayesian bootstrap is applied as the resampling method in an ensemble clustering approach. Results are analyzed by introducing measures of uncertainty based on Shannon entropy. The proposal provides a clear indication of the optimal number of clusters, as well as a better representation of the clustered data. Empirical results on simulated data illustrate the methodological and empirical advances obtained.
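The two-step idea can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the commonly described approximation of the proper Bayesian bootstrap, in which m atoms are drawn from the mixture of a prior guess F_0 (weight k) and the empirical distribution F_n (weight n), and are then given Dirichlet weights with parameters (n + k)/m. The Gaussian prior centred on hypothetical k-means centroids, the normalisation of the entropy by log K, and every function and parameter name below are assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def proper_bayesian_bootstrap(data, prior_sampler, k, m, rng):
    """Return (points, weights) for one bootstrap replica.

    data          : observed points (empirical distribution F_n)
    prior_sampler : callable taking rng and returning one draw from F_0
    k             : prior weight; k = 0 ignores the prior entirely
    m             : number of atoms in the replica
    """
    n = len(data)
    p_prior = k / (k + n)
    # Each atom comes from F_0 with probability k / (k + n), else from F_n.
    points = [
        prior_sampler(rng) if rng.random() < p_prior else rng.choice(data)
        for _ in range(m)
    ]
    # Dirichlet((n + k) / m, ...) weights via normalised Gamma draws.
    gammas = [rng.gammavariate((n + k) / m, 1.0) for _ in range(m)]
    total = sum(gammas)
    weights = [g / total for g in gammas]
    return points, weights


def membership_entropy(labels, n_clusters):
    """Normalised Shannon entropy of one unit's labels across replicas.

    0 means the unit always falls in the same cluster; 1 means its
    memberships are spread uniformly over the n_clusters clusters.
    """
    if n_clusters < 2:
        return 0.0
    counts = Counter(labels)
    probs = [counts.get(c, 0) / len(labels) for c in range(n_clusters)]
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(n_clusters)


# Toy one-dimensional example with hypothetical k-means centroids.
rng = random.Random(0)
data = [0.10, 0.20, 0.15, 5.00, 5.10, 4.90]
centroids = [0.15, 5.00]  # assumed output of an initial k-means run


def prior_sampler(r):
    # Assumed prior: Gaussian noise around a randomly chosen centroid.
    return r.gauss(r.choice(centroids), 0.1)


points, weights = proper_bayesian_bootstrap(data, prior_sampler, k=3, m=12, rng=rng)
```

In a full ensemble, each replica would be clustered using its weights, the labels would be aligned across replicas, and the frequencies of each unit's labels would give a point on the simplex whose entropy is computed by `membership_entropy`. Larger values of `k` pull the replicas toward the prior partition, while `k = 0` reduces the resampling step to draws from the observed data alone.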
