Flexible Bivariate Beta Mixture Model: A Probabilistic Approach for Clustering Complex Data Structures
Yung-Peng Hsu, Hung-Hsuan Chen
TL;DR
The paper addresses clustering of data with nonconvex and irregular structures by proposing the Flexible Bivariate Beta Mixture Model (FBBMM), a probabilistic mixture where each component uses a four-parameter flexible bivariate beta distribution. Parameter learning is performed via the EM algorithm, with an SLSQP optimizer for the cluster-specific shape parameters, enabling soft clustering and the modeling of both positive and negative correlations. Empirical results on synthetic nonconvex shapes and open datasets (wine, MNIST-derived features) show FBBMM outperforms traditional methods such as k-means, DBSCAN, GMM, and MBMM, highlighting its capacity to capture complex data geometries. The approach provides a practical, generative framework for clustering complex data and offers avenues for extension to higher dimensions and more robust noise handling, with an open-source implementation available.
Abstract
Clustering is essential in data analysis and machine learning, but traditional algorithms such as $k$-means and Gaussian Mixture Models (GMM) often fail on nonconvex clusters. To address this challenge, we introduce the Flexible Bivariate Beta Mixture Model (FBBMM), which uses the flexibility of the bivariate beta distribution to handle diverse and irregular cluster shapes. Parameters are estimated with the Expectation Maximization (EM) algorithm and the Sequential Least Squares Programming (SLSQP) optimizer. We validate FBBMM on synthetic and real-world datasets, demonstrating its superior performance in clustering complex data structures and offering a robust solution for big data analytics across various domains. We release the experimental code at https://github.com/yung-peng/MBMM-and-FBBMM.
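The estimation scheme named in the abstract (EM with an SLSQP-based update for the shape parameters) can be illustrated with a minimal sketch. This is not the released implementation. As a loudly stated assumption, each component below is a product of two independent univariate beta densities, a simplified stand-in that cannot model correlation, rather than the flexible bivariate beta density. The names `component_logpdf` and `fit_em_slsqp`, the bounds, and the initialization are hypothetical choices made for illustration only.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import beta

LOWER, UPPER = 1e-3, 1e3  # assumed box bounds keeping shape parameters positive


def component_logpdf(x, params):
    """Stand-in component: product of two independent Beta densities.

    NOT the flexible bivariate beta density; params = (a1, b1, a2, b2).
    """
    a1, b1, a2, b2 = params
    return beta.logpdf(x[:, 0], a1, b1) + beta.logpdf(x[:, 1], a2, b2)


def fit_em_slsqp(x, n_components=2, n_iter=30, seed=0):
    """Fit the stand-in mixture by EM; shape parameters are updated by SLSQP."""
    rng = np.random.default_rng(seed)
    x = np.clip(x, 1e-6, 1 - 1e-6)  # keep data strictly inside (0, 1)
    weights = np.full(n_components, 1.0 / n_components)
    params = rng.uniform(1.0, 5.0, size=(n_components, 4))  # assumed random init
    history = []  # log-likelihood evaluated at the start of each iteration

    for _ in range(n_iter):
        # E-step: responsibilities, computed stably in log space.
        log_joint = np.log(np.clip(weights, 1e-12, None)) + np.column_stack(
            [component_logpdf(x, p) for p in params]
        )
        shift = log_joint.max(axis=1, keepdims=True)
        unnorm = np.exp(log_joint - shift)
        total = unnorm.sum(axis=1, keepdims=True)
        history.append(float(np.sum(shift + np.log(total))))
        resp = unnorm / total

        # M-step: closed-form mixing weights, numerical shape-parameter update.
        weights = resp.mean(axis=0)
        for k in range(n_components):
            def neg_weighted_ll(p, r=resp[:, k]):
                return -np.sum(r * component_logpdf(x, p))

            result = minimize(neg_weighted_ll, params[k], method="SLSQP",
                              bounds=[(LOWER, UPPER)] * 4)
            candidate = np.clip(result.x, LOWER, UPPER)
            # Accept only if the weighted log-likelihood does not get worse.
            if neg_weighted_ll(candidate) <= neg_weighted_ll(params[k]):
                params[k] = candidate

    return weights, params, resp, history
```

Each row of the returned `resp` matrix is a soft cluster assignment; hard labels, if needed, can be taken as `resp.argmax(axis=1)`. Accepting an SLSQP candidate only when it does not worsen the weighted log-likelihood keeps the procedure a generalized EM, so the recorded log-likelihood should not decrease between iterations.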
