A Unifying Family of Data-Adaptive Partitioning Algorithms
Guy B. Oldaker, Maria Emelianenko
TL;DR
This work introduces a unifying, data-adaptive family of partitioning algorithms parameterized by $\alpha \in [0,1]$ that encompasses and extends classic clustering methods such as $k$-means and $k$-subspaces. Through a single objective $\mathcal{G}_{\alpha}$ and alternating minimization, the approach jointly optimizes Voronoi sets, orthogonal projectors, and centroids, with an adaptive mechanism that can adjust the number of clusters $k$ and total dimension $r$ based on data structure. The authors demonstrate versatility across subspace clustering, model order reduction, and matrix approximation, achieving automatic structure discovery and competitive or improved performance relative to established methods. The work suggests broad potential for cross-domain integration and motivates further exploration of parameter tuning, ensemble strategies, and connections to existing convergence theories. Overall, the framework offers a scalable, interpretable toolkit for high-dimensional data analysis with automatic adaptation capabilities.
Abstract
Clustering algorithms remain valuable tools for grouping and summarizing the most important aspects of data. Application areas include image segmentation, dimension reduction, signal analysis, model order reduction, and numerical analysis. As a consequence, many clustering approaches have been developed to satisfy the unique needs of each particular field. In this article, we present a family of data-adaptive partitioning algorithms that unifies several well-known methods (e.g., k-means and k-subspaces). Indexed by a single parameter and employing a common minimization strategy, the algorithms are easy to use and interpret, and scale well to large, high-dimensional problems. In addition, we develop an adaptive mechanism that (a) can automatically uncover data structures and problem parameters without any expert knowledge and (b) can be used to augment other existing methods. By demonstrating the performance of our methods on examples from disparate fields, including subspace clustering, model order reduction, and matrix approximation, we hope to highlight their versatility and their potential for extending the boundaries of existing scientific domains. We believe the family's parametrized structure combines these algorithms in a way that will foster new developments and directions, not least within the data science community.
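To make the idea of one parameter and one alternating-minimization loop concrete, the following is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' objective $\mathcal{G}_{\alpha}$: the per-point cost $\alpha\|x - c_j\|^2 + (1-\alpha)\|(I - P_j)(x - c_j)\|^2$ is a hypothetical convex combination chosen only because it reduces to the k-means cost at $\alpha = 1$ and to an affine k-subspaces cost at $\alpha = 0$. The names `point_costs` and `fit_partition`, the fixed per-cluster subspace dimension `q`, and the random initialization are likewise assumptions. The adaptive mechanism for choosing $k$ and $r$ is not sketched here.

```python
import numpy as np

def point_costs(X, centroids, bases, alpha):
    """Hypothetical blended cost of assigning each row of X to each cluster.

    For centroid c and orthonormal basis U (so P = U U^T), the cost is
    alpha*||x - c||^2 + (1 - alpha)*||(I - P)(x - c)||^2, which simplifies
    to ||x - c||^2 - (1 - alpha)*||U^T (x - c)||^2. Returns shape (n, k).
    """
    costs = np.empty((X.shape[0], len(centroids)))
    for j, (c, U) in enumerate(zip(centroids, bases)):
        D = X - c
        full = np.sum(D**2, axis=1)            # squared distance to the centroid
        captured = np.sum((D @ U)**2, axis=1)  # part lying inside the subspace
        costs[:, j] = full - (1.0 - alpha) * captured
    return costs

def fit_partition(X, k, q, alpha, n_iter=50, seed=0):
    """Alternate between assigning points and refitting centroids/subspaces."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Assumed initialization: random data points as centroids, random bases.
    centroids = X[rng.choice(n, size=k, replace=False)].astype(float)
    bases = [np.linalg.qr(rng.standard_normal((d, q)))[0] for _ in range(k)]
    history = []
    for _ in range(n_iter):
        # Step 1: Voronoi sets. Each point joins its cheapest cluster.
        costs = point_costs(X, centroids, bases, alpha)
        labels = np.argmin(costs, axis=1)
        history.append(costs[np.arange(n), labels].sum())
        # Step 2: refit each non-empty cluster. The mean minimizes this cost
        # over c, and the top-q left singular vectors minimize it over P.
        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                continue  # keep the previous parameters for an empty cluster
            centroids[j] = members.mean(axis=0)
            U, _, _ = np.linalg.svd((members - centroids[j]).T,
                                    full_matrices=False)
            bases[j] = U[:, :q]
    return labels, centroids, bases, history
```

Calling `fit_partition(X, k=3, q=2, alpha=0.5)` on an `(n, d)` floating-point array returns the labels, centroids, bases, and the objective value recorded at each iteration. Under this assumed cost, both steps are exact minimizations, so the recorded objective should not increase from one iteration to the next.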
