CLAMS: A System for Zero-Shot Model Selection for Clustering
Prabhant Singh, Pieter Gijsbers, Murat Onur Yildirim, Elif Ceren Gok, Joaquin Vanschoren
TL;DR
The paper tackles zero-shot model selection for clustering by introducing CLAMS, an AutoML framework for full clustering pipelines, and CLAMS-OT, a meta-learning module that uses entropic Gromov-Wasserstein distances to quantify dataset similarity and transfer the best prior pipeline to a new unlabeled dataset. It formulates a dataset-distance-based approach that selects algorithms without labels and reports better performance than the baselines, using adjusted mutual information (AMI) and region of practical equivalence (ROPE) analyses on 57 OpenML clustering datasets. Key contributions include an open-source clustering AutoML tool, a scalable GW-LR-based similarity metric, and empirical evidence that similarity-aware transfer improves clustering outcomes in unlabeled settings. The work advances AutoML for unsupervised tasks by connecting dataset geometry, meta-learning, and automated pipeline search in a unified framework with practical evaluation and reproducibility.
Abstract
We propose an AutoML system that enables model selection for clustering problems by leveraging optimal transport-based dataset similarity. Our objective is to establish a comprehensive AutoML pipeline for clustering problems and to provide recommendations for selecting the most suitable algorithms, thus opening up a new area of AutoML beyond the traditional supervised learning setting. We compare our system against multiple clustering baselines and find that it outperforms all of them, demonstrating the utility of similarity-based automated model selection for clustering applications.
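
The similarity-based selection step described above can be illustrated with a minimal sketch. It is not the CLAMS implementation: it assumes the distances between the new dataset and each prior dataset have already been computed (for example with an entropic Gromov-Wasserstein solver) and that the best-known pipeline for each prior dataset is stored as a string. The function name `select_pipeline`, the dataset names, the distance values, and the pipeline descriptions are all hypothetical.

```python
from typing import Dict, Tuple


def select_pipeline(
    distances: Dict[str, float],
    best_pipelines: Dict[str, str],
) -> Tuple[str, str]:
    """Return (nearest prior dataset, its stored best pipeline).

    distances: hypothetical precomputed distances from the new, unlabeled
        dataset to each prior dataset (smaller means more similar).
    best_pipelines: hypothetical lookup of the best-known pipeline per
        prior dataset, found offline where evaluation was possible.
    """
    # Only consider prior datasets that have a stored pipeline to transfer.
    candidates = {
        name: dist for name, dist in distances.items() if name in best_pipelines
    }
    if not candidates:
        raise ValueError("no prior dataset has both a distance and a pipeline")

    # Zero-shot step: no labels are used, only the dataset distances.
    nearest = min(candidates, key=candidates.get)
    return nearest, best_pipelines[nearest]


# Illustrative values only; none of these come from the paper.
distances = {"prior_a": 0.42, "prior_b": 0.07, "prior_c": 0.31}
best_pipelines = {
    "prior_a": "StandardScaler -> KMeans",
    "prior_b": "PCA -> DBSCAN",
    "prior_c": "MinMaxScaler -> AgglomerativeClustering",
}

nearest, pipeline = select_pipeline(distances, best_pipelines)
print(nearest, pipeline)
```

In this sketch the new dataset is closest to `prior_b`, so that dataset's stored pipeline is the one recommended. Keeping the distance computation outside the function means any dataset-distance measure could be swapped in without changing the selection logic.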
