Incremental Affinity Propagation based on Cluster Consolidation and Stratification
Silvana Castano, Alfio Ferrara, Stefano Montanelli, Francesco Periti
TL;DR
A-Posteriori affinity Propagation (APP) addresses incremental clustering on dynamic datasets by extending Affinity Propagation (AP) with cluster consolidation and stratification, so that cluster evolution is both faithful and forgetful. It clusters new arrivals a posteriori by consolidating past clusters into centroids and running AP on the centroids plus the new data; stratification modes manage cluster creation, enrichment, and merging, and a pruning mechanism enforces forgetfulness. Empirical results on four labeled datasets and a semantic-shift case study show that APP achieves clustering quality comparable to AP and IAPNA with substantially better scalability and memory efficiency, and yields interpretable cluster evolution in diachronic text analysis. The approach is of practical use for evolutionary clustering tasks where group evolution is assumed and large embedding-based representations are used (e.g., word meaning trajectories in corpora).
Abstract
Modern data mining applications require incremental clustering over dynamic datasets, tracing temporal changes in the resulting clusters. In this paper, we propose A-Posteriori affinity Propagation (APP), an incremental extension of Affinity Propagation (AP) based on cluster consolidation and cluster stratification to achieve faithfulness and forgetfulness. APP enforces incremental clustering where i) newly arriving objects are dynamically consolidated into previous clusters without the need to re-execute clustering over the entire dataset of objects, and ii) a faithful sequence of clustering results is produced and maintained over time, while obsolete clusters can be forgotten through decremental learning functionalities. Four popular labeled datasets are used to test the performance of APP against the benchmark clustering performance of conventional AP and Incremental Affinity Propagation based on Nearest neighbor Assignment (IAPNA). Experimental results show that APP achieves comparable clustering performance while also remaining scalable.
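To make the consolidation idea concrete, the following is a minimal sketch of one incremental step, not the authors' implementation. It assumes points are plain tuples of floats and that the base clusterer is passed in as a function returning one label per item. The names `app_step`, `centroid`, and `leader_cluster` are hypothetical, and `leader_cluster` is a simple greedy stand-in used only to keep the example dependency-free; in APP this role is played by AP. Pruning of obsolete clusters is omitted.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
# Hypothetical interface: a clusterer returns one integer label per input item.
Clusterer = Callable[[List[Point]], List[int]]


def centroid(points: Sequence[Point]) -> Point:
    """Mean of a non-empty group of points, computed per dimension."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def app_step(past_clusters: List[List[Point]],
             new_points: Sequence[Point],
             cluster_fn: Clusterer) -> List[List[Point]]:
    """One incremental step: consolidate, cluster, then expand memberships."""
    # Consolidation: each past cluster is represented by a single centroid.
    centroids = [centroid(c) for c in past_clusters]
    items = centroids + list(new_points)
    # The base clusterer sees only centroids plus new arrivals,
    # never the full history of objects.
    labels = cluster_fn(items)
    groups: Dict[int, List[Point]] = {}
    for i, label in enumerate(labels):
        # A centroid stands for all members of its past cluster, so past
        # clusters are never split; they can only be enriched or merged.
        members = past_clusters[i] if i < len(centroids) else [items[i]]
        groups.setdefault(label, []).extend(members)
    return list(groups.values())


def leader_cluster(items: List[Point], radius: float = 1.0) -> List[int]:
    """Stand-in clusterer (NOT Affinity Propagation): greedy leader assignment."""
    leaders: List[Point] = []
    labels: List[int] = []
    for p in items:
        for k, q in enumerate(leaders):
            if math.dist(p, q) <= radius:
                labels.append(k)
                break
        else:
            leaders.append(p)
            labels.append(len(leaders) - 1)
    return labels


past = [[(0.0, 0.0), (0.2, 0.0)], [(5.0, 5.0)]]
arrivals = [(0.1, 0.3), (9.0, 9.0)]
updated = app_step(past, arrivals, leader_cluster)
# The first cluster is enriched, the second is unchanged, a third is created.
```

In this sketch, two past centroids that receive the same label produce a merge, a new point that shares a label with a centroid enriches that cluster, and new points with a fresh label create a cluster. The cost of each step depends on the number of centroids and new arrivals rather than on the total number of objects seen so far.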
