Incremental Affinity Propagation based on Cluster Consolidation and Stratification

Silvana Castano, Alfio Ferrara, Stefano Montanelli, Francesco Periti

TL;DR

APP addresses incremental clustering on dynamic datasets by extending AP with cluster consolidation and stratification to support faithfulness and forgetfulness in evolution. It clusters new arrivals a-posteriori by consolidating past clusters into centroids and running AP on centroids plus new data, while stratification modes manage creation, enrichment, and merges; a pruning mechanism enforces forgetfulness. Empirical results on four labeled datasets and a semantic-shift case study show APP achieving clustering quality comparable to AP and IAPNA but with substantially better scalability and memory efficiency, and with interpretable cluster evolution in diachronic text analysis. The approach demonstrates practical impact for evolutionary clustering tasks where group evolution is assumed and large embedding-based representations are used (e.g., word meaning trajectories in corpora).

Abstract

Modern data mining applications require incremental clustering over dynamic datasets, tracing temporal changes in the resulting clusters. In this paper, we propose A-Posteriori affinity Propagation (APP), an incremental extension of Affinity Propagation (AP) based on cluster consolidation and cluster stratification to achieve faithfulness and forgetfulness. APP enforces incremental clustering where i) newly arriving objects are dynamically consolidated into previous clusters without re-executing clustering over the entire dataset, and ii) a faithful sequence of clustering results is produced and maintained over time, while obsolete clusters can be forgotten through decremental learning functionalities. Four popular labeled datasets are used to compare the performance of APP against benchmark results obtained with conventional AP and Incremental Affinity Propagation based on Nearest neighbor Assignment (IAPNA). Experimental results show that APP achieves comparable clustering performance while remaining scalable.
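
The consolidate, cluster, unpack, and prune cycle described above can be illustrated with a minimal sketch. It is not the authors' implementation. The names `app_step`, `threshold_cluster`, `centroid`, and the `idle` counter are hypothetical. AP itself is not implemented: a naive greedy threshold clustering stands in for it so the example runs with the standard library only. The pruning rule (drop a cluster that has received no new objects for more than `th_gamma` consecutive steps) is an assumption about how the pruning threshold is applied.

```python
from math import dist
from statistics import fmean


def centroid(points):
    """Average a list of equal-length vectors component-wise."""
    return tuple(fmean(axis) for axis in zip(*points))


def threshold_cluster(points, radius=1.0):
    """Hypothetical stand-in for AP (NOT Affinity Propagation).

    Greedy leader clustering: a point joins the first leader within
    `radius`, otherwise it becomes a new leader. Returns a list of
    groups, each a list of indices into `points`.
    """
    leaders, groups = [], []
    for i, p in enumerate(points):
        for g, lead in enumerate(leaders):
            if dist(p, lead) <= radius:
                groups[g].append(i)
                break
        else:
            leaders.append(p)
            groups.append([i])
    return groups


def app_step(clusters, new_objects, cluster_fn, th_gamma=None):
    """One incremental step: consolidate, cluster, unpack, prune.

    `clusters` is a list of dicts {"members": [...], "idle": int},
    where "idle" counts steps without new objects (assumed bookkeeping).
    """
    # Consolidation: pack every past cluster into a single centroid.
    centroids = [centroid(c["members"]) for c in clusters]
    inputs = centroids + list(new_objects)
    n_old = len(centroids)

    result = []
    for group in cluster_fn(inputs):
        members, idle_values, got_new = [], [], False
        for idx in group:
            if idx < n_old:
                # Unpack: the centroid hands its label to its old members.
                members.extend(clusters[idx]["members"])
                idle_values.append(clusters[idx]["idle"])
            else:
                members.append(inputs[idx])
                got_new = True
        idle = 0 if got_new else min(idle_values) + 1
        result.append({"members": members, "idle": idle})

    # Pruning (assumed rule): forget clusters idle for too long.
    if th_gamma is not None:
        result = [c for c in result if c["idle"] <= th_gamma]
    return result


# Usage: three time steps on toy 2-D points.
t0 = app_step([], [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0)], threshold_cluster)
t1 = app_step(t0, [(0.2, 0.2)], threshold_cluster, th_gamma=1)
t2 = app_step(t1, [(0.1, 0.1)], threshold_cluster, th_gamma=1)
```

Passing the clustering routine as `cluster_fn` keeps the consolidation and pruning logic independent of the clustering algorithm, so a real AP implementation could be substituted for the stand-in without changing `app_step`.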

Paper Structure

This paper contains 30 sections, 8 equations, 6 figures, 13 tables, 1 algorithm.

Figures (6)

  • Figure 1: Example of AP in an incremental scenario. (A) shows the clustering result over the initial batch of objects ($t=0$), represented by white circles. The black objects denote the cluster exemplars and dashed lines connect the objects of each cluster. (B) shows the clustering result after the second AP run ($t=1$). New incoming objects at time $t=1$ are represented by gray diamonds. Similarly to (B), the clustering result after the third AP run ($t=2$) is shown in (C). New incoming objects at time $t=2$ are represented by gray triangles.
  • Figure 2: Example of APP. (A) shows the objects available at time $t = 0$. The first clustering result coincides with AP and is represented in (B). The black objects denote the cluster exemplars. For the sake of clarity, dashed lines fully connect the objects of each cluster. (C) shows the cluster centroids as bold circles, generated by averaging the objects of each cluster, which are shown in the background. (D) shows the input objects of APP at time $t=1$. Gray diamonds represent the new incoming objects. The clustering result is represented in (E). In (F), cluster centroids are unpacked and their cluster labels are associated with each object they previously packed. The second APP run at time $t=2$ is shown in (G)-(H)-(J). New incoming objects are represented by gray triangles. (J) denotes the final clustering result. Note that the cluster in the top-right corner of (I) disappears in (J) due to a pruning threshold $th_{\gamma}=1$.
  • Figure 3: Variable-incremental experiment: example of APP results by time-step over the Iris dataset.
  • Figure 4: The APP results on the Vatican corpus for the word $\textsf{novelty}$.
  • Figure 5: The evolution/stratification of clusters that are finally merged into the cluster $\textsf{k26}$ of Figure \ref{fig:stratification}. For the sake of readability, the cluster description is provided only for $\textsf{k3}$, $\textsf{k6}$, $\textsf{k8}$, $\textsf{k17}$, $\textsf{k20}$, $\textsf{k21}$, $\textsf{k22}$, $\textsf{k26}$.
  • ...and 1 more figure