Clustering in Dynamic Environments: A Framework for Benchmark Dataset Generation With Heterogeneous Changes
Danial Yazdani, Juergen Branke, Mohammad Sadegh Khorshidi, Mohammad Nabi Omidvar, Xiaodong Li, Amir H. Gandomi, Xin Yao
TL;DR
The paper tackles the challenge of clustering in dynamic environments by introducing DDG, a configurable benchmark generator built on multiple Dynamic Gaussian Components whose centers $\mathbf{c}_i^{(t)}$, widths $\bm{\sigma}_i^{(t)}$, rotations $\bm{\Theta}_i^{(t)}$, and weights $w_i^{(t)}$ evolve over time to produce heterogeneous local and global changes. DDG supports probabilistic, time-tick changes and provides mechanisms for both gradual and abrupt dynamics, including global shifts in the number of components, variables, and clusters, with explicit bounds and a reflection-based boundary control. The framework enables rigorous evaluation of dynamic clustering algorithms and robust optimization over time (ROOT) strategies by offering rich, correlated dynamics and a principled synchronization of data generation with changing landscapes, along with offline, environment-spanning performance metrics. The work also discusses the limitations of current dynamic optimization algorithms (DOAs) under continuous heterogeneity and emphasizes the need for modular, scenario-specific mechanisms, while promising future presets, extensions to dynamic classification and regression tasks, and open-source availability. Overall, DDG aims to close the benchmark gap in dynamic clustering by delivering realistic, controllable, and scalable dynamic datasets.
Abstract
Clustering in dynamic environments is of increasing importance, with broad applications ranging from real-time data analysis and online unsupervised learning to dynamic facility location problems. While meta-heuristics have shown promising effectiveness in static clustering tasks, their application to tracking optimal clustering solutions or to robust clustering over time in dynamic environments remains largely underexplored. This is partly due to a lack of dynamic datasets with diverse, controllable, and realistic dynamic characteristics, which hinders systematic performance evaluation of clustering algorithms across dynamic scenarios. This deficiency limits both our understanding of clustering in dynamic environments and our ability to design effective algorithms for it. To bridge this gap, this paper introduces the Dynamic Dataset Generator (DDG). DDG features multiple dynamic Gaussian components integrated with a range of heterogeneous, local, and global changes. These changes vary in spatial and temporal severity, patterns, and domain of influence, providing a comprehensive tool for simulating a wide range of dynamic scenarios.
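To make the idea of dynamic Gaussian components concrete, the following is a minimal sketch, not the DDG implementation. It assumes axis-aligned components (no rotation), fixed widths and weights, a Gaussian random-walk step for the centers, and a per-tick change probability. The names `reflect`, `GaussianComponent`, `generate_stream`, `change_prob`, and `shift_severity` are hypothetical and chosen only for illustration.

```python
import random


def reflect(value, lower, upper):
    """Fold a value back into [lower, upper] by mirroring it at the violated bound."""
    span = upper - lower
    # Reflection is periodic with period 2 * span; the second half is mirrored.
    offset = (value - lower) % (2 * span)
    return lower + (offset if offset <= span else 2 * span - offset)


class GaussianComponent:
    """Illustrative axis-aligned Gaussian component (rotation omitted for brevity)."""

    def __init__(self, center, sigma, weight):
        self.center = list(center)  # one coordinate per variable
        self.sigma = list(sigma)    # per-variable standard deviation
        self.weight = weight        # relative sampling probability

    def drift(self, rng, shift_severity, lower, upper):
        # Assumed local change: random-walk step on the center, reflected at the bounds.
        self.center = [
            reflect(c + rng.gauss(0.0, shift_severity), lower, upper)
            for c in self.center
        ]

    def sample(self, rng):
        return [rng.gauss(c, s) for c, s in zip(self.center, self.sigma)]


def generate_stream(components, n_points, change_prob, shift_severity,
                    lower, upper, seed=0):
    """Emit one data point per time tick while the components change underneath."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_points):
        # Heterogeneous timing: each component changes independently at each tick.
        for comp in components:
            if rng.random() < change_prob:
                comp.drift(rng, shift_severity, lower, upper)
        weights = [comp.weight for comp in components]
        chosen = rng.choices(components, weights=weights, k=1)[0]
        data.append(chosen.sample(rng))
    return data
```

Calling `generate_stream` with two or three components and a small `change_prob` yields a stream whose cluster centers wander slowly inside the bounds, while a larger `shift_severity` gives more abrupt moves. Only the centers are kept inside the bounds in this sketch; sampled points can still fall slightly outside them.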
