Single-pass Possibilistic Clustering with Damped Window Footprints

Jeffrey Dale, James Keller, Aquila Galusha

TL;DR

Key contributions of single-pass possibilistic clustering (SPC) include the ability to model non-spherical clusters, closed-form footprint updates over arbitrarily sized damped windows, and the use of covariance union, borrowed from the multiple hypothesis tracking literature, to merge the mean and covariance estimates of two clusters.
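As a rough illustration of what a closed-form damped-window footprint update can look like, the sketch below keeps a 2-D footprint as three decayed sums (total weight, linear sum, and sum of outer products) and shrinks them by a factor of (1 - γ) before each new point is added. The class name `DampedFootprint`, the per-point weight `u` (for example, a typicality), and this exact decay rule are assumptions for illustration, not SPC's published update equations.

```python
from dataclasses import dataclass, field


@dataclass
class DampedFootprint:
    """Hypothetical 2-D footprint: decayed weight, linear sum, and outer-product sum."""
    gamma: float  # forgetting factor in [0, 1); a larger value forgets old points faster
    weight: float = 0.0  # W = sum_k (1 - gamma)^(t - k) * u_k
    lin: list = field(default_factory=lambda: [0.0, 0.0])  # decayed sum of u * x
    sq: list = field(default_factory=lambda: [[0.0, 0.0], [0.0, 0.0]])  # decayed sum of u * x x^T

    def __post_init__(self):
        if not 0.0 <= self.gamma < 1.0:
            raise ValueError("gamma must lie in [0, 1)")

    def update(self, x, u=1.0):
        """Decay the old statistics by (1 - gamma), then add point x with weight u."""
        d = 1.0 - self.gamma
        self.weight = d * self.weight + u
        for i in range(2):
            self.lin[i] = d * self.lin[i] + u * x[i]
            for j in range(2):
                self.sq[i][j] = d * self.sq[i][j] + u * x[i] * x[j]

    def mean(self):
        """Weighted mean: lin / W."""
        return [s / self.weight for s in self.lin]

    def covariance(self):
        """Weighted (population) covariance: sq / W - mean mean^T."""
        m = self.mean()
        return [[self.sq[i][j] / self.weight - m[i] * m[j] for j in range(2)]
                for i in range(2)]
```

Each update costs constant time and memory however long the stream runs. For γ > 0 and u = 1, the decayed weight converges to 1/γ (10 for γ = 0.1), which acts as an effective window length; γ = 0 recovers an undamped running mean and covariance. Computing the covariance as E[xxᵀ] - μμᵀ can lose precision when the data sit far from the origin, so a production version might prefer an incremental, centered update.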

Abstract

Streaming clustering has become highly relevant in the age of big data, with applications such as network traffic analysis and the processing of continuously running sensor data. Possibilistic models offer distinct benefits over other approaches in the literature, notably a "fuzzifier" parameter that controls how quickly typicality degrades with distance from a cluster center. We propose a single-pass possibilistic clustering (SPC) algorithm that is effective and easy to apply to new datasets. Key contributions of SPC include the ability to model non-spherical clusters, closed-form footprint updates over arbitrarily sized damped windows, and the use of covariance union, borrowed from the multiple hypothesis tracking literature, to merge the mean and covariance estimates of two clusters. SPC is validated against five other streaming clustering algorithms in terms of cluster purity and normalized mutual information.
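To make the role of the fuzzifier concrete, the sketch below evaluates the classic possibilistic c-means typicality, u = 1 / (1 + (d²/η)^(1/(m-1))), using a squared Mahalanobis distance d² so that elongated (non-spherical) clusters are scored along their own axes. Whether SPC uses exactly this form is an assumption here, as are the parameter names `eta` (the squared distance at which typicality equals 0.5) and `m` (the fuzzifier, which must exceed 1).

```python
def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance of a 2-D point x from a cluster (mean, cov)."""
    a, b = cov[0]
    c, d = cov[1]
    det = a * d - b * c
    if det <= 0:  # basic sanity check: reject singular or indefinite covariances
        raise ValueError("covariance must be positive definite")
    inv = [[d / det, -b / det], [-c / det, a / det]]  # closed-form 2x2 inverse
    dx = [x[0] - mean[0], x[1] - mean[1]]
    return sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))


def typicality(x, mean, cov, eta=1.0, m=2.0):
    """PCM-style typicality in (0, 1]: 1 at the center, 0.5 where d^2 == eta."""
    if m <= 1.0:
        raise ValueError("fuzzifier m must be > 1")
    d2 = mahalanobis_sq(x, mean, cov)
    return 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0)))
```

For example, with an identity covariance a point at squared distance 4 gets typicality 0.2 when m = 2 but 1/3 when m = 3, so a larger fuzzifier makes typicality decay more slowly beyond η. With covariance [[4, 0], [0, 1]], the points (2, 0) and (0, 2) are equally far from the center in Euclidean terms, yet they score 0.5 and 0.2 respectively because the cluster is stretched along the first axis.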

Paper Structure (13 sections, 10 equations, 6 figures, 1 table, 1 algorithm)

Figures (6)

  • Figure 1: Motivation for using a possibilistic model over a probabilistic model in SPC demonstrated on a synthetic dataset of two near-overlapping circles.
  • Figure 2: Illustration of why covariance union is needed to combine the covariance matrices of two structures with unequal means.
  • Figure 3: SPC performance on the synthetic clustering dataset of Gionis et al. [gionis2007clustering].
  • Figure 4: Nonstationary synthetic dataset of three sine waves illustrating the utility of SPC's forgetting factor in modeling newer data with finer detail. Old points are still modeled, but with less granularity. Here, SPC was run with a high forgetting factor of $\gamma=0.1$.
  • Figure 5: SPC performance on an overlapping cluster dataset.
  • ...and 1 more figure
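Figure 2's point, that two covariance matrices with different means cannot simply be averaged, can be sketched with covariance union: choose a merged mean u, re-express each input covariance about u by adding the outer product of its mean offset, and then find a matrix U that upper-bounds both in the positive semidefinite sense. The 2-D code below does the last step by whitening one bound against the other and clamping every eigenvalue below 1 up to 1. Taking u as the midpoint of the two means is an assumption (covariance union methods typically optimize u, for example to keep U small), and the function names are hypothetical rather than SPC's own.

```python
import math


def _mm(X, Y):
    """2x2 matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def _t(X):
    """2x2 transpose."""
    return [[X[0][0], X[1][0]], [X[0][1], X[1][1]]]


def _spread(cov, center, mean):
    """cov + (center - mean)(center - mean)^T: the estimate re-expressed about center."""
    d = [center[0] - mean[0], center[1] - mean[1]]
    return [[cov[i][j] + d[i] * d[j] for j in range(2)] for i in range(2)]


def covariance_union(a, A, b, B, u=None):
    """Return (u, U) with U >= A + (u-a)(u-a)^T and U >= B + (u-b)(u-b)^T."""
    if u is None:
        u = [(a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0]  # assumption: midpoint, not optimized
    Ua, Ub = _spread(A, u, a), _spread(B, u, b)
    # Cholesky factor L of Ub (Ub = L L^T); Ub must be positive definite.
    l11 = math.sqrt(Ub[0][0])
    l21 = Ub[1][0] / l11
    l22 = math.sqrt(Ub[1][1] - l21 * l21)
    L = [[l11, 0.0], [l21, l22]]
    Linv = [[1.0 / l11, 0.0], [-l21 / (l11 * l22), 1.0 / l22]]
    # Whiten Ua against Ub: T = L^-1 Ua L^-T, which maps Ub to the identity.
    T = _mm(_mm(Linv, Ua), _t(Linv))
    # Closed-form eigendecomposition of the symmetric 2x2 matrix T = V diag(lam) V^T.
    mid = (T[0][0] + T[1][1]) / 2.0
    r = math.hypot((T[0][0] - T[1][1]) / 2.0, T[0][1])
    theta = 0.5 * math.atan2(2.0 * T[0][1], T[0][0] - T[1][1])
    c, s = math.cos(theta), math.sin(theta)
    V = [[c, -s], [s, c]]
    # Clamp eigenvalues below 1 up to 1 so the result dominates both T and the identity.
    D = [[max(mid + r, 1.0), 0.0], [0.0, max(mid - r, 1.0)]]
    LV = _mm(L, V)
    U = _mm(_mm(LV, D), _t(LV))  # map back: U = L V D V^T L^T
    U[0][1] = U[1][0] = 0.5 * (U[0][1] + U[1][0])  # symmetrize away rounding error
    return u, U
```

Merging two unit-covariance clusters centered at (0, 0) and (2, 0) this way gives u = (1, 0) and U ≈ [[2, 0], [0, 1]]: the variance along the axis joining the two means grows to cover both estimates, whereas naively averaging the two identity matrices would understate the spread of the merged cluster.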