Fuzzy clustering of circular time series based on a new dependence measure with applications to wind data

Ángel López-Oriona, Ying Sun, Rosa M. Crujeiras

TL;DR

This paper addresses the clustering of circular time series. It introduces a distance between circular series, built on a new measure of serial dependence that considers circular arcs and thus exploits the directional character of the series range, and uses that distance to construct a clustering procedure.

Abstract

Time series clustering is an essential machine learning task with applications in many disciplines. While most methods focus on time series taking values on the real line, very few works consider time series defined on the unit circle, although such series arise frequently in applications. In this paper, the problem of clustering circular time series is addressed. To this end, a distance between circular series is introduced and used to construct a clustering procedure. The metric relies on a new measure of serial dependence considering circular arcs, thus taking advantage of the directional character inherent to the series range. Since the dynamics of the series may vary over time, we adopt a fuzzy approach, which enables the procedure to assign each series to several clusters with different membership degrees. The resulting clustering algorithm is able to group series generated from similar stochastic processes, reaching accurate results with series coming from a broad variety of models. An extensive simulation study shows that the proposed method outperforms several alternative techniques while also being computationally efficient. Two applications involving time series of wind direction in Saudi Arabia highlight the potential of the proposed approach.
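
To make the notion of membership degrees concrete, the following is a minimal sketch of the textbook membership update used in fuzzy $C$-medoids clustering, not the authors' implementation. It assumes that the pairwise distances from each series to each medoid have already been computed (for example with $\widehat{d}_{CQA}$, which is not reproduced here), that the objective is $\sum_{i,c} u_{ic}^m d_{ic}$ with fuzziness parameter $m > 1$, and that a series is treated as crisply assigned when its largest membership exceeds a cutoff. The function names `fuzzy_memberships` and `crisp_assignment` are hypothetical.

```python
import numpy as np


def fuzzy_memberships(dist_to_medoids, m=2.0):
    """Membership degrees from an (n_series, n_clusters) distance matrix.

    Assumption: standard fuzzy C-medoids update for the objective
    sum_i sum_c u_ic^m * d_ic, i.e. u_ic proportional to d_ic^(-1/(m-1)).
    """
    if m <= 1:
        raise ValueError("the fuzziness parameter m must be greater than 1")
    d = np.asarray(dist_to_medoids, dtype=float)
    u = np.zeros_like(d)

    # A series at distance zero from a medoid (e.g. the medoid itself)
    # gets its whole membership split among the zero-distance clusters.
    zero = d == 0
    has_zero = zero.any(axis=1)
    u[has_zero] = zero[has_zero] / zero[has_zero].sum(axis=1, keepdims=True)

    # All other series: inverse-distance weights normalised to sum to one.
    inv = d[~has_zero] ** (-1.0 / (m - 1.0))
    u[~has_zero] = inv / inv.sum(axis=1, keepdims=True)
    return u


def crisp_assignment(u, cutoff=0.7):
    """Cluster index where the top membership exceeds the cutoff, else -1."""
    best = u.argmax(axis=1)
    return np.where(u.max(axis=1) > cutoff, best, -1)


if __name__ == "__main__":
    # Toy distances of three series to two medoids (illustrative values only).
    distances = [[1.0, 3.0], [0.0, 2.0], [2.0, 2.0]]
    u = fuzzy_memberships(distances, m=2.0)
    print(u)
    print(crisp_assignment(u, cutoff=0.7))
```

Larger values of $m$ flatten the memberships toward $1/C$, while values close to 1 push them toward a hard partition. The label `-1` marks series whose membership is spread over several clusters, which is the situation the fuzzy approach is meant to capture.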

Paper Structure (16 sections, 26 equations, 6 figures, 7 tables, 1 algorithm)

Figures (6)

  • Figure 1: Mean of 1000 independent replicates of $\widehat{d}_{CQA}$, scaled by a factor of 100 (blue curve), as a function of the radius $r$. The lower (upper) dashed line represents the 5th (95th) quantile of the 1000 replicates.
  • Figure 2: Two-dimensional scaling planes based on distance $\widehat{d}_{CQA}$ for Scenarios 1 ($T=500$), 2 ($T=500$) and 3 ($T=1000$) and two circular transformations.
  • Figure 3: Rates of correct classification as function of $m$ obtained by the fuzzy $C$-medoids clustering algorithm based on several dissimilarities with a cutoff of 0.7. Scenarios 4 ($T=500$), 5 ($T=500$) and 6 ($T=1000$).
  • Figure 4: Two-dimensional scaling plane based on distance $\widehat{d}_{CQA}$ for the 64 time series of wind direction in the city of Abha.
  • Figure 5: CQA-based estimates for the medoid time series in the first case study for $l=1$ and $r=0.7$.
  • ...and 1 more figure

Theorems & Definitions (5)

  • Remark 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Remark 5