The recursive scheme of clustering

Alicja Miniak-Górecka, Krzysztof Podlaski, Tomasz Gwizdałła

TL;DR

The paper tackles clustering of noisy climatological and experimental data where traditional methods struggle to reflect expert classifications. It introduces a recursive scheme that combines Savitzky-Golay smoothing of histograms with standard clustering algorithms ($k$-means and SOM) to automatically determine the number of clusters and refine partitions. Across ground temperature, water level, and Banknote entropy datasets, the method shows better alignment with expert assessments and reveals structure overlooked by conventional approaches. This approach offers a robust, multi-phase framework for clustering noisy measurements with practical applicability to environmental monitoring and related domains.

Abstract

The problem of data clustering is one of the most important in data analysis. It can be problematic when dealing with experimental data characterized by measurement uncertainties and errors. Our paper proposes a recursive scheme for clustering data obtained in geographical (climatological) experiments. We discuss the results obtained by the k-means and SOM methods combined with the developed recursive procedure. We show that clustering with the new approach gives more acceptable results when compared to experts' assessments.
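The summary above does not spell out the recursive procedure, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes one-dimensional data, a fixed 5-point quadratic Savitzky-Golay filter, a cluster count taken from the local maxima of the smoothed histogram, and a plain 1-D k-means seeded at those maxima. All function names and the parameters `bins`, `min_frac`, `min_size`, and `depth` are hypothetical.

```python
# Standard 5-point quadratic Savitzky-Golay smoothing coefficients.
SG5 = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)


def histogram(values, bins):
    """Return (bin centers, counts) for equal-width bins over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    return centers, counts


def sg_smooth(counts):
    """Smooth counts with the 5-point filter; edges reuse the border value."""
    n = len(counts)
    out = []
    for i in range(n):
        acc = 0.0
        for k, coef in enumerate(SG5):
            j = min(max(i + k - 2, 0), n - 1)  # clamp index at the edges
            acc += coef * counts[j]
        out.append(acc)
    return out


def local_maxima(smoothed, min_height):
    """Indices of strict interior local maxima at or above min_height."""
    return [
        i
        for i in range(1, len(smoothed) - 1)
        if smoothed[i] >= min_height
        and smoothed[i - 1] < smoothed[i] > smoothed[i + 1]
    ]


def kmeans_1d(values, centers, iters=50):
    """Plain 1-D k-means (Lloyd iterations) started from the given centers."""
    centers = list(centers)

    def nearest(v):
        return min(range(len(centers)), key=lambda c: abs(v - centers[c]))

    for _ in range(iters):
        groups = [[] for _ in centers]
        for v in values:
            groups[nearest(v)].append(v)
        new = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return centers, [nearest(v) for v in values]


def recursive_cluster(values, bins=10, min_frac=0.05, min_size=30, depth=2):
    """Split values at histogram peaks, then try to split each part again."""
    if depth == 0 or len(values) < min_size:
        return [values]
    centers, counts = histogram(values, bins)
    smoothed = sg_smooth(counts)
    peaks = local_maxima(smoothed, min_frac * max(smoothed))
    if len(peaks) < 2:  # at most one interior peak: nothing to split
        return [values]
    _, labels = kmeans_1d(values, [centers[p] for p in peaks])
    clusters = []
    for c in range(len(peaks)):
        part = [v for v, lab in zip(values, labels) if lab == c]
        if part:
            clusters.extend(recursive_cluster(part, bins, min_frac, min_size, depth - 1))
    return clusters


# Hand-made bimodal sample (illustrative only, not data from the paper).
demo = ([0.0] + [1.5] * 5 + [2.5] * 9 + [3.5] * 5 + [4.5]
        + [6.5] * 2 + [7.5] * 6 + [8.5] * 8 + [9.5] * 2 + [10.0])
clusters = recursive_cluster(demo, bins=10, min_size=30)
print([len(c) for c in clusters])  # → [21, 19]
```

In this sketch the number of clusters is never passed in: it falls out of the peak count of the smoothed histogram at each level. The `min_size` stop rule matters in practice, because a histogram of a small sub-cluster is spiky and would otherwise be split on noise.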

Paper Structure

This paper contains 10 sections, 1 equation, 8 figures, 1 table, and 3 algorithms.

Figures (8)

  • Figure 1: Comparison of two different divisions of ground temperature data into five clusters using the classical k-means method and the expert's decision.
  • Figure 2: The silhouette and elbow methods applied to ground temperature data.
  • Figure 3: The results of different clustering methods applied to ground temperature data. The histogram data and the result of Savitzky-Golay (S-G) smoothing are presented at the top. Below, we marked cluster borders for different methods and different numbers of clusters.
  • Figure 4: Clustering of data for water level. The template of the picture is the same as in Fig. 3. The borders between the clusters are shown in the lower part of the Figure.
  • Figure 5: Clustering of data for entropy of image dataset. The borders between the clusters are shown in the lower part of the Figure.
  • ...and 3 more figures