Supervised Pattern Recognition Involving Skewed Feature Densities

Alexandre Benatti, Luciano da F. Costa

TL;DR

The classification potential of the Euclidean distance is compared with that of a dissimilarity index based on the coincidence similarity index by applying the k-neighbors supervised classification method to features resulting from several types of transformations of one- and two-dimensional symmetric densities.

Abstract

Pattern recognition constitutes a particularly important task underlying a great deal of scientific and technological activities. At the same time, pattern recognition involves several challenges, including the choice of features to represent the data elements, as well as possible respective transformations. In the present work, the classification potential of the Euclidean distance is compared with that of a dissimilarity index based on the coincidence similarity index by applying the k-neighbors supervised classification method to features resulting from several types of transformations of one- and two-dimensional symmetric densities. Given two groups characterized by respective densities with or without overlap, different types of transformations are obtained and employed to quantitatively evaluate the performance of k-neighbors methodologies based on the Euclidean distance and the coincidence similarity index. More specifically, the accuracy of classifying the intersection point between the densities of two adjacent groups is taken into account for the comparison. Several interesting results are described and discussed, including the enhanced potential of the dissimilarity index for classifying datasets with right-skewed feature densities, as well as the identification that the sharpness of the comparison between data elements can be independent of the respective supervised classification performance.
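
The comparison outlined in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not stated on this page: the coincidence similarity is taken as the product of the Jaccard and interiority indices for strictly positive feature vectors, the dissimilarity is taken as one minus that similarity, and the exponential is used as a hypothetical transformation that turns normal densities into right-skewed ones. The function names are illustrative only and are not the authors' implementation.

```python
import numpy as np


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return float(np.linalg.norm(a - b))


def coincidence_dissimilarity(a, b):
    """One minus the coincidence similarity.

    Assumption: coincidence = Jaccard * interiority, with both indices
    written for strictly positive real-valued feature vectors.
    """
    shared = np.minimum(a, b).sum()
    jaccard = shared / np.maximum(a, b).sum()
    interiority = shared / min(a.sum(), b.sum())
    return 1.0 - jaccard * interiority


def knn_predict(train_x, train_y, query, k, dist):
    """Label a query by majority vote among its k nearest training points."""
    d = np.array([dist(x, query) for x in train_x])
    nearest = np.argsort(d)[:k]
    votes = np.bincount(train_y[nearest])
    return int(np.argmax(votes))


# Two groups with normal densities on x, mapped to right-skewed
# densities on y = exp(x) (a hypothetical choice of transformation).
rng = np.random.default_rng(0)
x0 = rng.normal(0.0, 0.3, size=(200, 1))
x1 = rng.normal(2.0, 0.3, size=(200, 1))
train_x = np.exp(np.vstack([x0, x1]))
train_y = np.array([0] * 200 + [1] * 200)

# Classify the same query with both ways of comparing data elements.
query = np.exp(np.array([1.0]))
label_euclidean = knn_predict(train_x, train_y, query, 5, euclidean)
label_coincidence = knn_predict(train_x, train_y, query, 5, coincidence_dissimilarity)
```

For a single positive feature this assumed dissimilarity reduces to one minus the ratio of the smaller to the larger value, so it compares elements in relative rather than absolute terms, whereas the Euclidean distance compares them by absolute difference.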

Paper Structure (8 sections, 16 equations, 16 figures)

Figures (16)

  • Figure 1: Two groups, shown in orange and green, are represented in terms of respective normal feature densities (symmetric around the average) on the variable $x$. The transformation of the measurement (feature) $x$ results in a new measurement $y$ characterized by skewed densities. The application of Bayesian decision considering the feature $y$ requires the estimation of these relatively more complex densities.
  • Figure 2: Examples of symmetric feature densities on a random variable $x$: (a) constant (modeled as Dirac's delta); (b) uniform; and (c) normal.
  • Figure 3: Three examples of skewed feature densities.
  • Figure 4: A possible model of two-dimensional dataset represented in terms of two features $x_1$ and $x_2$. The center of the cluster corresponds to the highest data density, which then decreases monotonically with the distance from that central point.
  • Figure 5: An example of clusters existing in a skewed feature space. Observe that the density of the clusters tends to decrease with their distance from the origin of the coordinate system $(x_1, x_2)$.
  • ...and 11 more figures