Normalization in Proportional Feature Spaces

Alexandre Benatti, Luciano da F. Costa

TL;DR

This work addresses feature normalization from the perspective of uniform and proportional (right-skewed) features and their comparison operations, and it describes and discusses several related concepts, properties, and results.

Abstract

Feature normalization plays a central role in data representation, characterization, visualization, analysis, comparison, classification, and modeling, as it can substantially influence, and be influenced by, all of these activities. Selecting an appropriate normalization method requires taking into account the type and characteristics of the features involved, the methods to be used subsequently in the data processing just mentioned, and the specific questions being considered. After briefly considering how normalization constitutes one of the many interrelated parts typically involved in data analysis and modeling, the present work addresses feature normalization from the perspective of uniform and proportional (right-skewed) features and comparison operations. More general right-skewed features are also considered in an approximate manner. Several concepts, properties, and results are described and discussed, including a duality relationship between uniform and proportional feature spaces and their respective comparisons, which specifies conditions for consistency between comparisons in the two domains. Two normalization possibilities based on non-centralized dispersion of features are also presented, as is a modified version of the Jaccard similarity index that intrinsically incorporates normalization. Preliminary experiments are presented to illustrate the developed concepts and methods.

Paper Structure (15 sections, 54 equations, 10 figures)

Figures (10)

  • Figure 1: Diagram illustrating some of the main stages, shown as columns, typically involved in data analysis and modeling, defining the context in which normalization (emphasized near the middle of the diagram) is often approached. Also depicted are three possible approaches, represented as respective pathways. Each of these pathways is obtained by making a choice at each of the successive stages. The choice among the possible pathways could consider, for instance, how effective each is in yielding a model that is as stable and accurate as possible. However, each choice influences, and is influenced by, all the other stages.
  • Figure 2: Graphical representation of the duality relationship (Eq. \ref{eq:rel}) between the uniform and proportional spaces of features and comparisons, established by the transformation function $y=f(x)=c^x$. If relations between comparisons performed in the uniform domain are to be preserved, proportional comparisons should be adopted in the transformed domain in order to compensate for the effect of the proportional transformation.
  • Figure 3: Graphic illustration of two-dimensional receptive fields obtained for the multiset coincidence similarity index considering combinations of $D=2, 3$ and $E=1, 5$.
  • Figure 4: Graphic illustration of two-dimensional receptive fields obtained for the modified multiset Jaccard similarity index considering combinations of $D=2,3$.
  • Figure 5: Graphic illustration of a situation in which, though a feature $y$ is obtained by a well-defined proportional transformation (namely $y= c^x$) of a uniform feature $x$, the available samples are not enough for proper identification of the type of $y$. Also included (in green), for the sake of comparison, are a long uniform density and its respective transformation (see Eq. \ref{eq:exp}). The density magnitudes are now shown in absolute terms.
  • ...and 5 more figures
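
The duality described in the Figure 2 caption can be illustrated numerically. The sketch below is not the paper's Eq. `eq:rel`, which is not reproduced here; it relies only on the identity $c^{x_2}/c^{x_1} = c^{x_2 - x_1}$ implied by the transformation $y = c^x$. The function names, the base `c = 2`, and the sample values are assumptions chosen for illustration.

```python
import math


def to_proportional(x, c=2.0):
    """Map a uniform feature value x to the proportional domain via y = c**x."""
    return c ** x


def uniform_comparison(x1, x2):
    """Compare two uniform-domain values by their difference (hypothetical helper)."""
    return x2 - x1


def proportional_comparison(y1, y2):
    """Compare two proportional-domain values by their ratio (hypothetical helper)."""
    return y2 / y1


c = 2.0
# Two pairs that are equally far apart (difference of 2) in the uniform domain.
pairs = [(1.0, 3.0), (4.0, 6.0)]

results = []
for x1, x2 in pairs:
    y1, y2 = to_proportional(x1, c), to_proportional(x2, c)
    results.append(
        {
            "uniform_diff": uniform_comparison(x1, x2),
            "proportional_ratio": proportional_comparison(y1, y2),
            "transformed_diff": y2 - y1,  # a uniform comparison applied in the wrong domain
        }
    )

for r in results:
    print(r)
```

Under these assumptions, both pairs yield the same ratio in the transformed domain (equal to `c ** uniform_diff`), whereas the plain difference of the transformed values depends on where the pair sits. This is the sense in which a proportional comparison compensates for the proportional transformation.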